Yeti monitoring with RIPE Atlas

1. Purpose of Yeti Monitoring System

Currently the Yeti testbed lacks sufficient information to reflect the status of
Yeti operation in terms of availability, consistency and performance. It
also lacks tools to measure the results of experiments such as introducing more
root servers, KSK rollover, Multi-ZSK, etc. Thus, there is a need to monitor
the functionality, availability and consistency of the Yeti Distribution Master
as well as the Yeti root servers.

The basic idea is to set up a recurring monitoring task in Atlas that queries the
SOA record of the fifteen root servers over both UDP and TCP to check the
consistency of the SOA record. A Nagios plugin periodically gets the results of
the Atlas monitoring task and parses them to trigger alerts. An alert email is
sent when there is an exception.

2. check_yeti, the Nagios plugin

2.1 Design of the plugin

check_yeti gets the test results through the Atlas API and analyses them,
then outputs the status result, which is displayed in the Nagios web interface.

Sample code to get the results from Atlas:

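import time
import urllib.request   # Python 3; in Python 2 the call below is urllib.urlretrieve

# targetid: Atlas measurement ID (string); outfile: local path for the JSON file
stop_time=str(int(time.time()))
start_time=str(int(stop_time) - 3600)   # look back one hour
base_url="https://atlas.ripe.net/api/v2/measurements/"
url=base_url + targetid + "/results" + "?start=" + start_time + "&" + \
                                       "stop=" + stop_time + "&format=json"
urllib.request.urlretrieve(url, outfile)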
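
Once the file is downloaded, the plugin has to reduce it to a per-probe verdict. The following is a minimal sketch of that step, not the actual check_yeti code. It assumes the Atlas DNS result layout in which a successful query carries a "result" object and a failed one carries an "error" object; the function name count_ok_probes and the sample data are hypothetical. It does not decode the DNS answer, so it only checks whether a probe got a response, not whether the SOA records agree.

```python
import json

def count_ok_probes(results):
    """Return (ok, total) counted over the latest result of each probe.

    ASSUMPTION: each entry is a dict with "prb_id" and "timestamp", plus
    "result" on success or "error" on failure.
    """
    latest = {}
    for entry in results:
        prb = entry.get("prb_id")
        prev = latest.get(prb)
        # Keep only the most recent entry seen for each probe.
        if prev is None or entry.get("timestamp", 0) >= prev.get("timestamp", 0):
            latest[prb] = entry
    ok = sum(1 for e in latest.values() if "result" in e and "error" not in e)
    return ok, len(latest)

# Hypothetical sample standing in for the downloaded JSON file.
sample = json.loads("""[
  {"prb_id": 1, "timestamp": 100, "result": {"ANCOUNT": 1}},
  {"prb_id": 2, "timestamp": 100, "error": {"timeout": 5000}},
  {"prb_id": 1, "timestamp": 200, "error": {"timeout": 5000}},
  {"prb_id": 3, "timestamp": 150, "result": {"ANCOUNT": 1}}
]""")
print(count_ok_probes(sample))
```

With real data, the list would come from json.load() on the file written by the download step.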

2.2 Checking Algorithm

  • Atlas:
    a. set up a recurring monitoring task for each of the 15 root servers.
    b. use 10 probes for each run.
  • Nagios:
    a. get the results for each root server over the last hour
    b. analyse the results
    c. return a status code

    OK: more than six probes return an OK result
    WARNING: the number of OK results is below 4
    CRITICAL: none of the probes returns OK
    UNKNOWN: no data returned.

PS: Nagios status codes

STATE_OK=0  
STATE_WARNING=1  
STATE_CRITICAL=2  
STATE_UNKNOWN=3  
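
The thresholds in 2.2 can be sketched as a small function. This is an illustration under stated assumptions rather than the plugin's real logic: the name classify is hypothetical, and the text does not say what happens when 4 to 6 probes return OK, so the sketch treats that range as WARNING.

```python
STATE_OK = 0
STATE_WARNING = 1
STATE_CRITICAL = 2
STATE_UNKNOWN = 3

def classify(ok_count, total):
    """Map the number of probes with an OK result to a Nagios status code."""
    if total == 0:
        return STATE_UNKNOWN      # no data returned at all
    if ok_count == 0:
        return STATE_CRITICAL     # none of the probes returned OK
    if ok_count > 6:
        return STATE_OK           # more than six probes returned OK
    # ASSUMPTION: the source only defines WARNING for fewer than 4 OK results;
    # counts of 4 to 6 are folded into WARNING here.
    return STATE_WARNING

for ok in (10, 5, 3, 0):
    print(ok, classify(ok, 10))
```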

3. Deployments

3.1 check_yeti

Make check_yeti executable and put it into the Nagios plugin directory.

3.2 hosts.cfg

Define the monitored server

define host{  
    use                     linux-server-yeti-rootserver  
    host_name               bii.dns-lab.net  
    alias                   bii.dns-lab.net  
    address                 240c:f:1:22::6  
}    

Different servers should be defined separately

3.3 commands.cfg

Define the check_yeti check command

define command{
        command_name    check_yeti
        command_line    $USER1$/check_yeti $ARG1$
}

$USER1$: the Nagios plugin directory
$ARG1$: plugin input parameter, the ID of the monitoring task
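
Nagios runs the command line, shows the first line of the plugin's standard output, and takes the service state from the exit code. The sketch below shows how $ARG1$ could arrive in the plugin and be turned into that pair. It is not the real check_yeti; run and fake_check are hypothetical names, and fake_check stands in for the fetch-and-analyse step.

```python
STATE_OK, STATE_WARNING, STATE_CRITICAL, STATE_UNKNOWN = 0, 1, 2, 3

def run(argv, check):
    """Run one check and return (exit_code, status_line).

    argv  -- arguments as Nagios passes them; argv[1] corresponds to $ARG1$
    check -- hypothetical callable mapping a measurement ID to (code, text)
    """
    if len(argv) != 2 or not argv[1].isdigit():
        return STATE_UNKNOWN, "UNKNOWN - usage: check_yeti <measurement id>"
    return check(argv[1])

def fake_check(msm_id):
    # Placeholder for downloading and analysing the Atlas results.
    return STATE_OK, "OK - measurement %s answered" % msm_id

# A real plugin would pass sys.argv here, print the line,
# and finish with sys.exit(code).
code, line = run(["check_yeti", "1369633"], fake_check)
print(code, line)
```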

3.4 service.cfg

Define check_yeti monitoring service

define service{  
   	     use                             generic-service          
   	     host_name                       bii.dns-lab.net  
   	     service_description             check_yeti  
   	     check_command                   check_yeti!1369633 
   	     }

generic-service: the Nagios service template, which defines settings such as the check interval (2 hours), the alert interval and the alert level
1369633: ID of the monitoring task
Different servers should be defined separately
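
Since every root server needs its own service block, the blocks can be generated from a list rather than written by hand. The following is a rough sketch of that idea. render_services is a hypothetical helper, and the second entry in the list is a placeholder, not a real Yeti server or measurement ID.

```python
SERVICE_TEMPLATE = """define service{
        use                     generic-service
        host_name               %(host)s
        service_description     check_yeti
        check_command           check_yeti!%(msm_id)s
}
"""

def render_services(servers):
    """Render one Nagios service block per (host_name, measurement ID) pair."""
    return "\n".join(
        SERVICE_TEMPLATE % {"host": host, "msm_id": msm_id}
        for host, msm_id in servers
    )

servers = [
    ("bii.dns-lab.net", 1369633),      # host and ID from the example above
    ("root2.example.invalid", 0),      # hypothetical placeholder entry
]
print(render_services(servers))
```

The output can be written to the service configuration file before restarting Nagios.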

3.5 contacts.cfg

Define the alert contacts

 define contact{
       contact_name      yeti          
       use               generic-contact               
       email             xxx@biigroup.cn
  }

contact_name: contact name, referenced directly in the template
email: contact email addresses, separated by commas

3.6 Start Nagios

service nagios restart

4. Display Atlas monitoring status on website

  1. set up dnsdomainmon tasks in Atlas and get the zone ID
  2. use the Atlas API, referring to https://atlas.ripe.net/dnsmon/
  3. key parameter: zone: “3069263”
  4. sample code
	 
	    <!DOCTYPE html>
	    <html>
	    <head>
	    <title>domainmon test</title>
	    </head>
	
	    <body>
	    <script type="text/javascript" src="https://www.ripe.net/++resource++ripe.plonetheme.javascripts/jquery/1.11.2.js"></script>
	
	    <script type="text/javascript" src="https://www.ripe.net/++resource++ripe.plonetheme.javascripts/bootstrap.min.js"></script>
	    <script type="text/javascript" src="https://www.ripe.net/++resource++ripe.plonetheme.javascripts/template.js"></script>
	    <script type="text/javascript" src="https://www.ripe.net/++resource++ripe.plonetheme.javascripts/browser-update.js"></script>
	    <script type="text/javascript" src="https://www.ripe.net/modernizr.js"></script>
	
	    <script type="text/javascript" src="https://www-static.ripe.net/static/rnd-ui/atlas/static/ui/js/moment.min.js"></script>
	    <script type="text/javascript" src="https://www-static.ripe.net/static/rnd-ui/atlas/static/core/contrib/tablesorter/jquery.tablesorter.min.js"></script>
	    <script type="text/javascript" src="https://www-static.ripe.net/static/rnd-ui/atlas/static/core/js/jquery.form.min.js"></script>
	    <script type="text/javascript" src="/variables.js?v=Archangel"></script>
	
	    <script src="/easteregg.js?v=Archangel"></script>
	    <script>
	    DNSMON_PROBES_DATA_API_URL = 'https://atlas.ripe.net/dnsmon/api/probes';
	    DNSMON_SERVERS_DATA_API_URL = 'https://atlas.ripe.net/dnsmon/api/servers';
	    DNSMON_ATLAS_DATA_API_URL = 'https://atlas.ripe.net/dnsmon/api/atlas-data';
	    DNSMON_ATLAS_TRACEROUTE_API_URL = 'https://atlas.ripe.net/dnsmon/api/atlas-data';
	    DNSMON_ATLAS_NSID_API_URL = 'https://atlas.ripe.net/dnsmon/api/atlas-data';
	    </script>
	
	    <script type="text/javascript" src="https://atlas.ripe.net/dnsmon/dnsmon-widget-main.js" ></script>
	    <div id="domainmon"></div>
	
	    <script>
	            var dnsmon;
	            $(function() {
	                var hasUdp = true;
	                var hasTcp = true;
	                function onDraw(params) {
	                    var tab;
	                    if (params.isTcp) {
	                        tab = $(".protocol-tabs a[data-protocol=tcp]");
	                    } else {
	                        tab = $(".protocol-tabs a[data-protocol=udp]");
	                    }
	                    $(".protocol-tabs").show();
	                    tab.tab("show");
	                };
	
	                dnsmon = initDNSmon(
	                        '#domainmon',
	                        {
	                            dev: false,
	                            lang: "en",
	                            load: onDraw,
	                            change: onDraw
	                        }, {
	                            type: "zone-servers",
	                            zone: "3069263",
	                            isTcp: hasUdp ? false : true
	                    });
	            });
	    </script>
	    
	    </body>
	    </html>