Nagios Core Administrators Cookbook

Creating a new network host

In this recipe, we'll start with the default Nagios Core configuration, and set up a host definition for a server that responds to PING on our local network. The end result will be that Nagios Core will add our new host to its internal tables when it starts up, and will automatically check it (probably using PING) on a regular basis. In this example, I'll use my Nagios Core monitoring server with a Domain Name System (DNS) name of olympus.naginet, and add a host definition for a webserver with a DNS name of sparta.naginet. This is all on my local network – 10.128.0.0/24.

Getting ready

You'll need a working Nagios Core 3.0 or greater installation with a web interface, with all the Nagios Core Plugins installed. If you have not yet installed Nagios Core, then you should start with the QuickStart guide: http://nagios.sourceforge.net/docs/3_0/quickstart.html.

We'll assume that the configuration file that Nagios Core reads on startup is located at /usr/local/nagios/etc/nagios.cfg, as is the case with the default install. It shouldn't matter where you include this new host definition in the configuration, as long as Nagios Core is going to read the file at some point, but it might be a good idea to give each host its own file in a separate objects directory, which we'll do here. You should have access to a shell on the server, and be able to write text files using an editor of your choice; I'll use vi. You will need root privileges on the server via su or sudo.

You should know how to restart Nagios Core on the server, so that the configuration you're going to add gets applied. It shouldn't be necessary to restart the whole server to do this! A common location for the startup/shutdown script on Unix-like hosts is /etc/init.d/nagios, which I'll use here.

You should also have the hostname or IP address of the server you'd like to monitor ready. It's good practice to use the IP address if you can, so that your checks keep working even if DNS is unavailable. You shouldn't need the subnet mask or anything like that; Nagios Core only needs the information that a PING tool would need, for use by its own check_ping command.

Finally, you should test things first; confirm that you're able to reach the host from the Nagios Core server via PING by checking directly from the shell, to make sure your network stack, routes, firewalls, and netmasks are all correct:

tom@olympus:~$ ping 10.128.0.21
PING sparta.naginet (10.128.0.21) 56(84) bytes of data.
64 bytes from sparta.naginet (10.128.0.21): icmp_req=1 ttl=64 time=0.149 ms
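
For a scripted version of the same test, a bounded ping that reports its result through the exit status may be more convenient than watching the output. The following is a minimal sketch: the function name host_reachable and the default of three packets are illustrative choices rather than anything Nagios Core provides, and it assumes a ping that accepts -c to limit the packet count, as the common Linux and BSD versions do.

```shell
# Hypothetical helper: succeed if the host answers the echo requests.
# $1 is the address to test; $2 is an optional packet count (default 3).
host_reachable() {
    ping -c "${2:-3}" "$1" >/dev/null 2>&1
}

# Example use, with the address from this recipe:
# host_reachable 10.128.0.21 && echo "sparta is reachable"
```

The function prints nothing itself, so it can be used directly in an if statement or a && chain.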

How to do it...

We can create the new host definition for sparta.naginet as follows:

  1. Change directory to /usr/local/nagios/etc/objects, and create a new file called sparta.naginet.cfg:
    # cd /usr/local/nagios/etc/objects
    # vi sparta.naginet.cfg
    
  2. Write the following into the file, changing the values as appropriate for your own setup:
    define host {
        host_name              sparta.naginet
        alias                  sparta
        address                10.128.0.21
        max_check_attempts     3
        check_period           24x7
        check_command          check-host-alive
        contacts               nagiosadmin
        notification_interval  60
        notification_period    24x7
    }
  3. Change directory to /usr/local/nagios/etc, and edit the nagios.cfg file:
    # cd ..
    # vi nagios.cfg
    
  4. At the end of the file add the following line:
    cfg_file=/usr/local/nagios/etc/objects/sparta.naginet.cfg
    
  5. Restart the Nagios Core server:
    # /etc/init.d/nagios restart
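
Before restarting in step 5, it can be worth asking Nagios Core to verify the configuration, so that a typo in the new file doesn't stop the daemon from coming back up. The following is a minimal sketch rather than a definitive procedure: it assumes the binary is at /usr/local/nagios/bin/nagios, which is an assumption based on a default source install, and verify_and_restart is a hypothetical helper name.

```shell
# All three paths are assumptions; adjust them to match your installation.
NAGIOS_BIN=${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}
NAGIOS_INIT=${NAGIOS_INIT:-/etc/init.d/nagios}

# Restart Nagios Core only if the configuration passes verification.
verify_and_restart() {
    if [ ! -x "$NAGIOS_BIN" ]; then
        echo "Cannot execute $NAGIOS_BIN" >&2
        return 2
    fi
    # -v verifies the whole configuration without starting the daemon
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Run verify_and_restart as root. If the check reports errors, the running Nagios Core process is left untouched while you fix them.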
    

If the server restarted successfully, the web interface should show a brand new host in the Hosts list, in PENDING state as it waits to run a check that the host is alive.

Within the next few minutes, it should turn green to show that the check passed and the host is UP.

If the check failed and Nagios Core was not able to get a PING response from the target machine after three tries, for whatever reason, then the host would be shown as DOWN instead.

How it works...

The configuration we included in this section adds a host to Nagios Core's list of hosts. It will periodically check the host by sending a PING request, checking to see if it receives a reply, and updating the host's status as shown in the Nagios Core web interface accordingly. We haven't defined any other services to check for this host yet, nor have we specified what action it should take if the host is down. However, the host itself will be automatically checked at regular intervals by Nagios Core, and we can view its state in the web interface at any time.

The directives we defined in the preceding configuration are explained as follows:

  • host_name: This defines the hostname of the machine, used internally by Nagios Core to refer to this host. It will end up being used in other parts of the configuration.
  • alias: This defines a more recognizable human-readable name for the host; it appears in the web interface. It could also be used for a full-text description of the host.
  • address: This defines the IP address of the machine. This is the actual value that Nagios Core will use for contacting the server; using an IP address rather than a DNS name is generally best practice, so that the checks continue to work even if DNS is not functioning.
  • max_check_attempts: This defines the number of times Nagios Core should try to repeat the check if checks fail. Here, we've defined a value of 3, meaning that Nagios Core will try two more times to PING the target host after first finding it down.
  • check_period: This references the time period during which the host should be checked. 24x7 is a time period defined in the default configuration for Nagios Core. This is a sensible value for hosts, as it means the host will always be checked. Note that this defines when Nagios Core will check the host, and not when it will notify anyone.
  • check_command: This references the command that will be used to check whether the host is UP, DOWN, or UNREACHABLE. In this case, a QuickStart Nagios Core configuration defines check-host-alive as a PING check, which is a good test of basic network connectivity, and a sensible default for most hosts. This directive is actually not required to make a valid host, but you will want to include it under most circumstances; without it, no checks will be run.
  • contacts: This references the contact or contacts that will be notified about state changes in the host. In this instance, we've used nagiosadmin, which is defined in the QuickStart Nagios Core configuration.
  • notification_interval: This defines how often Nagios Core should repeat its notifications for the host while it is having problems. Here, we've used a value of 60, which corresponds to 60 minutes or one hour.
  • notification_period: This references the time period during which Nagios Core should send out notifications, if there are problems. Here, we're again using the 24x7 time period; for other hosts, another time period such as workhours might be more appropriate.
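
To see what a host check of this kind does, it can help to run the plugin by hand from the shell. The sketch below assumes that the plugins live in /usr/local/nagios/libexec, and that check-host-alive calls check_ping with five packets and the thresholds shown, as in the sample commands.cfg. Both are assumptions about your installation, so compare them with your own commands.cfg. The function name check_host_alive is only illustrative.

```shell
# Assumed plugin directory for a default source install.
PLUGIN_DIR=${PLUGIN_DIR:-/usr/local/nagios/libexec}

# Run the same kind of PING check by hand; $1 is the host address.
check_host_alive() {
    # -w and -c: warning and critical thresholds, given as
    #            round-trip time in ms, then packet loss in percent
    # -p:        number of ICMP echo packets to send
    "$PLUGIN_DIR/check_ping" -H "$1" -w 3000.0,80% -c 5000.0,100% -p 5
}
```

Running check_host_alive 10.128.0.21 followed by echo $? prints the plugin's one-line status and then its exit code, which plugins use to report OK (0), WARNING (1), CRITICAL (2), or UNKNOWN (3).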

Note that we added the definition in its own file called sparta.naginet.cfg, and then referred to it in the main nagios.cfg configuration file. This is simply a conventional way of laying out hosts; keeping each definition in its own file happens to be quite a tidy way to manage things.
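
If you add host files often, the cfg_file line can be appended from the shell rather than by hand. This is a minimal sketch, assuming the paths used in this recipe; add_cfg_file is a hypothetical helper name, and it would need to be run as root, ideally with a backup copy of nagios.cfg to hand.

```shell
# Assumed location of the main configuration file.
NAGIOS_CFG=${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}

# Append a cfg_file line for the object file given as $1, unless an
# identical line is already present.
# grep flags: -q quiet, -x match the whole line, -F fixed string.
add_cfg_file() {
    line="cfg_file=$1"
    grep -qxF "$line" "$NAGIOS_CFG" || printf '%s\n' "$line" >> "$NAGIOS_CFG"
}
```

For example, add_cfg_file /usr/local/nagios/etc/objects/sparta.naginet.cfg adds the line from step 4, and running it a second time leaves the file unchanged.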

There's more...

There are a lot of other useful parameters for hosts, but the ones we've used include everything that's required.

While this is a perfectly valid way of specifying a host, it's more typical to define a host based on some template, with definitions of how often the host should be checked, who should be contacted when its state changes and on what basis, and similar properties. Nagios Core's QuickStart sample configuration defines a simple template host called generic-host, which could be used by extending the host definition with the use directive:

define host {
    use                 generic-host
    name                sparta
    host_name           sparta.naginet
    address             10.128.0.21
    max_check_attempts  3
    check_command       check-host-alive
    contacts            nagiosadmin
}

This uses all the parameters defined for generic-host, and then adds on the details of the specific host that needs to be checked. Note that if you use generic-host, then you will need to define check_command in your host definition. If you're curious to see what's defined in generic-host, then you can find its definition in /usr/local/nagios/etc/objects/templates.cfg.
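
One way to read the template without opening the whole file is to print just its definition block. The following is a rough sketch using awk, assuming the templates.cfg path mentioned above and the usual one-directive-per-line layout; show_host_template is a hypothetical helper name.

```shell
# Assumed location of the sample templates file.
TEMPLATES_CFG=${TEMPLATES_CFG:-/usr/local/nagios/etc/objects/templates.cfg}

# Print the "define host" block whose name directive matches $1.
show_host_template() {
    awk -v want="$1" '
        # Start collecting at each host definition
        /^[ \t]*define[ \t]+host[ \t]*[{]/ { block = ""; inside = 1; found = 0 }
        inside {
            block = block $0 "\n"
            if ($1 == "name" && $2 == want) found = 1
        }
        # At the closing brace, print the block only if the name matched
        inside && /[}]/ {
            if (found) printf "%s", block
            inside = 0
        }
    ' "$TEMPLATES_CFG"
}
```

Calling show_host_template generic-host prints only that template, which makes it easier to see which directives your own host definition still has to supply.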

See also

  • The Using an alternative check command for hosts recipe in Chapter 2, Working with Commands and Plugins
  • The Specifying how frequently to check a host recipe in Chapter 3, Working with Checks and States
  • The Grouping configuration files in directories and Using inheritance to simplify configuration recipes in Chapter 9, Managing Configuration