Nagios Core Administrators Cookbook
Using an alternative check command for hosts

In this recipe, we'll learn how to deal with a slightly tricky case in network monitoring: a server that doesn't respond to PING, but still provides a network service that needs checking.

It's good practice to allow PING where you can: RFC 1122 requires hosts to answer ICMP echo requests, and PING is a very useful diagnostic tool, not just for monitoring but also for troubleshooting. However, servers that are accessed only by a few people are sometimes configured not to respond to these messages, perhaps for reasons of secrecy. It's quite common for domestic routers to be configured this way.

Another very common reason for this problem, and the example we'll address here, is checking servers that are behind an IPv4 NAT firewall. From the public Internet, it's not possible to address such a host directly via its RFC 1918 address, such as 192.168.1.20. Pinging the public interface of the router therefore tells us only about the router, not whether the host for which it is translating addresses is actually working.

However, port 22 for SSH is forwarded from the outside to this server, and it's this service that we need to check for availability.

We'll do this by checking whether the host is up through an SSH check, since we can't PING it from the outside as we normally would.

Getting ready

You should have a Nagios Core 3.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins.

How to do it...

We can specify an alternative check method for a host as follows:

  1. Change to the directory containing the objects configuration for Nagios Core. The default location is /usr/local/nagios/etc/objects:
    # cd /usr/local/nagios/etc/objects
    
  2. Find the file that contains the host definition for the host that won't respond to PING, and edit it. In this example, our crete.naginet host is the one we want to edit:
    # vi crete.naginet.cfg
    
  3. Change or define the check_command parameter of the host to the command that we want to use for the check, instead of the usual check-host-alive command or another command based on the check_ping plugin. In this case, we want to use check_ssh. The resulting host definition might look similar to the following code snippet:
    define host {
        use            linux-server
        host_name      crete.naginet
        alias          crete
        address        10.128.0.23
        check_command  check_ssh
    }

    Note that defining check_command still works even if we're using a host template, such as generic-host or linux-server.

    It's a good idea to check that the host will actually respond to our check as we expect it to:

    # sudo -s -u nagios
    $ /usr/local/nagios/libexec/check_ssh -H 10.128.0.23
    SSH OK - OpenSSH_5.5p1 Debian-6+squeeze1 (protocol 2.0)
    
  4. Validate the configuration and restart the Nagios Core server:
    # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
    # /etc/init.d/nagios restart
    

With this done, the next scheduled host check for the crete.naginet server should show the host as UP, because it was checked with the check_ssh command and not the usual check-host-alive command.
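
The validation and restart in step 4 can be chained so that the restart happens only when the configuration check passes. The following is a minimal sketch, assuming the default paths used in this recipe; the validate_and_restart function and the NAGIOS_BIN, NAGIOS_CFG, and NAGIOS_INIT variables are hypothetical names introduced for illustration, not part of Nagios Core.

```shell
#!/bin/sh
# Hypothetical helper: restart Nagios Core only if the configuration validates.
# The defaults are the paths used in this recipe; override them through the
# environment if your installation differs.
NAGIOS_BIN="${NAGIOS_BIN:-/usr/local/nagios/bin/nagios}"
NAGIOS_CFG="${NAGIOS_CFG:-/usr/local/nagios/etc/nagios.cfg}"
NAGIOS_INIT="${NAGIOS_INIT:-/etc/init.d/nagios}"

validate_and_restart() {
    # "nagios -v" exits non-zero when the pre-flight check finds errors.
    if "$NAGIOS_BIN" -v "$NAGIOS_CFG"; then
        "$NAGIOS_INIT" restart
    else
        echo "Configuration check failed; not restarting." >&2
        return 1
    fi
}
```

Calling validate_and_restart as root after editing crete.naginet.cfg would then leave the running server untouched if the new host definition contains an error.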

How it works...

The configuration we added for the crete.naginet host uses check_ssh to check whether the host is UP, rather than a check that uses PING. This is appropriate because the only publicly accessible service on crete.naginet is its SSH service.
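
To make the idea concrete, here is a simplified sketch of how a plugin's exit status can be read as a host state. It assumes the standard plugin return codes (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) and the default, non-aggressive treatment of WARNING as UP. The host_state_from_plugin function is a hypothetical helper, not something Nagios Core provides, and it ignores the distinction Nagios Core makes between DOWN and UNREACHABLE.

```shell
#!/bin/sh
# Hypothetical helper: run a plugin command and print the host state that
# its exit status would roughly correspond to in a host check.
host_state_from_plugin() {
    # Run the plugin given as arguments, discarding its output.
    "$@" >/dev/null 2>&1
    case $? in
        0) echo "UP" ;;    # OK
        1) echo "UP" ;;    # WARNING: assumed to count as UP (non-aggressive checking)
        *) echo "DOWN" ;;  # CRITICAL, UNKNOWN, or the plugin could not be run
    esac
}
```

For example, running host_state_from_plugin /usr/local/nagios/libexec/check_ssh -H 10.128.0.23 as the nagios user should print UP while the SSH service answers.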

The check_ssh command is normally used to check whether a service is available, rather than a host. However, Nagios Core allows us to use it as a host check command as well. Most service commands work this way; you could check a web server behind NAT in the same way with check_http.
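
As an illustration of the check_http case, a host definition for a web server behind NAT might look like the following sketch. The host name web.naginet and its address are hypothetical, and the example assumes that a check_http command which takes the address from the host definition is already defined, as it is in the sample configuration shipped with Nagios Core.

```cfg
define host {
    use            linux-server
    host_name      web.naginet      ; hypothetical host name
    alias          web
    address        10.128.0.24      ; hypothetical address
    check_command  check_http
}
```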

There's more...

Note that for completeness' sake, it would also be appropriate to monitor the NAT router via PING, or some other check appropriate to its public address. That way, if the host check for the SSH server fails, we can see whether the NAT router in front of it is still available, which helps us determine whether the problem is with the server or with the router. You can make this setup even more useful by making the NAT router a parent host for the SSH server behind it, as explained in the Creating a network host hierarchy recipe in Chapter 8, Understanding the Network Layout.
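
A parent relationship of this kind could be sketched as follows. The nat-router.naginet host name and its address are hypothetical, and the example assumes that the linux-server template supplies a PING-based host check such as check-host-alive for the router.

```cfg
define host {
    use            linux-server
    host_name      nat-router.naginet   ; hypothetical name for the NAT router
    alias          nat-router
    address        10.128.0.1           ; hypothetical address
}

define host {
    use            linux-server
    host_name      crete.naginet
    alias          crete
    address        10.128.0.23
    check_command  check_ssh
    parents        nat-router.naginet   ; crete is reached through the router
}
```

With the parents directive in place, Nagios Core can report crete.naginet as UNREACHABLE rather than DOWN when the router itself is down.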

See also

  • The Monitoring SSH for any host and Checking an alternative SSH port recipes in Chapter 5, Monitoring Methods
  • The Monitoring local services on a remote machine with NRPE recipe in Chapter 6, Enabling Remote Execution
  • The Creating a network host hierarchy and Establishing a host dependency recipes in Chapter 8, Understanding the Network Layout