Using an alternative check command for hosts
In this recipe, we'll learn how to deal with a slightly tricky case in network monitoring: a server that doesn't respond to PING but still provides a network service that requires checking.
It's good practice to allow PING where you can, as it's one of the stipulations in RFC 1122 and a very useful diagnostic tool not just for monitoring, but also for troubleshooting. However, sometimes servers that are accessed only by a few people might be configured not to respond to these messages, perhaps for reasons of secrecy. It's quite common for domestic routers to be configured this way.
Another very common reason for this problem, and the example we'll address here, is checking servers that are behind an IPv4 NAT firewall. It's not possible to address the host directly via an RFC 1918 address, such as 192.168.1.20, from the public Internet. Pinging the public interface of the router therefore doesn't tell us whether the host for which it is translating addresses is actually working.
However, port 22 for SSH is forwarded from the outside to this server, and it's this service that we need to check for availability.
We'll do this by checking whether the host is up through an SSH check, since we can't PING it from the outside as we normally would.
Getting ready
You should have a Nagios Core 3.0 or newer server running with a few hosts and services configured already. You should also already be familiar with the relationship between services, commands, and plugins.
How to do it...
We can specify an alternative check method for a host as follows:
- Change to the directory containing the objects configuration for Nagios Core. The default location is /usr/local/nagios/etc/objects:

      # cd /usr/local/nagios/etc/objects
- Find the file that contains the host definition for the host that won't respond to PING, and edit it. In this example, our crete.naginet host is the one we want to edit:

      # vi crete.naginet.cfg
- Change or define the check_command parameter of the host to the command that we want to use for the check instead of the usual check-host-alive or check_ping plugin. In this case, we want to use check_ssh. The resulting host definition might look similar to the following code snippet:

      define host {
          use            linux-server
          host_name      crete.naginet
          alias          crete
          address        10.128.0.23
          check_command  check_ssh
      }
  Note that defining check_command still works even if we're using a host template, such as generic-host or linux-server. It's a good idea to check that the host will actually respond to our check as we expect it to:

      # sudo -s -u nagios
      $ /usr/local/nagios/libexec/check_ssh -H 10.128.0.23
      SSH OK - OpenSSH_5.5p1 Debian-6+squeeze1 (protocol 2.0)
- Validate the configuration and restart the Nagios Core server:

      # /usr/local/nagios/bin/nagios -v /usr/local/nagios/etc/nagios.cfg
      # /etc/init.d/nagios restart
With this done, the next scheduled host check for the crete.naginet server should show the host as UP, because it was checked with the check_ssh command and not the usual check-host-alive command.
How it works...
The configuration we added for the crete.naginet host uses check_ssh to check whether the host is UP, rather than a check that uses PING. This is appropriate because the only public service accessible from crete.naginet is its SSH service.
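To see why a plugin usually associated with services can stand in for a host check, it can help to look at the command definition itself. The following is a sketch of how check_ssh is typically defined in the sample commands.cfg shipped with a source install; the exact command_line is an assumption, so verify it against your own file.

```cfg
# Assumed stock definition; check your own commands.cfg before relying on it
define command {
    command_name    check_ssh
    command_line    $USER1$/check_ssh $ARG1$ $HOSTADDRESS$
}
```

A definition like this relies only on the $HOSTADDRESS$ macro, which is populated for host checks as well as service checks, so nothing in it is tied to a service definition.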
The check_ssh command is normally used to check whether a service is available, rather than a host. However, Nagios Core allows us to use it as a host check command as well. Most service commands work this way; you could check a web server behind NAT in the same way with check_http.
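As a hedged illustration of the check_http case, a host definition for a web server behind NAT might look like the following. The host name, alias, and address are hypothetical placeholders, and the sketch assumes that a check_http command is already defined in your configuration.

```cfg
# Hypothetical host: name, alias, and address are placeholders
define host {
    use            linux-server
    host_name      webserver.naginet
    alias          webserver
    address        10.128.0.24
    # Assumes a check_http command exists in your commands configuration
    check_command  check_http
}
```

As with the SSH example, it is worth running the plugin by hand as the nagios user first to confirm that the forwarded port answers as expected.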
There's more...
Note that, for completeness' sake, it would also be appropriate to monitor the NAT router via PING, or some other check appropriate to its public address. That way, if the host check for the SSH server fails, we can check whether the NAT router in front of it is still available, which helps determine whether the problem lies with the server or with the router. You can make this setup even more useful by making the NAT router a parent host for the SSH server behind it, as explained in the Creating a network host hierarchy recipe in Chapter 8, Understanding the Network Layout.
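A minimal sketch of that arrangement, assuming a router host named nat-router.naginet with the documentation address 192.0.2.1 (both hypothetical), could look like this. The router is checked with the usual check-host-alive command, and the SSH server names it in its parents directive.

```cfg
# Hypothetical NAT router; host name and address are placeholders
define host {
    # Template reused from the recipe; substitute one that suits your router
    use            linux-server
    host_name      nat-router.naginet
    alias          nat-router
    address        192.0.2.1
    check_command  check-host-alive
}

# The SSH server from this recipe, now declared as a child of the router
define host {
    use            linux-server
    host_name      crete.naginet
    alias          crete
    address        10.128.0.23
    check_command  check_ssh
    parents        nat-router.naginet
}
```

With a parent declared, Nagios Core can report crete.naginet as UNREACHABLE rather than DOWN when the router itself is failing its check, which points troubleshooting at the right device.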
See also
- The Monitoring SSH for any host and Checking an alternative SSH port recipes in Chapter 5, Monitoring Methods
- The Monitoring local services on a remote machine with NRPE recipe in Chapter 6, Enabling Remote Execution
- The Creating a network host hierarchy and Establishing a host dependency recipes in Chapter 8, Understanding the Network Layout