Nagios Core Administrators Cookbook

Introduction

Once hosts and services are configured in Nagios Core, its behavior is dictated mainly by two things: the checks it runs to verify that those hosts and services are operating as expected, and the states it assigns to them as a result of those checks.

How often hosts and services should be checked, and on what basis a host or service should be flagged as having problems, depend very much on the nature of the service and on how important it is that it runs all the time. If a host on the other side of the world is checked with PING and its round trip time exceeds 100 ms during busy periods, this may be no cause for concern at all, and perhaps not worth flagging as a WARNING state, let alone a CRITICAL one.

However, if the same host were on the local network, where round trip times of less than 10 ms are a reasonable expectation, then a round trip time of more than 100 ms could well be a grave cause for concern, signaling a packet storm or some other problem with the local network, and we would want to notify the appropriate administrators immediately. Similarly, for web servers, a response time of more than a second for a page on a busy, budget shared web host may not concern us. But if the response time for the corporate website, or for a dedicated colocation customer, were that bad, it might well be worth notifying the web server administrator.
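
As a rough illustration of the PING example, the same check can be given different thresholds per host. The following is a minimal sketch, not a recommended configuration: the host names and threshold values are hypothetical, and it assumes a check_ping command and a generic-service template defined as in the sample configuration shipped with Nagios Core.

```
# Assumed command definition, as in the sample commands.cfg:
# $ARG1$ is the WARNING threshold, $ARG2$ the CRITICAL threshold,
# each written as <round trip time in ms>,<packet loss>%
define command {
    command_name    check_ping
    command_line    $USER1$/check_ping -H $HOSTADDRESS$ -w $ARG1$ -c $ARG2$ -p 5
}

# Distant host (hypothetical name): 100 ms is normal, so thresholds are lenient
define service {
    use                     generic-service    ; assumed template
    host_name               remote.example.net
    service_description     PING
    check_command           check_ping!300.0,20%!600.0,60%
}

# Local host (hypothetical name): anything near 100 ms is already CRITICAL
define service {
    use                     generic-service    ; assumed template
    host_name               local.example.net
    service_description     PING
    check_command           check_ping!10.0,5%!100.0,20%
}
```

With these illustrative values, a 150 ms round trip time leaves the distant host in the OK state but puts the local host into CRITICAL.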

Hosts and services are therefore not all created equal. Nagios Core provides several ways to define behaviors with more precision, as follows:

  • How often a host or service should be checked with its appropriate check_command plugin
  • How bad a check's results have to be before a WARNING or CRITICAL problem is flagged, if at all
  • When a host or service is in a downtime period, so that Nagios Core knows not to expect it to operate during a specified period of time, often for upgrades or other maintenance
  • Whether to automatically tolerate flapping, that is, hosts and services that appear to go up and down frequently
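
Several of these behaviors map onto directives in a service definition. The sketch below shows what that might look like. The host name and all values are hypothetical, and it assumes a check_http command and a generic-service template like those in the sample configuration. Downtime does not appear here because it is normally scheduled at runtime, through the web interface or an external command, rather than in object configuration.

```
define service {
    use                     generic-service    ; assumed template
    host_name               www.example.net    ; hypothetical host
    service_description     HTTP
    check_command           check_http         ; assumed command definition

    # How often to check: values are in time units of interval_length,
    # which defaults to 60 seconds, so these are minutes by default
    check_interval          5      ; normal checks every 5 minutes
    retry_interval          1      ; recheck every minute after a failure
    max_check_attempts      3      ; failed checks before a HARD state

    # Flap detection: thresholds are percentages of state change
    flap_detection_enabled  1
    low_flap_threshold      5.0    ; flapping stops below this value
    high_flap_threshold     20.0   ; flapping starts above this value
}
```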

This chapter uses common problems involving the preceding behaviors as examples to show how each one is configured.