Implementing Splunk 7(Third Edition)
上QQ阅读APP看书,第一时间看更新

Using stats to aggregate values

While top is very convenient, stats is extremely versatile. The basic structure of a stats statement is:

stats functions by fields 

Many of the functions available in stats mimic similar functions in SQL or Excel, but there are many functions unique to Splunk too. The simplest stats function is count. Given the following query, the results will contain exactly one row, with a value for the field count:

sourcetype=tm1* error | stats count

Using the by clause, stats will produce one row per unique value for each field listed, which is similar to the behavior of top. Run the following query:

sourcetype=tm1* error | stats count by date_month date_wday

It will produce a table like this:

There are a few things to note about these results:

  • The results are sorted against the values of the by fields, in this case, date_month followed by date_wday. Unlike top, the largest value will not necessarily be at the top of the list. You can sort in the GUI simply by clicking on the field names at the top of the table, or by using the sort command.
  • There is no limit to the number of rows produced. The number of rows will equal all possible combinations of field values.
  • The function results are displayed last. In the next example, we will add a few more functions, and this will become more obvious.

Using stats, you can add as many by fields or functions as you want into a single statement. Let's run this query:

sourcetype=tm1* error | stats count avg(linecount) max(linecount) 
as "Slowest Time" by date_month date_wday

The results look like those in the following screenshot:

Let's step through every part of this query, just to be clear:

  • sourcetype=tm1* error is the query itself
  • | stats starts the stats command
  • count will return the number of events
  • avg(linecount) produces an average value of the linecount field
  • max(linecount) as Slowest Time finds the maximum value of the linecount field and places the value in a field called Slowest Time

The quotes are necessary ("Slowest Time") because the field name contains a space:

  • by indicates that we are done listing functions and want to list the fields to slice the data by. If the data does not need to be sliced, by and the fields following it can be omitted.
  • date_month and date_wday are our fields for slicing the data. All functions are actually run against each set of data produced for every possible combination of date_month and date_user.

If an event is missing a field that is referenced in a stats command, you may not see the results you are expecting. For instance, when computing an average, you may wish for events missing a field to count as zeros in the average. Also, for events that do not contain a field listed in the by fields, the event will simply be ignored.

To deal with both of these cases, you can use the fillnull command to make sure that the fields you want exist. We will cover this in Chapter 6, Advanced Search Examples.

Let's look at another example, using a time-based function and a little trick. Let's say we want to know the most recent time at which a particular user saw an error each day.

We can use the following query:

sourcetype=tm1* Error TheUser="Admin" | stats count 
first(date_wday) max(_time) as _time by source

This query produces the following table:

Let's step through this example:

  • sourcetype=tm1* Error TheUser="Admin" is the query that will find all errors logged by the user "Admin".
  • | stats is our command.
  • count shows how many times this user saw an error each day.
  • first(date_wday) gives us the weekday that was most recently logged for this user. This will be the most recent event, since results are returned in the order of the most recent first.
  • max(_time) as _time returns the time at which the user most recently saw an error that day. This takes advantage of three aspects of time in Splunk:
    • _time is always present in raw events. As discussed in Chapter 2, Understanding Search, the value is the number of seconds since January 1, 1970, UTC.
    • _time is stored as a number and can be treated as such.
    • If there is a field called _time in the results, Splunk will always display the value as the first column of a table in the time zone selected by the user.
  • by source is our field to split the results against, and in this example, it is done by data source or the error log file where the error(s) were found.

We have only seen a few functions in stats. There are dozens of functions and some advanced syntax that we will touch upon in later chapters. The simplest way to find the full listing is to search with your favorite search engine for the Splunk stats functions.