
Learning to log for robust error checking
Notebooks are useful to keep track of what you did and what went wrong. Logging works in a similar fashion, and we can log errors and other useful information with the standard Python logging library.
For reproducible data analysis, it is good to know the modules our Python scripts import. In this recipe, I will introduce a minimal API from dautil that logs package versions of imported modules in a best effort manner.
Getting ready
In this recipe, we import NumPy and pandas, so you may need to install them. See the Configuring pandas recipe for pandas installation instructions. Installation instructions for NumPy can be found at http://docs.scipy.org/doc/numpy/user/install.html (retrieved July 2015). Alternatively, install NumPy with pip using the following command:
$ [sudo] pip install numpy
The command for Anaconda users is as follows:
$ conda install numpy
I have installed NumPy 1.9.2 via Anaconda. We also require AppDirs to find the appropriate directory to store logs. Install it with the following command:
$ [sudo] pip install appdirs
I have AppDirs 1.4.0 on my system.
How to do it...
To log, we need to create and set up loggers. We can either set up the loggers with code or use a configuration file. Configuring loggers with code is the more flexible option, but configuration files tend to be more readable. I use the log.conf configuration file from dautil:
[loggers]
keys=root

[handlers]
keys=consoleHandler,fileHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler,fileHandler

[handler_consoleHandler]
class=StreamHandler
level=INFO
formatter=simpleFormatter
args=(sys.stdout,)

[handler_fileHandler]
class=dautil.log_api.VersionsLogFileHandler
formatter=simpleFormatter
args=('versions.log',)

[formatter_simpleFormatter]
format=%(asctime)s - %(name)s - %(levelname)s - %(message)s
datefmt=%d-%b-%Y
The file configures the root logger to send messages at the DEBUG level and above to a file, and messages at the INFO level and above to the screen. So, the logger logs more to the file than to the screen. The file also specifies the format of the log messages. I created a tiny API in dautil, which creates a logger with its get_logger() function and uses it to log the package versions of a client program with its log() function. The code is in the log_api.py file of dautil:
from pkg_resources import DistributionNotFound
from pkg_resources import get_distribution
from pkg_resources import resource_filename
import logging
import logging.config
import pprint
from appdirs import AppDirs
import os


def get_logger(name):
    # Load the logging configuration shipped with the package
    log_config = resource_filename(__name__, 'log.conf')
    logging.config.fileConfig(log_config)
    logger = logging.getLogger(name)

    return logger


def shorten(module_name):
    # Keep only the top-level package name, e.g. 'numpy' for 'numpy.version'
    dot_i = module_name.find('.')

    return module_name[:dot_i]


def log(modules, name):
    skiplist = ['pkg_resources', 'distutils']

    logger = get_logger(name)
    logger.debug('Inside the log function')

    for k in modules.keys():
        str_k = str(k)

        if '.version' in str_k:
            short = shorten(str_k)

            if short in skiplist:
                continue

            try:
                logger.info('%s=%s' % (short,
                            get_distribution(short).version))
            except (ImportError, DistributionNotFound):
                # Best effort: report the package and carry on
                logger.warning('Could not import %s', short)


class VersionsLogFileHandler(logging.FileHandler):
    def __init__(self, fName):
        dirs = AppDirs("PythonDataAnalysisCookbook", "Ivan Idris")
        path = dirs.user_log_dir
        print(path)

        if not os.path.exists(path):
            # The log directory may be nested, so create parents too
            os.makedirs(path)

        super(VersionsLogFileHandler, self).__init__(
            os.path.join(path, fName))
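The directory handling of the file handler can be tried out without dautil or AppDirs. The following is only a minimal sketch under stated assumptions: DirCreatingFileHandler is a hypothetical name, and a temporary directory stands in for the user log directory that AppDirs would return.

```python
import logging
import os
import tempfile


class DirCreatingFileHandler(logging.FileHandler):
    """File handler that creates its target directory before opening the file."""

    def __init__(self, log_dir, file_name):
        # makedirs also creates missing parent directories;
        # exist_ok makes a second run against the same directory harmless
        os.makedirs(log_dir, exist_ok=True)
        super().__init__(os.path.join(log_dir, file_name))


# Assumption: a nested temporary directory replaces the real user log directory
log_dir = os.path.join(tempfile.mkdtemp(), "PythonDataAnalysisCookbook", "log")
handler = DirCreatingFileHandler(log_dir, "versions.log")

logger = logging.getLogger("handler_sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False  # keep the sketch isolated from the root logger
logger.addHandler(handler)

logger.debug("Inside the log function")
handler.close()  # flush and release the file before reading it back

with open(os.path.join(log_dir, "versions.log")) as log_file:
    contents = log_file.read()
```

Because no formatter is attached in this sketch, the file contains only the bare message text.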
The program that uses the API is in the log_demo.py file in this book's code bundle:
import sys
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from dautil import log_api

log_api.log(sys.modules, sys.argv[0])
How it works...
We configured a handler (VersionsLogFileHandler) that writes to file and a handler (StreamHandler) that displays messages on the screen. StreamHandler is a class in the Python standard library. To configure the format of the log messages, we defined a formatter named simpleFormatter, which uses the Formatter class from the Python standard library.
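For comparison, a similar setup can be expressed in code instead of a configuration file. This is a rough sketch, not the dautil implementation: build_logger is a hypothetical helper, a plain FileHandler writing to a temporary directory stands in for VersionsLogFileHandler, the handlers are attached to a named logger rather than the root logger, and a StringIO object stands in for the screen so that the result can be inspected.

```python
import io
import logging
import os
import tempfile

FORMAT = "%(asctime)s - %(name)s - %(levelname)s - %(message)s"


def build_logger(name, log_path, stream):
    """Attach an INFO-level stream handler and an unfiltered file handler."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.DEBUG)  # pass DEBUG and above on to the handlers
    logger.propagate = False        # keep the sketch isolated from the root logger

    formatter = logging.Formatter(FORMAT, datefmt="%d-%b-%Y")

    console = logging.StreamHandler(stream)
    console.setLevel(logging.INFO)  # the "screen" only sees INFO and above
    console.setFormatter(formatter)

    to_file = logging.FileHandler(log_path)  # no level set: accepts everything
    to_file.setFormatter(formatter)

    logger.addHandler(console)
    logger.addHandler(to_file)
    return logger


# Assumption: StringIO replaces sys.stdout, and a temporary directory replaces the log directory
screen = io.StringIO()
log_path = os.path.join(tempfile.mkdtemp(), "versions.log")

logger = build_logger("config_sketch", log_path, screen)
logger.debug("a DEBUG message")
logger.info("an INFO message")

for handler in logger.handlers:
    handler.close()  # flush and release the log file

with open(log_path) as log_file:
    file_text = log_file.read()
screen_text = screen.getvalue()
```

After running the sketch, file_text holds both messages, while screen_text holds only the INFO message, which mirrors the level settings in log.conf.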
The API I made goes through the modules listed in the sys.modules variable and tries to get their versions. Some of the modules are not relevant for data analysis, so we skip them. The log() function of the API logs a DEBUG level message with the debug() method. The info() method logs the package version at INFO level.
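The best effort lookup can be sketched in isolation. Note the assumptions: this sketch swaps pkg_resources for importlib.metadata from the standard library (Python 3.8 or later), and collect_versions with its lookup parameter is a hypothetical helper, not part of dautil.

```python
from importlib import metadata

SKIPLIST = {"pkg_resources", "distutils"}


def collect_versions(modules, lookup=metadata.version):
    """Return (found, missing) for packages that expose a '.version' submodule.

    modules is any mapping keyed by module name, such as sys.modules.
    lookup maps a package name to a version string and is replaceable for testing.
    """
    found = {}
    missing = []

    for key in list(modules):  # copy the keys in case imports happen meanwhile
        name = str(key)
        if ".version" not in name:
            continue

        short = name.split(".", 1)[0]  # top-level package name
        if short in SKIPLIST or short in found or short in missing:
            continue

        try:
            found[short] = lookup(short)
        except metadata.PackageNotFoundError:
            # Best effort: remember the failure and carry on
            missing.append(short)

    return found, missing
```

Calling collect_versions(sys.modules) after the imports of a script returns a dictionary of package versions plus a list of packages whose version could not be determined; both can then be logged at the INFO and WARNING levels respectively.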
See also
- The logging tutorial at https://docs.python.org/3.5/howto/logging.html (retrieved July 2015)
- The logging cookbook at https://docs.python.org/3.5/howto/logging-cookbook.html#logging-cookbook (retrieved July 2015)