Python Line-by-line Profiler (line_profiler and kernprof)

The following is a quick and dirty guide to getting started with line_profiler, a Python line-by-line profiler, on Fedora.

  1. Build and install the python-line_profiler package
  2. Create a file called test.py with the code below

    import random, time
    
    def sleep():
        seconds = random.randint(0, 5)
        print('Sleeping %s seconds' % seconds)
        time.sleep(seconds)
    
    @profile
    def test():
        sleep()
        sleep()
        sleep()
    
    test()
    
  3. Profile test.py

    [silas@silas ~]$ kernprof.py -l test.py
    Sleeping 4 seconds
    Sleeping 5 seconds
    Sleeping 2 seconds
    Wrote profile results to test.py.lprof
    
  4. View the results

    [silas@silas ~]$ python -m line_profiler test.py.lprof
    Timer unit: 1e-06 s
    
    File: test.py
    Function: test at line 8
    Total time: 10.9994 s
    
    Line #      Hits         Time  Per Hit   % Time  Line Contents
    ==============================================================
         8                                           @profile
         9                                           def test():
        10         1      3999416 3999416.0     36.4      sleep()
        11         1      4999982 4999982.0     45.5      sleep()
        12         1      1999990 1999990.0     18.2      sleep()
    

NOTE: I have a package review up for line_profiler and it should be available via yum eventually.
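One catch with step 2 is that `@profile` only exists when the script runs under kernprof, which adds it to Python's builtins. Running test.py with plain `python` therefore fails with a NameError. A common workaround, which is not part of line_profiler itself, is a no-op fallback at the top of the script. In the sketch below, `add` is just a placeholder function:

```python
# Use kernprof's `profile` decorator when it exists (kernprof -l injects it
# into builtins); otherwise define a no-op stand-in so the script also runs
# under plain `python`.
try:
    profile
except NameError:
    def profile(func):
        return func


@profile
def add(a, b):
    return a + b


print(add(2, 3))
```

Under kernprof, the real decorator is used and `add` shows up in the .lprof results. Under plain `python`, the function simply runs unprofiled.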

Pipe Apache (or any) Logs to Scribe

I created a simple Python script called scribe_log to tail a log file and pipe it to Scribe.
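The core loop of a script like this can be sketched as follows. This is not the actual scribe_log source. It follows a file the way `tail -f` does and applies the --prefix/--postfix options. `follow`, `decorate`, `pipe` and the `send` callback are hypothetical names, and `send` stands in for the real Thrift/Scribe client call:

```python
import os
import time


def follow(path, from_end=True, poll_interval=1.0, stop=None):
    """Yield complete lines as they are appended to `path`, like `tail -f`.

    `stop` is an optional zero-argument callable. When no new data is
    available and stop() returns True, the generator finishes.
    """
    with open(path) as f:
        if from_end:
            f.seek(0, os.SEEK_END)  # skip existing content, like tail -f
        buf = ""
        while True:
            chunk = f.readline()
            if chunk:
                buf += chunk
                # Only emit whole lines; a writer may be mid-line.
                if buf.endswith("\n"):
                    yield buf
                    buf = ""
            elif stop is not None and stop():
                return
            else:
                time.sleep(poll_interval)


def decorate(line, prefix="", postfix=""):
    """Apply the --prefix/--postfix options to a single log line."""
    return prefix + line.rstrip("\n") + postfix + "\n"


def pipe(path, send, prefix="", postfix="", **follow_kwargs):
    """Tail `path` and hand each decorated line to `send`.

    `send` is a hypothetical callback. The real script would log each
    message to Scribe under the --category it was given.
    """
    for line in follow(path, **follow_kwargs):
        send(decorate(line, prefix, postfix))
```

Log rotation isn't handled here. A real tail needs to reopen the file when it is replaced, for example by comparing inode numbers from `os.stat`.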

I use supervisor to start the pipe and keep it running.

Relevant supervisord.conf configuration:

[program:scribe.apache.access]
command=/usr/local/sbin/scribe_log --category apache.access --file /var/log/httpd/access_log

Options

usage: scribe_log [options]

options:
  -h, --help           show this help message and exit
  --file=FILE          file to tail into Scribe
  --category=CATEGORY  Scribe category
  --host=HOST          destination Scribe host server
  --port=PORT          destination Scribe port
  --prefix=PREFIX      add to the beginning of each log line
  --postfix=POSTFIX    add to the end of each log line
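The help output above matches the format of Python's optparse module, so a parser along these lines would produce similar help text. The original script's defaults aren't shown, so the --host/--port defaults below are assumptions: localhost, and 1463, the port used in Scribe's example configuration. Requiring --file and --category is also an assumption:

```python
from optparse import OptionParser


def build_parser():
    # Options mirror the help text above; defaults are assumptions.
    parser = OptionParser(usage="%prog [options]", prog="scribe_log")
    parser.add_option("--file", help="file to tail into Scribe")
    parser.add_option("--category", help="Scribe category")
    parser.add_option("--host", default="localhost",
                      help="destination Scribe host server")
    parser.add_option("--port", type="int", default=1463,
                      help="destination Scribe port")
    parser.add_option("--prefix", default="",
                      help="add to the beginning of each log line")
    parser.add_option("--postfix", default="",
                      help="add to the end of each log line")
    return parser


def parse(argv=None):
    """Parse argv (defaults to sys.argv[1:]) and check required options."""
    parser = build_parser()
    options, _args = parser.parse_args(argv)
    if not options.file or not options.category:
        parser.error("--file and --category are required")  # exits with status 2
    return options
```

With optparse, `--file` automatically gets the metavar FILE, which is where `--file=FILE` in the help output comes from.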

Scribe – Scalable Real Time Log Aggregation for CentOS 5 / RHEL 5

"Scribe is a server for aggregating log data streamed in real time from a large number of servers. It is designed to be scalable, extensible without client-side modification, and robust to failure of the network or any specific machine. Scribe was developed at Facebook and released as open source."

I've packaged Thrift, fb303 and Scribe for CentOS 5 / RHEL 5.

SRPM

Scribe depends on fb303, which in turn depends on Thrift, so you'll need to set up a local repository and build them in that order. To build these packages for Fedora 9+, you'll need to tweak the Python subpackages to include the egg files.
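The build order follows directly from that dependency chain. As a small illustration, Python's standard-library graphlib (3.9+) can compute it from a dependency map. The names here are just labels, not exact RPM names:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Build-time dependencies as described above: each package lists what it needs.
BUILD_DEPS = {
    "thrift": set(),
    "fb303": {"thrift"},
    "scribe": {"fb303"},
}

# static_order() yields every package after all of its dependencies.
build_order = list(TopologicalSorter(BUILD_DEPS).static_order())

for step, package in enumerate(build_order, 1):
    # After each build, add the result to the local repository so the
    # next package can pull it in as a build dependency.
    print("%d. build %s, then add it to the local repo" % (step, package))
```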

Both fb303 and Scribe need some tweaks upstream before they're suitable for a package review, but I'd like to get them into Fedora soon (I've already submitted Thrift).

If you'd like to hack around with my latest Fedora specs you can grab them here.

Managing Large Networks with Puppet

Puppet is an open source configuration management tool written in Ruby. It allows a systems administrator to define how a system should be configured using Puppet's declarative language. Each Puppet client pulls its catalog at a regular interval and figures out how to make the catalog definitions true for the local operating system.
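Conceptually, each run compares the catalog's desired state with what's on the machine and changes only what differs, which is why running it repeatedly is safe. Below is a toy model of that pull-and-converge loop. It is not Puppet's actual implementation. Resource names are plain strings, and `fetch_catalog` is a hypothetical callable standing in for the request to the Puppetmaster:

```python
import time


def converge(desired, current):
    """Make `current` match `desired` and return the changes made.

    Both arguments map a resource name (e.g. "service:httpd") to its state.
    Resources that aren't in `desired` are left untouched.
    """
    changes = []
    for name, state in sorted(desired.items()):
        if current.get(name) != state:
            changes.append((name, current.get(name), state))
            current[name] = state  # "apply" the change
    return changes


def run_agent(fetch_catalog, current, interval=1800, runs=1):
    """Pull the catalog and converge `runs` times, `interval` seconds apart.

    1800 seconds matches Puppet's default 30-minute run interval.
    """
    history = []
    for i in range(runs):
        history.append(converge(fetch_catalog(), current))
        if i < runs - 1:
            time.sleep(interval)
    return history
```

The second run against an unchanged catalog makes no changes. That idempotence is what makes it safe for every client to pull on a timer.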

The Puppet Introduction uses the following diagram to show how Puppet works.

[Diagram: Puppet]

Currently Puppet is most useful when you have lots of nodes with similar setups. Unfortunately, Puppet's declarative language is a bit weak when it comes to inheritance, and users familiar with true object-oriented systems will soon become frustrated (I certainly did). I'm assuming this will be addressed in the future, but for now I'm going to show you how I set up a 100+ node system while adhering to DRY (Don't Repeat Yourself) principles.

A quick note: I'm assuming you already know how to use Puppet.

First, let's start with the Puppetmaster configuration layout:

puppet/
 - fileserver.conf
 - manifests/
   - classes/
     - initialize.pp
   - nodes/
     - net/
       - example/
         - web01.pp   # web01.example.net
   - site.pp
   - templates.pp
 - modules/           # application specific modules
   - httpd/
     - files/         # static assets (ex: default HTML file)
     - manifests/     # application configuration logic
     - templates/     # dynamic configuration files (ex: httpd.conf)

The layout was adapted from recommendations in Pulling Strings with Puppet by James Turnbull.
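The nodes/ tree stores each host's manifest under its domain components in reverse order, as the `web01.example.net` comment shows. Here is a small helper that maps a fully qualified hostname to its manifest path under that convention. The helper is hypothetical and only illustrates the layout:

```python
from pathlib import PurePosixPath


def node_manifest_path(fqdn, root="puppet/manifests/nodes"):
    """Map e.g. web01.example.net to <root>/net/example/web01.pp."""
    host, *domain = fqdn.lower().split(".")
    # Domain components go in reverse order (net, then example), then the host file.
    return PurePosixPath(root, *reversed(domain), host + ".pp")


print(node_manifest_path("web01.example.net"))
# → puppet/manifests/nodes/net/example/web01.pp
```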

When I was initially designing my Puppet setup, the common practice was to define and initialize the various Puppet types and classes throughout the inheritance tree. So if you had a generic resolv.conf configuration, you would include it at the top of the inheritance tree. This worked great until I needed to change one attribute of a class further down the tree. I initially tackled this problem by hacking in if/case statements, but it eventually became unmanageable.

After a couple of rewrites, I came up with the idea of wrapping the internals of each class in a conditional statement and initializing all classes at the end of each node.

The key components of my setup were:

  • import all modules in the site.pp manifest
  • wrap the code of each class in a conditional statement
  • define and extend attributes throughout the inheritance tree (including the class conditional)
  • include all classes at the end of each node

This lets you arbitrarily redefine or extend class attributes (including the on/off state) anywhere in the inheritance tree.
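To make the pattern concrete, here is a rough Python analogy. It is not Puppet code, and all node names, attributes and addresses are hypothetical. Node classes only set attributes. Each "class" function wraps its whole body in an on/off conditional. Every class is then evaluated once at the end against the most-derived node, so any level of the tree can override or extend a setting, including switching a class off:

```python
# Hypothetical Python analogy of the Puppet pattern, not Puppet code.

class BaseNode:
    # Defaults defined at the top of the inheritance tree.
    resolv_enabled = True
    resolv_nameservers = ["10.0.0.1"]
    httpd_enabled = False
    httpd_port = 80


class WebNode(BaseNode):
    # Switch a class on further down the tree.
    httpd_enabled = True


class Web01(WebNode):
    # Redefine one attribute and extend another for a single host.
    httpd_port = 8080
    resolv_nameservers = WebNode.resolv_nameservers + ["10.0.0.2"]


class Standalone01(BaseNode):
    # Switch a class off for one host.
    resolv_enabled = False


def resolv_class(node):
    # The whole class body is wrapped in a conditional on the node's flag.
    if not node.resolv_enabled:
        return []
    return [("file", "/etc/resolv.conf", {"nameservers": node.resolv_nameservers})]


def httpd_class(node):
    if not node.httpd_enabled:
        return []
    return [("package", "httpd", {}), ("service", "httpd", {"port": node.httpd_port})]


ALL_CLASSES = [resolv_class, httpd_class]


def compile_catalog(node):
    # "Include all classes at the end of each node": every class is evaluated
    # once, after the node's attributes have been resolved through the tree.
    catalog = []
    for cls in ALL_CLASSES:
        catalog.extend(cls(node))
    return catalog
```

The important part is the late evaluation. Nothing is declared until `compile_catalog` runs, so an override anywhere in the tree takes effect without if/case hacks inside the class itself.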

I've created a simple example to show how this setup works. The main files are listed below (you'll want to go through each in order):

Also note that I designed this setup about 8 months ago, and I haven't kept up with recent developments as well as I should have. If there is a better way to do this, please let me know.