nuxx.net
Making, baking, and (un-)breaking things in Southeast Michigan.

Apache Logfile Analysis…

(First, this is a test post from Semagic behind the work firewall…)

Okay, so I think I’ve got a good way of processing the logs from websites that I host. I’m going to document how I do it here for myself and anyone else who might happen to care.

The setup is Apache 2.0, cronolog to handle the log files (essentially rotation, except that it writes straight to date-named files instead of renaming them afterwards), and Webalizer to parse the files and produce reports.

Each Apache vhost is logged through a pipe to cronolog, which creates subdirectories for year and month and names each file Year-Month-Day-access.log. For example, today’s file as logged below would be /var/data/wwwlogs/default/2005/08/2005-08-22-access.log. This is done with CustomLog and ErrorLog lines for each vhost, similar to these:

CustomLog "|/usr/local/sbin/cronolog /var/data/wwwlogs/default/%Y/%m/%Y-%m-%d-access.log" combined
ErrorLog "|/usr/local/sbin/cronolog /var/data/wwwlogs/default/%Y/%m/%Y-%m-%d-errors.log"

That alone will make it easier to look through logfiles, should the need arise.
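As a rough illustration of what that layout makes easy, here is a small sketch that tallies HTTP status codes across one month of a vhost’s logs. The function name status_summary is made up for this example, and it assumes the logs are in the combined format, where the status code is the ninth whitespace-separated field.

```shell
#!/bin/sh
# Hypothetical helper (not part of the setup above): count HTTP status
# codes across one month of a vhost's access logs.
# Arguments: vhost log directory, year, month.
status_summary() {
    logroot=$1 year=$2 month=$3
    # In the combined log format the status code is field 9.
    cat "$logroot/$year/$month/"*-access.log 2>/dev/null |
        awk '{ count[$9]++ } END { for (s in count) print s, count[s] }' |
        sort
}
```

Calling it as status_summary /var/data/wwwlogs/default 2005 08 would print one line per status code seen that month, followed by its count.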

Next, a script runs once per night. It first runs webazolver on each vhost’s log file to build a cache of resolved hostnames, then runs Webalizer itself using that cache. Both are called with the -p argument so that processing is incremental. Since each log file is parsed only once, each day’s data is simply added to the cumulative reports.
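One thing a nightly pass like this may want to handle is a vhost that has no log file for the day in question. The sketch below lists only the vhosts that do have one. The function name vhosts_with_log is hypothetical, and it assumes the directory layout described above.

```shell
#!/bin/sh
# Hypothetical helper: print the name of every vhost directory under the
# log root ($1) that has an access log for the given day ($2, YYYY-MM-DD).
vhosts_with_log() {
    root=$1 day=$2
    year=${day%%-*}      # e.g. 2005
    rest=${day#*-}       # e.g. 08-22
    month=${rest%%-*}    # e.g. 08
    for dir in "$root"/*/; do
        # $dir already ends in a slash because of the */ glob.
        [ -f "$dir$year/$month/$day-access.log" ] && basename "$dir"
    done
    return 0
}
```

For example, vhosts_with_log /var/data/wwwlogs 2005-08-22 would print default if that vhost logged anything on that day, and skip any vhost directory without a file for it.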

Here is the script, which I call run_webalizer.sh and will run from cron each night (around 12:30am local time, I’d imagine):

#!/bin/sh

# First, run webazolver to resolve all IPs
for i in /var/data/wwwlogs/*
do
    # Path to yesterday's access log for this vhost (date -v-1d is BSD date syntax)
    CURLOG=`date -v-1d +"$i/%Y/%m/%Y-%m-%d-access.log"`
    /usr/local/bin/webazolver -Q -p -N 10 -D /var/db/webalizer_cache.db "$CURLOG"
done

# Run webalizer for each vhost, using the cache built above
for i in /var/data/wwwlogs/*
do
    # The vhost name is the fifth /-separated field of the path, e.g. "default"
    VHOST=`echo "$i" | cut -f5 -d/`
    CURLOG=`date -v-1d +"$i/%Y/%m/%Y-%m-%d-access.log"`
    /usr/local/bin/webalizer -Q -p -n "$VHOST" -o "/var/data/www/admin/webalizer/$VHOST" -D /var/db/webalizer_cache.db -N 10 -r "$VHOST/" -s "*$VHOST" "$CURLOG"
done
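The date -v-1d call in the script is the BSD way of asking for “one day ago”. GNU date spells it -d yesterday instead. If the script ever had to run on both, something like the following sketch could pick whichever form works. The function name yesterday_log is made up for this example.

```shell
#!/bin/sh
# Hypothetical helper: print the path of yesterday's access log for the
# vhost log directory given as $1, on either BSD or GNU date.
yesterday_log() {
    vhostdir=$1
    fmt="+$vhostdir/%Y/%m/%Y-%m-%d-access.log"
    # Try the BSD form first; if date rejects -v, fall back to the GNU form.
    date -v-1d "$fmt" 2>/dev/null || date -d yesterday "$fmt"
}
```

In the loops above, CURLOG could then be set with CURLOG=`yesterday_log "$i"`. Note that a directory name containing a % character would be interpreted by date as part of the format.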

Well, hopefully some people will find this useful. It appears that it’ll work fine for now. I guess I’ll know for sure after a few days…
