{"id":7594,"date":"2005-08-22T17:22:00","date_gmt":"2005-08-22T21:22:00","guid":{"rendered":"https:\/\/nuxx.net\/blog\/2005\/08\/22\/apache-logfile-analysis\/"},"modified":"2026-07-01T11:33:36","modified_gmt":"2026-07-01T15:33:36","slug":"apache-logfile-analysis","status":"publish","type":"post","link":"https:\/\/nuxx.net\/blog\/2005\/08\/22\/apache-logfile-analysis\/","title":{"rendered":"Apache Logfile Analysis&#8230;"},"content":{"rendered":"<p>(First, this is a test post from Semagic behind the work firewall&#8230;)<\/p>\n<p>Okay, so I think I&#8217;ve got a good way of processing the logs from websites that I host. I&#8217;m going to document how I do it here for myself and anyone else who might happen to care.<\/p>\n<p>First off, I&#8217;m using <a href=\"http:\/\/httpd.apache.org\/\">Apache 2.0<\/a>, <a href=\"http:\/\/www.cronolog.org\/\">cronolog<\/a> to handle log files (essentially rotation&#8230; but not), and <a href=\"http:\/\/www.mrunix.net\/webalizer\/\">Webalizer<\/a> to parse the files and provide reports.<\/p>\n<p>First, each Apache vhost is logged via a pipe to <font face=\"courier\">cronolog<\/font> so that it makes subdirectories for Year and Month, naming the file with Year-Month-Day-access.log. For example, today&#8217;s file as logged below would be <font face=\"courier\">\/var\/data\/wwwlogs\/default\/2005\/08\/2005-08-22-access.log<\/font>. 
This is done by <font face=\"courier\">CustomLog<\/font> and <font face=\"courier\">ErrorLog<\/font> lines for each vhost which are similar to this:<\/p>\n<blockquote><p><font face=\"courier\">CustomLog &quot;|\/usr\/local\/sbin\/cronolog \/var\/data\/wwwlogs\/default\/%Y\/%m\/%Y-%m-%d-access.log&quot; combined<br \/>\nErrorLog  &quot;|\/usr\/local\/sbin\/cronolog \/var\/data\/wwwlogs\/default\/%Y\/%m\/%Y-%m-%d-errors.log&quot;<\/font><\/p><\/blockquote>\n<p>That alone will make it easier to look through logfiles, should the need arise.<\/p>\n<p>Next, a script is set to run once per night which first runs Webazolver on all the log files in order to build a cache of resolved hostnames and then runs Webalizer itself, using that cache. File processing is done with the <font face=\"courier\">-p<\/font> argument in order to provide incremental parsing. As each log file <em>should<\/em> only be parsed once, the end result should be that each day&#8217;s set of data is added to the collective reports.<\/p>\n<p>Here is the script which I call <font face=\"courier\">run_webalizer.sh<\/font> and will run via cron each morning (around 12:30am local time, I&#8217;d imagine):<\/p>\n<blockquote><p><font face=\"courier\">#!\/bin\/sh<\/p>\n<p># First, run webazolver to resolve all IPs<br \/>\nfor i in \/var\/data\/wwwlogs\/*<br \/>\n  do CURLOG=`date -v-1d +&quot;$i\/%Y\/%m\/%Y-%m-%d-access.log&quot;`<br \/>\n  \/usr\/local\/bin\/webazolver -Q -p -N 10 -D \/var\/db\/webalizer_cache.db $CURLOG<br \/>\ndone<\/p>\n<p># Run webalizer with all config files&#8230;<br \/>\nfor i in \/var\/data\/wwwlogs\/*<br \/>\n  do VHOST=`echo $i | cut -f5 -d\\\/`<br \/>\n  CURLOG=`date -v-1d +&quot;$i\/%Y\/%m\/%Y-%m-%d-access.log&quot;`<br \/>\n  \/usr\/local\/bin\/webalizer -Q -p -n $VHOST -o \/var\/data\/www\/admin\/webalizer\/$VHOST -D \/var\/db\/webalizer_cache.db -N 10 -r $VHOST\\\/ -s \\*$VHOST $CURLOG<br \/>\ndone<\/font><\/p><\/blockquote>\n<p>Well, hopefully some people will find 
this useful. It appears that it&#8217;ll work fine for now. I guess I&#8217;ll know for sure after a few days&#8230;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>(First, this is a test post from Semagic behind the work firewall&#8230;) Okay, so I think I&#8217;ve got a good way of processing the logs from websites that I host. I&#8217;m going to document how I do it here for myself and anyone else who might happen to care. First off, I&#8217;m using Apache 2.0,\u2026<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[13,34],"tags":[],"class_list":["post-7594","post","type-post","status-publish","format-standard","hentry","category-computers","category-moved-from-livejournal"],"_links":{"self":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/7594","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/comments?post=7594"}],"version-history":[{"count":1,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/7594\/revisions"}],"predecessor-version":[{"id":14034,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/posts\/7594\/revisions\/14034"}],"wp:attachment":[{"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/media?parent=7594"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/categories?post=7594"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/nuxx.net\/blog\/wp-json\/wp\/v2\/tags?post=7594"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}