Apache Log Sorter

Today at work I needed to sort about 500MB of Apache2 log files to feed into awstats. They needed sorting because they came from many archived logs, and I couldn't be bothered putting them in order by hand. I didn't realise what a mission it would be... awstats includes a Perl script that's supposed to merge and sort entries, but it failed to do so. I also tried 'mergesort' from apt, but it didn't get it right either. grr!

So I wrote my own in PHP. It reads a file containing all the logs concatenated together and outputs the sorted log to stdout. You'll need to increase PHP's memory limit if you plan on using it with large files. I gave it 1GB (the amount of physical RAM in the machine) and it happily crunched 500MB of logs reasonably quickly. I suspect it only needs about 20MB, basically enough to store the date and file position for each line (it doesn't load the whole file into memory, just that index).

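<code><?php
// Concatenated Apache access logs to sort (e.g. cat access.log.* > inputlog.log).
$filename = "inputlog.log";

$infile = fopen($filename, "r");
if ($infile === false) {
    exit("Cannot open $filename\n");
}

$times = array();    // timestamp of each log line
$offsets = array();  // byte offset at which that line starts

while (true) {
    $pos = ftell($infile);
    $line = fgets($infile);
    if ($line === false) {
        break;  // end of file
    }

    // The timestamp sits between the first '[' and the following ']',
    // e.g. [10/Oct/2000:13:55:36 -0700]
    $parts = explode('[', $line, 2);
    if (count($parts) < 2) {
        continue;  // no timestamp on this line (e.g. a blank line): skip it
    }
    list($stamp) = explode(']', $parts[1], 2);

    // Split the date from the time, then turn 10/Oct/2000 into 10 Oct 2000
    // so that strtotime() can parse it.
    list($date, $time) = explode(':', $stamp, 2);
    $date = str_replace("/", " ", $date);

    $times[] = strtotime($date . ' ' . $time);
    $offsets[] = $pos;
}

// Sort by timestamp; lines with equal timestamps stay in file order because
// ties are broken by byte offset. (Keying an array by timestamp instead would
// silently drop every line that shares a second with another one.)
array_multisort($times, $offsets);

foreach ($offsets as $pos) {
    fseek($infile, $pos, SEEK_SET);
    print fgets($infile);
}
fclose($infile);

?>
</code>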
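
To make the date handling easier to follow, here is a small sketch of just that step, run on a single line. The helper name log_line_to_time and the sample log line are made up for illustration; it assumes the usual Apache timestamp layout of [day/Mon/year:hh:mm:ss zone] and relies on strtotime() accepting the zone offset at the end.

<code><?php
// Hypothetical helper: returns the Unix timestamp of one access-log line,
// or false if the line has no [date] field.
function log_line_to_time($line) {
    $parts = explode('[', $line, 2);
    if (count($parts) < 2) {
        return false;
    }
    list($stamp) = explode(']', $parts[1], 2);     // 10/Oct/2000:13:55:36 -0700
    list($date, $time) = explode(':', $stamp, 2);  // 10/Oct/2000 and 13:55:36 -0700
    return strtotime(str_replace('/', ' ', $date) . ' ' . $time);
}

// Made-up sample line in the common Apache access-log layout.
$sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326';
print gmdate('Y-m-d H:i:s', log_line_to_time($sample)) . " UTC\n";
// prints: 2000-10-10 20:55:36 UTC
?>
</code>

Because the zone offset is passed through to strtotime(), lines logged by servers in different time zones should still come out in true chronological order.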