logmerge - sort and merge any kind of logfiles chronologically

05-30-2012, 03:27 AM
I'd like to share a simple and useful tool that was originally developed for internal use. During its development I found that it can be used more widely, so if it is useful for us it may be useful for others too. The tool itself can be found here: http://code.google.com/p/logmerge/.

We have a complex Java application consisting of several processes, each of which writes events to its own logfile. Each logfile is a plain text file. Usually each entry is a single line that starts with a timestamp, followed by a record of the event. Sometimes an application writes more than one line per entry. This happens when a process raises an exception: the stack trace then appears in the logfile, and it almost always spans several lines.

Sometimes we need to investigate several logfiles together to understand what happened. The best way is to consolidate these files into a single file and study that. But, as I said, one log entry is not always one line in a logfile. So even if we merge and sort several files line by line, we cannot be sure that the result is ordered correctly, because the continuation lines of multi-line entries carry no timestamp of their own.

Of course, before starting development I did some research and found that most existing tools are either not free or do not meet our needs. So I decided to write a tool that can merge several logfiles and correctly sort all of their log entries in chronological order. Initially I implemented the tool for Unix-like systems only. It was tested on both Linux and Windows with Cygwin and works properly there. It requires a standard shell and Perl, which are normally present by default on Unix systems and in Cygwin if they were selected during installation. Maybe I will try to implement the tool for plain Windows (without Cygwin, Java and so on).

When I finished my work I found that this tool can be extended to any kind of logfile. At a minimum you need to describe the timestamp format used in the file. The next important thing is the action that transforms that format into a sortable one. There are simple examples below. These examples are mostly aimed at web administrators and people in similar roles.
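To illustrate the two ingredients just mentioned (a description of the timestamp format and a conversion to a sortable form), here is a minimal Python sketch. It is not logmerge's own code, which is shell and Perl. The function name `sortable_key`, the format string, and the sample timestamps are assumptions made up for this example.

```python
from datetime import datetime

def sortable_key(stamp: str, fmt: str) -> str:
    """Convert a timestamp written in the given strptime format into an
    ISO-like string whose plain string order equals chronological order."""
    return datetime.strptime(stamp, fmt).strftime("%Y-%m-%d %H:%M:%S.%f")

# Hypothetical sample data: month-first timestamps do not sort
# chronologically when compared as plain text.
FMT = "%m/%d/%Y %H:%M:%S.%f"
stamps = ["01/05/2012 10:00:00.000", "12/31/2011 23:59:59.000"]

by_text = sorted(stamps)                                       # textual order
by_time = sorted(stamps, key=lambda s: sortable_key(s, FMT))   # chronological order
```

Here `by_text` puts the January 2012 entry first, although it is the later one, while `by_time` orders the two entries correctly.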

Merging Apache error logfiles

./logmerge --apache-error ./error.log* > all.log
Merges all Apache error.log* files in the current directory and stores the result in a single consolidated file. The input files can be plain text or gzipped. The tool assumes that each log entry has the form below.

[Fri Apr 23 22:14:21 2010] <the rest of the entry>
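A timestamp of this shape could be recognized and parsed roughly as in the Python sketch below. This is only an illustration of the format, not how logmerge does it internally. The names `ERROR_STAMP` and `error_log_time` are hypothetical.

```python
import re
from datetime import datetime

# Matches a leading stamp such as "[Fri Apr 23 22:14:21 2010]".
ERROR_STAMP = re.compile(r"^\[(\w{3} \w{3} [ \d]\d \d{2}:\d{2}:\d{2} \d{4})\]")

def error_log_time(line: str):
    """Return the entry's datetime, or None if the line has no leading stamp."""
    m = ERROR_STAMP.match(line)
    if m is None:
        return None
    # %a and %b expect English day and month names here,
    # which holds in the default C locale.
    return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
```

A line without a leading stamp returns `None`, so a caller can treat it as a continuation of the previous entry.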

Merging Apache access logfiles

find /export/home/ -name 'access.log' | xargs ./logmerge -f -n --apache-access > all.log
All access.log Apache access files that are found will be merged and sorted into the resulting file. Each line of the result starts with the name of the original file and the line number within it. The tool assumes that the timestamp has the format below:

<the begin of the entry> [15/Feb/2008:14:18:49 +0300] <the rest of the entry>
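Unlike the error log, this timestamp sits in the middle of the line and carries a UTC offset. The Python sketch below shows one way to extract it. Whether logmerge itself takes the offset into account is not stated here, so converting to UTC is an assumption of this sketch, as are the names `ACCESS_STAMP` and `access_log_time`.

```python
import re
from datetime import datetime, timezone

# Matches a stamp such as "[15/Feb/2008:14:18:49 +0300]" anywhere in the line.
ACCESS_STAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")

def access_log_time(line: str):
    """Return the entry's time as an aware UTC datetime, or None if absent."""
    m = ACCESS_STAMP.search(line)
    if m is None:
        return None
    # %b expects English month names (default C locale); %z parses "+0300".
    local = datetime.strptime(m.group(1), "%d/%b/%Y:%H:%M:%S %z")
    # Normalizing to UTC makes entries from servers in different
    # time zones comparable (an assumption, see above).
    return local.astimezone(timezone.utc)
```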

Merging all files and storing them in an archive

./logmerge -f -n log/*.log | gzip -c > all.gz
Merges all files in the log/ directory and pipes the result to gzip. The filename and the line number are added at the beginning of each line of the result. By default the utility assumes that each log entry begins with a timestamp and can occupy more than one line (e.g. a Java stack trace, as below):

05/21/2012 21:54:41.070 <the rest of the entry>
at boo.hoo.StackTrace.bar(StackTrace.java:223)
at boo.hoo.StackTrace.foo(StackTrace.java:218)
at boo.hoo.StackTrace.main(StackTrace.java:54)
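The idea of keeping multi-line entries together while merging can be sketched in Python as follows. This is a simplified illustration and not logmerge's implementation: it assumes each input is already in chronological order, it drops any lines that precede the first timestamp, and the `name:line:` prefix is a made-up stand-in for whatever -f -n actually prints. The names `entries` and `merge` and the sample messages are hypothetical.

```python
import heapq
import re
from datetime import datetime

# Default-style stamp, e.g. "05/21/2012 21:54:41.070" (month first).
STAMP = re.compile(r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}\.\d{3})")
FMT = "%m/%d/%Y %H:%M:%S.%f"

def entries(name, lines):
    """Yield (time, name, line_number, entry_lines) for each log entry.
    A line without a leading timestamp is attached to the previous entry."""
    current = None
    for number, line in enumerate(lines, start=1):
        m = STAMP.match(line)
        if m:
            if current:
                yield current
            current = (datetime.strptime(m.group(1), FMT), name, number, [line])
        elif current:
            current[3].append(line)
        # Lines before the first timestamp are dropped in this sketch.
    if current:
        yield current

def merge(files):
    """files maps a name to its lines; each input is assumed already sorted."""
    streams = [entries(name, lines) for name, lines in files.items()]
    # Order by time, then by file name and line number to break ties.
    for when, name, number, body in heapq.merge(*streams, key=lambda e: e[:3]):
        for offset, line in enumerate(body):
            yield f"{name}:{number + offset}: {line}"

# Made-up sample input for illustration only.
a = ["05/21/2012 21:54:41.070 first",
     "at boo.hoo.StackTrace.bar(StackTrace.java:223)",
     "05/21/2012 21:54:43.000 third"]
b = ["05/21/2012 21:54:42.000 second"]
result = list(merge({"a.log": a, "b.log": b}))
```

In `result` the stack trace line stays directly under the "first" entry, and the entry from b.log lands between the two entries from a.log.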

So, if you are interested in using the tool, you can find it at this link (this is not an advert, just a reminder of what this article is about): http://code.google.com/p/logmerge/.

Comments and suggestions are welcome.

06-22-2012, 08:10 PM
I have updated the script. Anyone interested can download the latest version from the link above. The important changes are:
-- the script interpreter was changed to Bash instead of the old shell still found on some systems;
-- reading of .gz/.bz2 archives was added;
-- input file arguments are now processed correctly.
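
For readers curious how transparent reading of compressed logs can work in general, here is a small Python sketch. It is not the script's own code. Choosing the opener by file extension is an assumption of this example, and `open_log` is a hypothetical name.

```python
import bz2
import gzip

def open_log(path: str):
    """Open a plain, gzip, or bzip2 logfile for reading as text,
    picking the opener from the file extension."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt", encoding="utf-8", errors="replace")
    if path.endswith(".bz2"):
        return bz2.open(path, "rt", encoding="utf-8", errors="replace")
    return open(path, "r", encoding="utf-8", errors="replace")
```

Each call returns a file object that yields decoded text lines, so the rest of a merge routine does not need to know whether the input was compressed.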