Saturday, July 11, 2020

Perl Faster than Grep

So, I've been crawling through a debug log file that is 195 million lines long. I've been using a lot of "grep | wc" to count occurrences of various log messages. Here are some timings on my MacBook Pro:

$ time cat dbglog.txt >/dev/null
real 0m35.423s

$ time wc dbglog.txt
195177935 1177117603 28533284864 dbglog.txt
real 1m44.560s

$ time egrep '999999' dbglog.txt
real 7m39.737s

(For this timing, I chose a pattern that would *NOT* be found.)

On the MacBook, the man page for fgrep claims that it is faster than grep. Let's see:

$ time fgrep '999999' dbglog.txt
real 7m11.365s

Well, I guess it's a little faster, but nothing to brag about.

Then I wanted to create a histogram of some findings, so I wrote a Perl script to scan the file and build it. Since the script performs regular expression matching on every line and Perl is an interpreted language, I assumed it would be a little slower than grep.

$ time ./ dbglog.txt >count.out
real 3m9.427s

WOW! Less than half the time!
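
The histogram script itself isn't shown here. As a rough sketch of the idea, assuming the interesting part of each line can be captured by a regular expression, something like the following would do. The pattern (ERROR|WARN|INFO) and the sub name histogram are hypothetical placeholders, not the ones used for the timing above.

```perl
#!/usr/bin/env perl
# Sketch only: count matching lines by a captured field and print a histogram.
use strict;
use warnings;

# Read lines from the filehandle $fh; for every line matching $re,
# bump the counter for the text captured by the first group.
sub histogram {
    my ($re, $fh) = @_;
    my %count;
    while (my $line = <$fh>) {
        $count{$1}++ if $line =~ $re;
    }
    return \%count;
}

if (@ARGV) {
    # Hypothetical pattern: message severity. Replace with whatever
    # field of the log lines should be tallied.
    my $re = qr/\b(ERROR|WARN|INFO)\b/;

    # \*ARGV is the filehandle behind the diamond operator, so every
    # file named on the command line is read in turn.
    my $hist = histogram($re, \*ARGV);
    printf "%10d %s\n", $hist->{$_}, $_ for sort keys %$hist;
} else {
    warn "usage: $0 file [file ...]\n";
}
```

The whole job is one regex match and one hash increment per line, which is the kind of loop Perl handles well.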

So I created a simple grep replacement. It doesn't do any histogramming, so it should be even faster.

$ time '999999' dbglog.txt
real 2m8.341s

Amazing. Perl grep runs in less than a third of the time of grep.
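
For readers who want to try this, here is a minimal sketch of what such a grep replacement could look like. It is an illustration under assumptions, not necessarily the script that was timed: the sub name grep_lines is made up, the pattern is treated as a Perl regular expression, and no grep options are supported.

```perl
#!/usr/bin/env perl
# Sketch only: print every input line that matches a Perl regex.
use strict;
use warnings;

# Copy each line of $in that matches $re to $out.
# Returns the number of lines that matched.
sub grep_lines {
    my ($re, $in, $out) = @_;
    my $matched = 0;
    while (my $line = <$in>) {
        if ($line =~ $re) {
            print {$out} $line;
            $matched++;
        }
    }
    return $matched;
}

if (@ARGV) {
    my $pattern = shift @ARGV;
    my $re = qr/$pattern/;   # compile the pattern once, outside the loop

    # \*ARGV reads the remaining command-line files in turn,
    # or standard input when no files are given.
    my $matched = grep_lines($re, \*ARGV, \*STDOUT);

    # Follow grep's convention: exit 0 if anything matched, 1 otherwise.
    exit($matched ? 0 : 1);
} else {
    warn "usage: $0 pattern [file ...]\n";
}
```

Saved under a hypothetical name such as perlgrep.pl and made executable, it would be invoked like grep: the pattern first, then the files, or with input piped in.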

For small files, I bet Perl grep is slower starting up. Let's see.

$ time echo "hi" | grep 9999
real 0m0.051s

$ time echo "hi" | 9999
real 0m0.113s

Yep. Grep saves you about 60 milliseconds. So if you had thousands of small files to grep, it might be faster to use grep.


Sahir said...

If you like the idea of grepping with Perl, check out Ack, which is a full-featured grep replacement written in Perl.

If you are interested in a faster grep though, ripgrep (rg) is hard to beat! Thanks to various optimizations, including multithreading, a regex engine that can use SIMD instructions, and Unicode support built in from the ground up, rg ends up being faster than grep and ack in most cases.

Steve Ford said...

Thanks! I've already benefited from looking at Ack; I didn't know about the File::Next utility, which will easily give me recursive input files. It doesn't integrate with the diamond operator, which is a pity, but it will probably be worth it.

BTW, are you the Sahir I've worked with?

Sahir said...

> BTW, are you the Sahir I've worked with?

The very same!