## Benchmarking p5

So, matthias’s results in masak’s p5 challenge had me utterly baffled. Anyone with much Rakudo programming experience knows that something like

gather for @strA[\$i..*-1] Z @strB[\$j..*-1] -> \$a, \$b

in your inner loop is DOOM speed-wise. How could that possibly be a world-beater?

Well, the answer is that it all depends on the strings you are searching for common substrings. I’d asked masak about what test he was using a few days ago (trying to figure out the weird performance of my code) and gotten his routine:

my \$n = 160;
my \$a = <0 1 2 3 4>.roll(\$n).join;
my \$b = <5 6 7 8 9>.roll(\$n).join;

# "N" case is testing this \$a versus this \$b

my \$ra = (0..\$n - 10).pick;
my \$rb = (0..\$n - 10).pick;
\$a = \$a.substr(0, \$ra) ~ 'flirtation' ~ \$a.substr(\$ra + 10);
\$b = \$b.substr(0, \$rb) ~ 'flirtation' ~ \$b.substr(\$rb + 10);

# "Y" case is testing this \$a versus this \$b

If you look at that, what should jump out at you is, in the “N” case, the two strings have no characters at all in common with each other! matthias’s code is nearly perfectly optimized for this case, as the slow inner loop will never execute. The “Y” case will be nearly as good for it.

So, what happens if we use normal text to seed this? Luckily I just got some benchmarking code going. So if the strings are

There were nobles, who made war against each other; there was the king,
who made war against the cardinal; there was Spain, which made war against the king.

what are the timings like? Sorted from best to worse:

p5-fox.pl: 9.0981144
p5-moritz.pl: 10.4469054
p5-colomon.pl: 12.5152087
p5-matthias.pl: 25.9988418
p5-util.pl: 32.1561981

More strings and timings coming soon….

PS I should add I love the style of matthias’s code. It combines a lot of elegance with a clever Perl-ish and p6-ish approach to the problem. If I hadn’t stumbled across the suffix tree solution, I suspect my code would have looked a lot like an uglier version of his.