So, matthias’s results in masak’s p5 challenge had me utterly baffled. Anyone with much Rakudo programming experience knows that something like
gather for @strA[$i..*-1] Z @strB[$j..*-1] -> $a, $b
in your inner loop is DOOM speed-wise. How could that possibly be a world-beater?
Well, the answer is that it all depends on the strings you are searching for common substrings. I’d asked masak about what test he was using a few days ago (trying to figure out the weird performance of my code) and gotten his routine:
my $n = 160; my $a = <0 1 2 3 4>.roll($n).join; my $b = <5 6 7 8 9>.roll($n).join; # "N" case is testing this $a versus this $b my $ra = (0..$n - 10).pick; my $rb = (0..$n - 10).pick; $a = $a.substr(0, $ra) ~ 'flirtation' ~ $a.substr($ra + 10); $b = $b.substr(0, $rb) ~ 'flirtation' ~ $b.substr($rb + 10); # "Y" case is testing this $a versus this $b
If you look at that, what should jump out at you is, in the “N” case, the two strings have no characters at all in common with each other! matthias’s code is nearly perfectly optimized for this case, as the slow inner loop will never execute. The “Y” case will be nearly as good for it.
So, what happens if we use normal text to seed this? Luckily I just got some benchmarking code going. So if the strings are
There were nobles, who made war against each other; there was the king, who made war against the cardinal; there was Spain, which made war against the king.
what are the timings like? Sorted from best to worse:
p5-fox.pl: 9.0981144 p5-moritz.pl: 10.4469054 p5-colomon.pl: 12.5152087 p5-matthias.pl: 25.9988418 p5-util.pl: 32.1561981
More strings and timings coming soon….
PS I should add I love the style of matthias’s code. It combines a lot of elegance with a clever Perl-ish and p6-ish approach to the problem. If I hadn’t stumbled across the suffix tree solution, I suspect my code would have looked a lot like an uglier version of his.