Saw a post / answer on Stack Overflow reporting a segmentation fault in Rakudo. Looking into it, I rewrote the code like this so it actually worked on the current build:
my %words; for lines.split(/\s+/) { %words{$_}++ } for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair { say $pair }
(This relies on the $*ARGFILES
patch from yesterday, making lines basically the equivalent of while (<>)
from Perl 5.)
That works, but takes 80 seconds to run on a 100 line source file on my MacBook Pro. That seems terribly slow. But for once, it’s only sort of the fault of Rakudo!
The first thing that made me suspicious was lines.split
. lines
returns a List of Str. How does .split
, which takes a single Str as input, even work on that? Well, like many Cool methods, .split
works on most fundamental Perl types; it just stringifies them first. So lines.split
reads in the file line-by-line, then combines them into one enormous string.
So, let’s try replacing that with slurp.split
, which reads the entire input into a single Str and then splits that. Errr, and immediately I run into problems — it doesn’t want to default slurp to anything. Weird, I would swear that was working a moment ago, but no matter. Replacing it with $*IN.slurp.split
makes the code work drastically faster: 7.5 seconds. (Using $*IN.lines.split
actually speeds up that version a little bit too, to 73 seconds.)
You can still get it a bit faster. words
is actually the preferred way of getting all the non-whitespace bits out of a string in Perl 6 — it’s asking for what you want, rather than trying to specify what you don’t want. And it turns out it’s faster, too: $*IN.slurp.words
makes the code execute in 7 seconds, shaving .5 seconds off from our best split version.
So, the good news here is that Rakudo is actually usably fast on this task. Yay! Here’s the faster version:
my %words; for $*IN.slurp.words { %words{$_}++ } for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair { say $pair }
The bad news is, Rakudo seems to have a pretty serious performance issue in the operation of stringifying the List of Str. I did a bit of timing, and 50 strings took 5.4 seconds, 100 took 17 seconds, and 200 took 77 seconds. At a very rough approximation, that’s O(N^2)
. Masak points out that this is an example of Schlemiel the Painter’s algorithm.