Posts Tagged ‘efficiency’

Rakudo and Speed

June 30, 2010

Saw a post / answer on Stack Overflow reporting a segmentation fault in Rakudo. Looking into it, I rewrote the code like this so it actually worked on the current build:

my %words;

for lines.split(/\s+/) { 
    %words{$_}++ 
}

for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair {
    say $pair
}

(This relies on the $*ARGFILES patch from yesterday, making lines basically the equivalent of while (<>) from Perl 5.)

That works, but takes 80 seconds to run on a 100 line source file on my MacBook Pro. That seems terribly slow. But for once, it’s only sort of the fault of Rakudo!

The first thing that made me suspicious was lines.split. lines returns a List of Str. How does .split, which takes a single Str as input, even work on that? Well, like many Cool methods, .split works on most fundamental Perl types; it just stringifies them first. So lines.split reads in the file line-by-line, then combines them into one enormous string.

So, let’s try replacing that with slurp.split, which reads the entire input into a single Str and then splits that. Errr, and immediately I run into problems — it doesn’t want to default slurp to anything. Weird, I would swear that was working a moment ago, but no matter. Replacing it with $*IN.slurp.split makes the code work drastically faster: 7.5 seconds. (Using $*IN.lines.split actually speeds up that version a little bit too, to 73 seconds.)

You can still get it a bit faster. words is actually the preferred way of getting all the non-whitespace bits out of a string in Perl 6 — it’s asking for what you want, rather than trying to specify what you don’t want. And it turns out it’s faster, too: $*IN.slurp.words makes the code execute in 7 seconds, shaving .5 seconds off from our best split version.

So, the good news here is that Rakudo is actually usably fast on this task. Yay! Here’s the faster version:

my %words;

for $*IN.slurp.words { 
    %words{$_}++ 
}

for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair {
    say $pair
}

The bad news is, Rakudo seems to have a pretty serious performance issue in the operation of stringifying the List of Str. I did a bit of timing, and 50 strings took 5.4 seconds, 100 took 17 seconds, and 200 took 77 seconds. At a very rough approximation, that’s O(N^2). Masak points out that this is an example of Schlemiel the Painter’s algorithm.


Follow

Get every new post delivered to your Inbox.