Archive for June, 2010

Rakudo and Speed

June 30, 2010

Saw a post / answer on Stack Overflow reporting a segmentation fault in Rakudo. Looking into it, I rewrote the code like this so it actually worked on the current build:

my %words;

for lines.split(/\s+/) { 

for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair {
    say $pair

(This relies on the $*ARGFILES patch from yesterday, making lines basically the equivalent of while (<>) from Perl 5.)

That works, but takes 80 seconds to run on a 100 line source file on my MacBook Pro. That seems terribly slow. But for once, it’s only sort of the fault of Rakudo!

The first thing that made me suspicious was lines.split. lines returns a List of Str. How does .split, which takes a single Str as input, even work on that? Well, like many Cool methods, .split works on most fundamental Perl types; it just stringifies them first. So lines.split reads in the file line-by-line, then combines them into one enormous string.

So, let’s try replacing that with slurp.split, which reads the entire input into a single Str and then splits that. Errr, and immediately I run into problems — it doesn’t want to default slurp to anything. Weird, I would swear that was working a moment ago, but no matter. Replacing it with $*IN.slurp.split makes the code work drastically faster: 7.5 seconds. (Using $*IN.lines.split actually speeds up that version a little bit too, to 73 seconds.)

You can still get it a bit faster. words is actually the preferred way of getting all the non-whitespace bits out of a string in Perl 6 — it’s asking for what you want, rather than trying to specify what you don’t want. And it turns out it’s faster, too: $*IN.slurp.words makes the code execute in 7 seconds, shaving .5 seconds off from our best split version.

So, the good news here is that Rakudo is actually usably fast on this task. Yay! Here’s the faster version:

my %words;

for $*IN.slurp.words { 

for %words.pairs.sort: { $^b.value <=> $^a.value } -> $pair {
    say $pair

The bad news is, Rakudo seems to have a pretty serious performance issue in the operation of stringifying the List of Str. I did a bit of timing, and 50 strings took 5.4 seconds, 100 took 17 seconds, and 200 took 77 seconds. At a very rough approximation, that’s O(N^2). Masak points out that this is an example of Schlemiel the Painter’s algorithm.


Detour from Sin

June 26, 2010

I spent the first two days of YAPC furiously working on overhauling the trig tests every chance I got. On Wednesday, however, I suddenly changed my focus, and I’ve been working on something else ever since. What happened?

Simple: my improved trig tests ran up hard against a major Rakudo limitation. So I’ve been working on fixing it so I can get back to working on those tests. The issue was this: on a 32-bit system, you could only enter about nine digits in your Num constants. The tenth digit would make the action for handling a Rat or Num blow up, and either give an error or incorrect results.

This is something of a problem when you are automatically generating code, because writing a Num usually gets you 15 digits.

So, I set out to fix Rakudo’s Rat / Num constant handling. First I hand wrote a parser, which I ended up not using. Then I coded up a a beautiful functional approach to handling converting the string components to a number. Something like this (for Rat):

$result = [+] $int-part.comb(/\d/)> $i, $d { $d.Int * 10 ** $i });
$result += [+] $frac-part.comb(/\d/)> $i, $d { $d.Int / 10 ** ($i + 1) }); 

(Note: reconstructed from memory, may not actually work as given.) This worked great, and had the happy property of properly returning a Rat if possible, and promoting to a Num if needed for additional precision. (Basically, because all the basic components of the math are simple ints, and the math operations correctly promote as needed.)

There was only one problem with this approach: it took eight times as long to do its job as the original. I tried rewriting it in a more iterative fashion. Then it only took seven times as long. Still not acceptable.

I toyed around with other ideas for reworking the solution, but finally, at about the fourth hour of my drive home from YAPC, I hit on the problem with this approach. Every one of those simple math operations was necessarily a multi call. Three of those were required for each digit. No wonder it was slow!

That in mind, I decided on a new approach. The basic problem with the original Rakudo approach was that it treated the various components of the numeric constant as Ints, which can easily (and sometimes ungracefully) overflow. But there is a standard Rakudo trick for handling overflow in Ints gracefully: do the math internally as a Num, and then cast the result to Int only if possible. This can be done entirely in PIR for speed — no worrying about multis for the math, it’s all done explicitly in num.

Then the resulting bits are combined in Perl 6, so that multi dispatch on the math operations gets you the correct type results. It took a bit of doing, but I just pushed the resulting code to github.

Now, this solution isn’t perfect. I’m pretty sure it’s possible to construct extreme cases where the step-by-step algorithm would have returned a Rat, but the new one returns a Num. And if you actually have 64-bit Ints available, you might lose a few digits of precision, because Num have less — but this is a problem throughout Rakudo, not just here.

But in general, this solution should be acceptably fast and come much closer to doing the right thing than the previous version did.

Euler #2

June 24, 2010

Just saw a nice post with solutions to Euler problem #2 in several different languages.

Each new term in the Fibonacci sequence is generated by adding the previous two terms. By starting with 1 and 2, the first 10 terms will be:

1, 2, 3, 5, 8, 13, 21, 34, 55, 89, …

Find the sum of all the even-valued terms in the sequence which do not exceed four million.

This one is so easy in Perl 6 it’s not worth writing a script, I just did it as a one-liner in the REPL:

> say [+] (1, 1, *+* ... 4000000).grep(* !% 2)

Breaking it down: 1, 1, *+* ... * is the standard way to generate the infinite Fibonacci sequence in p6. It creates a list of numbers using the ... “series” operator. The list starts with 1, 1, and each additional number is created by adding the two previous numbers (that’s the *+* bit). The final * means the series has no limit. By switching the limit to 4000000 instead, we explicitly say “stop the series as soon as the number 4000000 is exceeded.”

> say ~(1, 1, *+* ... 4000000)
1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578

Then grep filters on the test * !% 2. !% is the “divisible by” operator; * !% 2 generates a closure that checks its one input for divisibility by 2.

> say ~(1, 1, *+* ... 4000000).grep(* !% 2)
2 8 34 144 610 2584 10946 46368 196418 832040 3524578

Finally [+] is the reduce metaop for addition; it sums the list that follows.

I know that when people see one-liners, frequently they think the result has been golfed down. But this is really the natural way to express this problem in Perl 6.

Update: Very shortly after I wrote this, the “is divisible by” operator was changed to %%. At the moment, both versions of the operator work in Rakudo, but I would expect !% to go away eventually.

Trig Tests

June 23, 2010

When I originally wrote the expanded trig tests, the tests were very thorough. Given an angle, it would (say), test that angle in the default trig base, and then in radians, degrees, gradians, and “circles”. For three different ways to call it for Num, then for Rat, Complex, and Int. This led to a lot of trig tests: sin/asin had 1782 combined tests, for instance. But I still needed to add more tests! So I’ve known for a while I was going to have to reduce the number of tests. I just wasn’t sure how.

I played around with several different methods for reducing the count. I settled early on on the notion that the full tests would still be performed for the method forms of Num and Complex. That way we have confidence the underlying algorithms do work. My current effort is to get the code to automatically generate one test for each form for each type. So for instance

# Rat tests
is_approx((-3.9269908).Rat.sin, 0.7071067, "Rat.sin - -3.9269908 default");
is_approx((-0.625).Rat.sin(Circles), 0.7071067, "Rat.sin(Circles) - -0.625 default");
is_approx((-3.9269908).Rat.sin(:base(Radians)), 0.7071067, "Rat.sin(:base(Radians)) - -3.9269908 default");
is_approx(sin((-3.9269908).Rat), 0.7071067, "sin(Rat) - -3.9269908 default");
is_approx(sin((-225).Rat, Degrees), 0.7071067, "sin(Rat) - -225 default");
is_approx(sin(:x((-3.9269908).Rat)), 0.7071067, "sin(:x(Rat)) - -3.9269908 default");
is_approx(sin(:x((-250).Rat), :base(Gradians)), 0.7071067, "sin(:x(Rat), :base) - -250 default");

Except this is wrong: I want to test a different angle for each form. I need to do a bit of refactoring to make that possible, but I think I should be able to get it going today.

One trick I am using here might qualify as a new Perl 6 idiom. I have a list of angles to test, and a list of trig bases. I want to come up with lots of combinations. My solution has been to do something like this:

 my $base_list = (<Radians Degrees Gradians Circles> xx *).flat;

That gets you an infinite list that just repeats those four strings over and over again — you just .shift them off as needed. You could do the same thing with array indexing and modular arithmetic, but IMO this is more elegant. (Though now that I think about it, with the array I believe you could say something like @array[$count++ % *] and it would work.)

Thanks to The Perl Foundation and Ian Hague for supporting this work.

YAPC::NA Day 2

June 23, 2010

Another terrific day at YAPC::NA. First up were two smashing lectures from Damian Conway, which taught me some new Perl 5 and Perl 6 tricks, and made me feel great to be a Rakudo developer. Patrick Michaud and I were both startled several times at how much he did already worked in Rakudo — to the point where when he started talking about using Perl 6 macros, I started wondering if I had somehow missed someone implementing them!

One thing we were surprised didn’t work was using .trim on a Match object. Patrick guessed that maybe Match wasn’t Cool, but Larry quickly confirmed it was supposed to be. Then Patrick and Larry brought the issue up #perl6, and we got an example of how the Rakudo team works. While we were more or less oblivious (or I was, anyway), Moritz Lenz identified the source of the bug, Carl Masak submitted it to the bug tracker, and Jonathan Worthington brought up some related issues to worry about. 80 minutes later, Moritz had submitted a patch to fix the problem and new tests for the test suite. It was a great example of how the Rakudo team works — a great example we could then share with others for the rest of the day!

(Aside: And now that I think about it, this suggests that we need a bunch more tests for Match objects, to make sure that all the Cool Str-like functions work okay on them. I’ll bet there are at least a couple more holes there.)

Grabbed a quick sushi lunch (finally! — been complaining for a week that sushi is the one thing lacking in Midland, but circumstances have been such that I kept on passing up potential chances to have some), and then sat through the lunch hour in the Parrot BOF, quietly hacking on trig tests. Then Patrick’s Rakudo Star lecture, which was satisfying but not surprising. A couple more hours hacking on trig tests (this is quickly turning into an obsession, and I really want to have it done!), the Tuesday keynote, and the lightning talks. Piers Cawley sang another song, not quite as nice as the one the day before, but this one partially original and directly referencing things Perlish and not.

At dinner I sat down with Jerry Gay and Phillip Moore, mostly because they had been with the Parrot / Rakudo crowd the night before, though I’d not talked much to either. Turned out most of the rest of the table were Bank of America Perl programmers, and we had a fine dinner and headed out to the Varsity Club after. Seemed like a lot of the conference ended out there… Jerry pointed me toward a Perl programmer who has done Perl CAD work, which I’ll have to investigate if I ever get this stupid trig work done.

YAPC::NA Day 1

June 22, 2010

Had a terrific time at YAPC::NA today. Got to meet Larry Wall, Patrick Michaud, Will Coleda, chromatic (well, failed on properly introducing myself, but did talk with him a bit), Damian Conway (okay, I at least saw him once before, but I don’t think I got up the courage to say hi that day), and Randal Schwartz, among others. Keynote was nice, Karen Pauley’s TPF talk was interesting, Patrick’s NQP lecture was great, and the lightning talks were 50% fun and ended with a terrific song about pubs sung by Piers Cawley.

Then there was the “VIP Party”, and a planning meeting for Rakudo / Parrot.

And in my down time, I got about two-thirds of the way through my project to make the trig spectests smarter.

All in all, a fun, productive day. Can’t wait for tomorrow!

Edit: Fixed NPQ for NQP typo. Posted too quickly, too late last night. Also woke up realizing I was going about the modified trig tests in the wrong way, so there is a bit more work to do there than I thought last night.

Laziness in Perl 6: Much Easier

June 20, 2010

Just saw a nice blog post from Tinydot on using laziness in Perl 5 (as compared to Haskell). Of course, laziness is also a key feature of Perl 6, so I thought it would be fun to rewrite the code for Rakudo.

First, here’s the Haskell version from that post:

positives = [0..]
is_even x = x `mod` 2 == 0
evens = filter is_even

main = mapM_ print
[ ("10 positives", take 10 positives)
, ("10 evens" , take 10 (evens positives) )
, ("evens < 10" , takeWhile (< 10) (evens positives) )

Here’s my Perl 6 translation (works with current Rakudo):

sub is_even($x) { $x !% 2 };
sub evens(@x) { @x.grep(&is_even); }

sub take-while(Mu $test, $data) {
    gather {
        for $data.list {
            if $_ ~~ $test {
                take $_;
            } else {

say "10 positives ", ~(1..*).list.munch(10);
say "10 evens ", ~evens(1..*).munch(10);
say "evens < 10 ", ~take-while(* < 10, evens(1..*));

(Note that I took 1 to infinity as the positives — I think the Haskell takes 0 to infinity.)

I tried using @positives = 1..*, but apparently that usage isn’t lazy yet. Perl 6 doesn’t have a take-while method (so far as I know), so I had to write one — very straightforward. Note that this implementation is fully lazy. Other than those two things, and a little Haskell weirdness for output, this is basically a line-for-line translation of the Haskell version.

A word about the use of Mu in the take-while definition. That’s so that any kind of test can be passed, including junctions: you could say something like take-while(2|4|6|8, evens(1..*)). It is every bit as powerful as the built in grep method. (In fact, I actually copied grep’s source and modified it to get this sub.)

Update: Masak++ points out that there is some question whether @positives = 1..* is supposed to be lazy. It may be that the proper lazy way is @positives <== 1..*. This is an area of the Spec which is still in a bit of flux…

Accepting Equality

June 19, 2010

Recently I realized that Numeric and Real need to implement ACCEPTS. As far as I know, the Spec has nothing to say about what this ACCEPTS should do, so I looked at the current implementation. There are two versions of Num.ACCEPTS, one which takes anything and uses == to compare, and one which takes a Complex and checks that the imaginary part is zero, and the real part is equal to the Num. Both ACCEPTS have special cases for NaN, needed because NaN != NaN.

Talking about this with Jonathan, I was initially stumped. If every class had to have an ACCEPTS for its own type, for Real, and for Numeric, things would be a mess. Then it occurred to me: why not implement it by simply calling to ==? As long as == is doing the right thing, ACCEPTS will do the right things as well. (Errr, modulo the NaN issue, which might get tricky.)

Of course, that depended on a notion of “right thing” for ==. Quick check: does a Real number equal the Complex version of the same number?

> 1 == 1 + 0i

Huh. Well, let’s get a second opinion:

colomon: alpha: say 1 == 1 + 0i
p6eval: alpha 30e0ed: OUTPUT«1␤»

Whoops. The False answer in current Rakudo is because of my reworking of Numeric comparison. At the time, I was thinking of sorting, and figured it would do no harm if 1 < 1+0i. But of course that looks silly in the context of ==! Apparently the test suite doesn’t have any tests for this case.

As frequently happens with these posts, I started this one without any clear sense of where I should go, but now I’ve got a notion:

1) Fix == so that Reals and equivalent Complexes are equal again.

2) Make a generic ACCEPTS for Numeric that works based on ==.

3) Somehow fold NaN handling into the mix.

Hmmm. Okay, my actual first step is going to be to completely muck up Numeric.ACCEPTS (intentionally), and see what, if any, tests fail, so I can find out the state of testing ACCEPTS.

Thanks to the Perl Foundation and Ian Hague for supporting this work.

Series Again

June 18, 2010

While I meant to be finishing up the numerics grant, I got sidetracked and worked on pmichaud’s list branch. Then just after that merged for release, I found myself looking into series again. We had some new rules on how to handle series which went between two single-character strings. We also had a patch for series that was supposed to help with handling strings.

Unfortunately, the patch relied on defining a new comparison operator which had a special case for strings. I knew that was likely a mistake, because I’d tried the same approach months before with Range and been roundly lambasted for it. That time had led to Spec changes to try to make it clear that wasn’t the right approach.

The specific failing test that prompted the change was 'X' ... 'Z'. Looking at that, two things popped out at me. First, obviously it should be handled by the single-character endpoints special case. Second, it was actually running in the series multi which took two scalars. That multi was a fossil; it was the simplest case we could handle earlier on, and had never gotten removed when the much smarter and more capable series multi which took an array for the left-hand side was written later.

So my first step fixing things up was to remove the “two scalars” version of the series operator. I had to define a new two scalars version that simply forwarded to the general series operator code. But I was still getting failing tests. 4 ... ^5 was supposed to be 4, 3, 2, 1, 0, 1, 2, 3, 4, but it was coming back 4, 3, 2, 1, 0 again like it used to. What was up with that?

Well, it turned out that case — if there is an array on the right-hand side of the operator, every member but the first is added to the end of the series — was only fixed for the “two scalars” case. (Yes, despite having a scalar variable for the right-hand side, if there is no array multi which matches, an array on the right-hand side is passed as an Array object.) There were no tests for this other than the simplest case, so there were no tests to show that the problem wasn’t fixed in most cases.

So I had to move that fix from the fossil multi into the active one… except that caused other problems. I ended up writing two little helper multis which handled the array on the right-hand side by dispatching to the main multi and then adding on the extra elements:

our multi sub infix:<...>($lhs, @rhs is copy) {
    fail "Need something on RHS" if !@rhs;
    ($lhs ... @rhs.shift), @rhs

That may be less than ideal as a solution, but it is clear and simple and at least in theory it is lazy. (Hmm… should probably add a test case where ($lhs ... @rhs.shift) is actually infinite…)

Once I had everything working again without the fossil multi, and some new tests as well, I turned to the single-character endpoints issue. As TimToady had thoughtfully provided the source to generate the next element in such a series right in the Spec, it was a simple matter of adding those rules to the code which makes a generator Block for the series:

        if $lhs ~~ Str && $rhs ~~ Str && $lhs.chars == 1 && $rhs.chars == 1 {
            if $lhs cmp $rhs != 1 {
                -> $x { $x.ord.succ.chr };
            } else {
                -> $x { $x.ord.pred.chr };

At this point, I started adding tests for more complicated alphabetical series, namely 'A' ... 'ZZ', 'AA' ... 'Z', 'Z' ... 'AA', and 'ZZ' ... 'A'. All but the last worked right from the get-go. Turns out the problems people had been having were because the fossil multi that was handling these cases didn’t actually handle end-of-series conditions correctly.

The last was an interesting case. The naive view of it was that it should count down: ZZ, ZY, ZX ... AB, AA, Z, Y ... A. Things don’t actually work that way, nor are they intended to, because 'AA'.pred returns Failure rather than 'Z'. This led to a lot of IRC discussion, with the upshot that if the generator Block returns a Failure, the series ends. (For now; we will see if this approach causes problems.)

I wish I could say the series operator was done after all of that, but there are still several areas for improvement:

1. If the series is constructed from integers using a geometric ratio, the resulting numbers are Rats, even if they are all actually integer values:

> say (1, 3, 9 ... 100).perl
(1, 3, 9, 27/1, 81/1)

2. We haven’t even attempted to implement the ...^ series operator yet, which excludes the right-hand side end point from the series. Hmm… wonder if it could be implemented simply as

our multi sub infix:<...^>($lhs, $rhs) { 
    ($lhs ... $rhs).grep({ $rhs cmp $_ != 0});

Okay, that doesn’t handle the $rhs is an Array case, but it seems close to being right otherwise.

3. Errr… well, I’d thought of a third case when I started this post, but I’ve forgotten it now.

div and mod

June 7, 2010

After consideration and looking at what Rakudo actually does, I decided the right solution to the div and mod issue was to change them in the Spec. In particular, I see them as having one interesting use: with integer types, they can always return the same integer type as their result, because either’s result will never have greater magnitude than its operands. I’ve rewritten their definitions to correspondingly tighten them as only working on integer types.