Series Again

While I meant to be finishing up the numerics grant, I got sidetracked and worked on pmichaud’s list branch. Then just after that merged for release, I found myself looking into series again. We had some new rules on how to handle series which went between two single-character strings. We also had a patch for series that was supposed to help with handling strings.

Unfortunately, the patch relied on defining a new comparison operator which had a special case for strings. I knew that was likely a mistake, because I’d tried the same approach months before with Range and been roundly lambasted for it. That time had led to Spec changes to try to make it clear that wasn’t the right approach.

The specific failing test that prompted the change was 'X' ... 'Z'. Looking at that, two things popped out at me. First, obviously it should be handled by the single-character endpoints special case. Second, it was actually running in the series multi which took two scalars. That multi was a fossil; it was the simplest case we could handle earlier on, and had never gotten removed when the much smarter and more capable series multi which took an array for the left-hand side was written later.

So my first step fixing things up was to remove the “two scalars” version of the series operator. I had to define a new two scalars version that simply forwarded to the general series operator code. But I was still getting failing tests. 4 ... ^5 was supposed to be 4, 3, 2, 1, 0, 1, 2, 3, 4, but it was coming back 4, 3, 2, 1, 0 again like it used to. What was up with that?

Well, it turned out that case — if there is an array on the right-hand side of the operator, every member but the first is added to the end of the series — was only fixed for the “two scalars” case. (Yes, despite having a scalar variable for the right-hand side, if there is no array multi which matches, an array on the right-hand side is passed as an Array object.) There were no tests for this other than the simplest case, so there were no tests to show that the problem wasn’t fixed in most cases.

So I had to move that fix from the fossil multi into the active one… except that caused other problems. I ended up writing two little helper multis which handled the array on the right-hand side by dispatching to the main multi and then adding on the extra elements:

our multi sub infix:<...>($lhs, @rhs is copy) {
    fail "Need something on RHS" if !@rhs;
    ($lhs ... @rhs.shift), @rhs
}

That may be less than ideal as a solution, but it is clear and simple and at least in theory it is lazy. (Hmm… should probably add a test case where ($lhs ... @rhs.shift) is actually infinite…)

Once I had everything working again without the fossil multi, and some new tests as well, I turned to the single-character endpoints issue. As TimToady had thoughtfully provided the source to generate the next element in such a series right in the Spec, it was a simple matter of adding those rules to the code which makes a generator Block for the series:

        if $lhs ~~ Str && $rhs ~~ Str && $lhs.chars == 1 && $rhs.chars == 1 {
            if $lhs cmp $rhs != 1 {
                -> $x { $x.ord.succ.chr };
            } else {
                -> $x { $x.ord.pred.chr };
            }
        }

At this point, I started adding tests for more complicated alphabetical series, namely 'A' ... 'ZZ', 'AA' ... 'Z', 'Z' ... 'AA', and 'ZZ' ... 'A'. All but the last worked right from the get-go. Turns out the problems people had been having were because the fossil multi that was handling these cases didn’t actually handle end-of-series conditions correctly.

The last was an interesting case. The naive view of it was that it should count down: ZZ, ZY, ZX ... AB, AA, Z, Y ... A. Things don’t actually work that way, nor are they intended to, because 'AA'.pred returns Failure rather than 'Z'. This led to a lot of IRC discussion, with the upshot that if the generator Block returns a Failure, the series ends. (For now; we will see if this approach causes problems.)

I wish I could say the series operator was done after all of that, but there are still several areas for improvement:

1. If the series is constructed from integers using a geometric ratio, the resulting numbers are Rats, even if they are all actually integer values:

> say (1, 3, 9 ... 100).perl
(1, 3, 9, 27/1, 81/1)

2. We haven’t even attempted to implement the ...^ series operator yet, which excludes the right-hand side end point from the series. Hmm… wonder if it could be implemented simply as

our multi sub infix:<...^>($lhs, $rhs) { 
    ($lhs ... $rhs).grep({ $rhs cmp $_ != 0});
}

Okay, that doesn’t handle the $rhs is an Array case, but it seems close to being right otherwise.

3. Errr… well, I’d thought of a third case when I started this post, but I’ve forgotten it now.

About these ads

Tags:

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s


Follow

Get every new post delivered to your Inbox.

%d bloggers like this: