## First Sheet Music!

September 19, 2010

There were a number of comments on #perl6 on how to improve the code from my last post, but I’m ignoring them all today and forging ahead to get an actual ABC to Lilypond translator up and running. Here’s the core of the code:

sub StemToLilypond(Context $context,$stem) {
my $match = ABC::Grammar.parse($stem, :rule<mnote>); # bad in at least two ways....
my $pitch = ~$match<pitch>;
my $length = ~$match<note_length>;

print " { %note-map{$pitch} }{ %cheat-length-map{$length} } ";
}

sub BodyToLilypond(Context $context, @elements) { say "\{"; for @elements ->$element {
given $element.key { when "stem" { StemToLilypond($context, $element.value); } when "barline" { say " |"; } } } say "\}"; } This code is wrong for a host of reasons. But the cool thing is, for simple tunes, it very nearly works. Basically, we go through the list of musical elements in the tune. When we see a barline, we output a bar line to Lilypond. When we see a stem, we parse it again (ugh) assuming it is the simple form a a stem, then map the pitch portion to the equivalent Lilypond note (ignoring key signature and accidentals), and the rhythm portion to the equivalent Lilypond duration (assuming that the ABC is notated in 8ths, and ignoring almost all potential complexity). This crude approximation almost suffices to properly notate “The Star of Rakudo”! Here’s the PDF Lilypond outputs when feed the output of this script. At first glance this looks pretty good, but actually it’s got one major issue (as well as a host of minor ones). I think it might be obvious even to those who don’t read sheet music if it is compared to this proper PDF: the notes are all correct, but they are shifted one eighth note off in the measures! This is because Lilypond apparently treats bar lines in the source code as a mere debugging aid — if they don’t agree with what it thinks they should be, it will issue a warning and then go with what it thinks rather than what you said. I guess that behavior might make sense in big orchestral scores, but it is just annoying at this level. So, what needs to be done to make this tune work: 1) Detect pick up bars and send the \partial command to Lilypond so that it keeps the bar lines in the correct places. 2) Preprocess the music looking for repeats, so that the repeated section can be properly marked for Lilypond. 3) Handle the key signature correctly. 4) Handle the ~ gracing, which in should generate a turn symbol over the note. (Used here to indicate an Irish-style roll.) It appears to me that these are straightforward programming problems, so I’m going to forge ahead and see what I can do… ## Regretting My Actions September 18, 2010 I liked the array of pairs structure for the header so much I decided to do the same sort of thing for the music part itself. Here’s the bit of the grammar this post talks about: regex element { <broken_rhythm> | <stem> | <rest> | <gracing> | <grace_notes> | <nth_repeat> | <end_nth_repeat> | <spacing> } regex barline { ':|:' | '|:' | '|' | ':|' | '::' } regex bar { <element>+ <barline>? } regex line_of_music { <barline>? <bar>+ } regex music { [<line_of_music> \s*\v?]+ } As a crude first approximation, I transformed each element into a Pair with the name of the rule that made the element as the key, and the actual string parsed as the value. Likewise each barline becomes a Pair with the key “barline” and the string for the value. In the long run, I’m going to have an ABC::Element class (or something like one), but this version ought to be enough to get simple ABC translation up and running. That all worked fine right from the start. But the next bit was trouble: method bar($/) {
my @bar = @( $<element> )>>.ast; @bar.push($<barline>>>.ast);
make @bar;
}

method line_of_music($/) { my @line =$<barline>>>.ast;
my @bars = @( $<bar> )>>.ast; for @bars ->$bar {
for $bar.list { @line.push($_);
}
}
@line.push("endline" => "");
make @line;
}

method music($/) { my @music; for @($<line_of_music> )>>.ast -> $line { for$line.list {
@music.push($_); } } make @music; } That’s what I ended up with, and it works. But when I started, each of those action methods was a simple one-liner. Alas, that version built up a structure in the results, when I wanted to end up with a nice, flat list of elements. I spent an hour trying different variations to remove that structure, finally ending up with the inelegant mess here. I’m hoping someone out there has a better way to do it. At this point, this project is really close to being able to do useful things. Given a sufficiently simple tune (and many in ABC are very simple indeed!) this framework should make it easy to translate to Lilypond. I don’t know if I’ll get the chance to code it today, but if not, tomorrow for sure, I think…. ## First Actions September 16, 2010 I renamed the old ABC class ABC::Grammar. Then I created a simple ABC::Header class to represent the header of an ABC file. At its heart, an ABC::Header object is just an Array of Pairs, each of which consists of a header field name (like “T” for title) and a header field value. (Why an Array instead of a Hash? Because certain header fields can appear more than once, and order is important.) Then I constructed a very simple actions set to test my understanding. For the two rules regex header_field { ^^ <header_field_name> ':' \s* <header_field_data>$$} regex header { [<header_field> \v]+ } I constructed the following actions: class ABC::Actions { method header_field($/) {
make ~$<header_field_name> => ~$<header_field_data>;
}

method header($/) { my$header = ABC::Header.new;
for @( $<header_field> ) ->$field {
$header.add-line($field.ast.key, $field.ast.value); } make$header;
}
}

So when the parser constructs a header_field, it makes a Pair with the header field’s name as the key and its data as the value. The Pair is stored in the .ast for the header_field. Then when the parser makes a header, it constructs an ABC::Header object and fills it with the Pairs from the various header fields. Whee!

This is passing my initial tests, so I’m going to move on to some more complicated actions next…

## Back to ABCs

September 16, 2010

I’ve been itching to get back to work on the ABC module (originally by SF from lastofthecarelessmen.blogspot.com, but my fork is the only version seeing current development work). A reference in an HN comment to Lilypond has provided the perfect. Lilypond is a long-lived open source project for music notation with very ambitious goals. In the past I’d tinkered with it a bit, but always come away unsatisfied. Today, though, I had a notion for an interesting project, so I thought I’d give it a whirl.

So, here’s a PDF of my tune “The Star of Rakudo” done with my usual slightly hacked version of jcabs2ps. And here’s a quick stab at a Lilypond version (with completely unnecessary 1st and 2nd endings added just so I can see what they look like). The Lilypond version has a heaviness to it I’m not wild about, but I think it’s pretty arguable that the various elements each look better than the other version. In addition, Lilypond appears to have good support for a more complicated musical elements. My inability to get jcabs2ps to portably generate all the musical elements I need has been a longstanding stumbling block to a side project of mine, so Lilypond is intriguing.

I don’t particularly like Lilypond’s music “language”, but that’s okay. My idea is to have the Perl 6 ABC code automatically translate ABCs to Lilypond for me, in effect generating good-looking notation from them. It will then be the best of both worlds — the lightweight, easy-to-type ABC format generating professional-looking output from Lilypond.

So, now I need to figure out how add actions to a Perl 6 grammar…

## Series, Memoization, and Limits

September 3, 2010

Out on Reddit, my last post got some interesting comments. In particular, one commenter creates a finite series of squares, and then asks “That’s all well and good, but what would I have to do to say ‘up to 900′ instead of saying ‘up to 30^2′?” That’s a great question, and I’d like to explore the answer to it, though I’m going to stick with the Fibonacci series for my examples.

First, there’s a very quick and dirty answer to this. If you modify the original series, it is easy to add a limit term:

> my @Fibonacci := (0, 1, -> $a,$b { $a +$b } ... 900);
0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610

That gives you all the Fibonacci numbers less than or equal to 900. But the list is no longer expandable; with this one, if you say @Fibonacci[100], rather than calculating the 100th value of the series, it will just act like it’s a normal out-of-bounds array access.

In an ideal world, we’d have an easy way of looking at just part of an infinite series. What tools do we have in our toolbox? Well, there’s grep:

>  my @Fibonacci := (0, 1, -> $a,$b { $a +$b } ... *); 1;
1
> my @F-low := @Fibonacci.grep(* < 900); 1;
1
> say @F-low[10]
55
> say @F-low[20]

…and that last one is where the trouble starts, because it never comes back. By 20, we’re already well over 900. But grep’s been fed an infinite list, and it doesn’t know anything about it, so it just keeps on looking forever. So that won’t do.

The next tool we have is the first method.

> my @Fibonacci := (0, 1, -> $a,$b { $a +$b } ... *); 1;
1
> say @Fibonacci.first(* <= 900)
0
> say @Fibonacci.first(* > 900)
987

As you can see, first is great at finding the first Fibonacci number greater than 900. It’s pretty useless for finding all the numbers less then 900, though.

So, the bad news here is that, as I write this, Perl 6 does not have a method which can give us a finite portion of an infinite list based on some test, other than “give us the first N elements”. The good news is that the function we want is dead simple to write:

sub take-while(@a, Mu $test) { gather { for @a.list { last unless$_ ~~ $test; take$_;
}
}
}

This is actually a straightforward modification of the normal grep code. $test is defined as Mu so it can be anything that can be smartmatched against, including junctions. Then we scan the list, exiting if a smartmatch against the current element fails, and otherwise returning the element in our output list. This is just the trick for the problem in question: > my @Fibonacci := (0, 1, ->$a, $b {$a + $b } ... *); 1; 1 > take-while(@Fibonacci, * < 900) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 > take-while(@Fibonacci, * < 400000) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 > take-while(@Fibonacci.grep(* %% 2), * < 400000) 0 2 8 34 144 610 2584 10946 46368 196418 > take-while(@Fibonacci, Int) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 2584 4181 6765 10946 17711 28657 46368 75025 121393 196418 317811 514229 832040 1346269 2178309 3524578 5702887 9227465 14930352 24157817 39088169 63245986 102334155 165580141 267914296 433494437 701408733 1134903170 1836311903 So here we have it: we’ve still got the efficiency of the de facto memoized solution, but we can also carve off portions of the list based on some criteria. I threw in the last one there just to show it didn’t have to be a Code block passed in; in this case, it shows us how far we get down the series with Int values before our current limited integer math throws in the towel and gives you back a Num instead. (Note that once we get arbitrarily large Ints in Rakudo, that line will no longer work, because take-while(@Fibonacci, Int) will return an infinite list!) And now for the even slightly better news: as of today, take-while is available in the List::Utils module. ## The Series Operator and Memoization August 28, 2010 Masak recently posted on the series operator solution to Pascal’s Triangle. I’d like to focus on an important property of his final solution that hasn’t been much discussed yet. But first, let me back up and consider another well-known problem with similar properties: the Fibonacci series. A straightforward recursive solution looks like this in Perl 6: sub Fibonacci($n) {
return $n if$n == 0 | 1;
Fibonacci($n - 1) + Fibonacci($n - 2);
}

This nicely implements the definition, but it has one critical problem — it is incredibly inefficient. (And not just in Rakudo — in any language!) The reason is that calculating Fibonacci(N) requires all the work of calculating Fibonacci(N-1) and Fibonacci(N-2). The number of subroutine calls involved grows as fast as the series does.

The standard recursive technique for dealing with this problem is called memoization. Basically, you cache the values which have already been computed somewhere, so you only need to compute them once. That’s a bit awkward to implement by hand — it roughly doubles the complexity of the routine, I’d say. Luckily, lots of languages have automatic ways to apply memoization. Perl 6 is supposed to, with the “is cached” modifier to a subroutine definition, but Rakudo doesn’t have it implemented yet.

That’s okay, because there is more than one way to do it, and there is a great way which is available already in Rakudo.

> my @Fibonacci := (0, 1, -> $a,$b { $a +$b } ... *); 1;
1
> say @Fibonacci[5]
5
> say @Fibonacci[10]
55
> say @Fibonacci[30]
832040

(I’ve used parentheses instead of do because that’s how I roll, but otherwise this is exactly the same idea as Masak’s solution. The additional 1; is needed only because I typed this into the REPL, and without it the REPL will try to evaluate the entire infinite series so it can print it out.)

You can’t see it here, of course, but calculating these numbers was very quick. That’s because this solution using the series operator is equivalent to writing a very smart memoization. Infinite lists in Perl 6 are implemented (roughly speaking) with an array of values that have been already calculated, and an iterator which knows how to calculate further values as needed.

So when you say @Fibonacci[5] the first time, the list calls the iterator generated by the series operator to get the first six values, storing them in @Fibonacci[0] through @Fibonacci[5]. (Those values are generated in a straightforward iterative way, so it is very efficient.) When you then say @Fibonacci[10] those first six values are still there, and only the next five values must be calculated. If at that point you said @Fibonacci[8], that is just a simple array lookup of the already calculated value!

Think about it. It would take a very smart automatic memoizer to somehow figure out that the function would only be called when $n was a non-negative integer, so that an array could be used to efficiently cache the results. Using a series operator this way gets you that kind of performance automatically already in Rakudo. So it’s a double win. Using the series operator is not only the most succinct way to express this series in Perl 6. It’s also an extremely efficient way of calculating the series. ## Quick ABC fixes August 26, 2010 SF’s ABC project continues to be dead in the water, but that hasn’t stopped me from needing some quick ABC scripts. I’m preparing a mini-book of tunes for an Irish Traditional Dance Music workshop, and needed a couple quick programming tricks to make my life easier. First, a file of ABC tunes should have a unique “X:” index field for each tune. I’d gathered these tunes from around the Internet; correspondingly, about half had “X: 1″, and the rest had some random number. This was easily fixed with a simple Perl 6 script: my$x = 1;
for $*IN.lines { when /^X\:/ { say "X: {$x++ }"; }
say $_; } The only thing at all odd here is that I’ve used when in a for loop. It just smartmatches the topic ($_) against the regex there, and if it’s a match, executes the code that follows and then skips to the next iteration of the loop.

The trickier thing I needed to do was transpose a couple of tunes, because the versions I found on the Internet were in the wrong key. Cutely, this is pretty easily done with the .trans method in Perl 6.

for $*IN.lines ->$line {
# say (~$line).trans("C..GABc..gab" => "FGABc..gab"); # from D to G say (~$line).trans("C..GABc..gab" => "D..GABc..gabh"); # from G to A
}

This needs to be applied only to the notes of the tune, and if you use the “G to A” transposition and you have an “h” as a note, you need to switch it to “c’”. (You also need to change the declared key signature in the tune’s header. And accidentals might need work. This is definitely crude ABC-handling code, but it suited my purposes perfectly last night.)

There is one big gotcha here. Why is it (~$line) instead of just$line? Well, internally .lines is calling .chomp. And apparently .chomp no longer returns a Rakudo Str, which seems to mean some other .trans is executed, causing no end of problems.

Anyway, at some point I intend to try to get the real ABC module working in Rakudo (if no one else beats me to it). In the meantime, I’ve got to survive a weekend’s worth of folk festival first.

## Timing Text File Reads, Part 3

August 21, 2010

So, the last two posts suggest there is major overhead to using the lazy iterator approach with .lines. I decided to explore this by rolling my own iterator to read a file. First, I suspected the gather / take has a big overhead, so I just tried for a basic customer iterator first:

class LinesIter is Iterator {
has $!filehandle; has$!value;

method infinite() { False } # or should this be True?

method reify() {
unless $!value.defined {$!value := pir::new('Parcel');
my $line =$!filehandle.get;
if $line.defined { pir::push($!value, $line); pir::push($!value, LinesIter.new(:filehandle($!filehandle))); } }$!value;
}
}

That takes 21.7, very slightly more than the standard gather / take version. So much for the theory gather / take is inefficient here!

I’m told that the spec requires that .lines be strictly lazy so you can mix in calls to .get. I don’t know where, and it seems a bit crazy to me. But anyway, by those lights the following potential optimizations are actually illegal, because they break the strict connection between .lines and .get.

Here’s one that does three .gets at a time, cutting the number of iterator objects created by two-thirds.

class LinesIter is Iterator {
has $!filehandle; has$!value;

method infinite() { False }

method reify() {
unless $!value.defined {$!value := pir::new('Parcel');
my $line =$!filehandle.get;
if $line.defined { pir::push($!value, $line);$line = $!filehandle.get; if$line.defined {
pir::push($!value,$line);
$line =$!filehandle.get;
if $line.defined { pir::push($!value, $line); pir::push($!value, LinesIter.new(:filehandle($!filehandle))); } } } }$!value;
}
}

Clocking in at 15.6 seconds — significantly better than the current .lines implementation, significantly worse than the .get version — this was actually the best variant I came up with.

I tried upping the count to 8, but it actually ran a touch slower then. And jnthn suggested a version which tried to optimize creation of the iterator objects, but it actually ran significantly slower than the naive version.

I’m not really sure where this leaves us…

## Timing Text File Reads, Part 2

August 21, 2010

Overnight pmichaud rewrote .chomp in PIR to optimize it, leading to huge improvements in performance. The .lines version of the code has gone from 55s to 21.5s; the .get version from 44s to 7.2s. Weirdly enough, the .slurp version has gone from 5.6s to 9s. (No idea why.)

The good news is that the performance of the .get version is clearly much, much better. (Still with plenty of room for improvement!) The bad news is that the penalty for using the iterator-based version is now much more significant — it adds nearly 200% to the read time!

But then, that’s another potential spot for optimization…

August 21, 2010

I’ve been worried about the performance of .lines reading text files due to iterator overhead, so I decided to time it tonight. It is a problem, but it appears to not be the biggest problem!

As usual, I probably did things backwards. I started off by creating a ten thousand line text file from Violet Jacob poems and Rakudo’s core.pm. Then I ran it against this:

my $count = 0; for$*IN.lines {
$count++; } say :$count.perl;

That takes 55s on my MacBook Pro.

The .lines method is just an iterator wrapper around .get, so I tried a version which calls .get directly next.

my $count = 0; loop {$*IN.get // last;
$count++; } say :$count.perl;

That takes 44s on the MBP. So using the iterator adds 25% to the execution time. That’s bad, but in a certain sense, less bad than the straight .get version being so incredibly slow.

Next attempt, .slurp. What’s the overhead of doing things line-by-line, with an autochomp?

\$*IN.slurp;

That takes 5.6 seconds. So the line-by-line overhead is terrible AND even the slurp version is crazy slow.

jnthn coded up two PIR versions for comparison. Version 1 reads the entire file in PIR — that took 0.84 seconds on my MBP. Version 2 reads the file line-by-line in PIR, and is drastically faster — 0.03 seconds. (pmichaud++ reported this discrepancy to the Parrot team.)

It looks like .chomp might be the best point of attack. It looks grotesquely inefficient at the moment. But it’s time for bed now….

Update: see the next post for reports on how pmichaud’s .chomp optimization (mentioned in the comments) improved the situation.