## Philosophical Issues with Rakudo’s .parse

March 6, 2013

For a little bit now I’ve had the ABC module working on both Niecza and Rakudo. I got a new Linux box yesterday; this morning I was playing around with my ABC tools there, trying to see if I could get them to work. I installed them with panda, so I was using ABC with Niecza by default on my MBP and with Rakudo by default on my new machine. I tried it with a file of my own compositions. Apparently I hadn’t tried this in a while, because it had several issues. And the two systems handled them completely differently.

On Niecza, I got this message:

Unhandled exception: Did not match ABC grammar: last tune understood:
A Hat for the Whales
d|:e~A3 BGEF|G2BG DG (3ABd|eBdB ~A3G|Bdef gfed|
e~A3 BGEF|G2BG DG (3ABd|eBdB AAGA|[1Beed e3d:|


On Rakudo, there were no error messages, but “A Hat for the Whales” was the last tune processed, even though there were a dozen more tunes in the ABC file after it.

Without thinking about it too much I corrected this batch of errors. Now on Niecza the entire file parsed, but I got the error

Unhandled exception: Illegal key signature B minor

Under Rakudo, the same ABC file just appeared to work.

What’s going on here? Well, key signature processing looks something like this (simplified):

    token key-def { <basenote> <chord_accidental>? <mode>? }
token mode { <minor> | <major> }
token minor { "m" ["in" ["o" ["r"]?]?]? }
token major { "maj" ["o" ["r"]?]? }


No space allowed between basenote and mode. So when you invoke parse with that rule on “B minor”, in Niecza it fails, because the entire string was not successfully parsed. In Rakudo it succeeds, because “B” is a valid key signature, and it doesn’t care that the entire string was not parsed. In practice, this means that three of the tunes in the collection got the wrong key signature: B major instead of the correct B minor.

To my mind, Niecza’s behavior is more intuitive. If you’re trying to parse a file, parsing half of it and then silently ignoring the rest is wildly unhelpful behavior. In every use I’ve had for parse so far, I’ve wanted it to parse everything in the string I sent to the function.

But hey, I understand that some people might have different needs. Can we at least add a flag that will give you a parsing failure if the entire string is not parsed? :all, perhaps?

## STEP Actions

October 16, 2012

I’ve started adding actions to the STEP code, targeted (at least at first) at collecting up the entity instance names in a parameter list to make it easier to analyze what’s in a STEP file.

I need to do this more often, because I spent about an hour fumbling about making stupid mistakes. And even now that I’ve got it working, I’m haunted by the notion there should be a simpler approach. Anyway, here’s what I’ve got at the moment:

class ISO_10303_21::Actions {
method entity_instance_name($/) { make [~$/] }
method parameter($/) { for <typed_parameter untyped_parameter omitted_parameter> ->$s {
return make $/{$s}.ast if $/{$s}.defined;
}
}
method omitted_parameter($/) { make [] } method untyped_parameter($/) {
for <entity_instance_name list_of_parameters> -> $s { return make$/{$s}.ast if$/{$s}.defined; } make []; } method typed_parameter($/) { make $<parameter>.ast } method list_of_parameters($/) { make merge-arrays(@($<parameter>)».ast); } method parameter_list($/) { make merge-arrays(@($<parameter>)».ast); } }  What’s bugging me are the parameter and untyped_parameter definitions — those seem like they should be easier to express somehow. At least, that basic functionality is going to be pretty common. Oh, and I just found (and fixed) a bug in the code looking at this blog post. Feel like I had something more to say, but I’m tired and heading to bed. ## Parsing STEP October 10, 2012 A month ago I started perl6-ISO_10303-21. In plain English, it’s the beginnings of a toolset for working with CAD files in the STEP format. All it does is parse so far. The STEP (that’s ISO-10303) Part 21 standard provides a BNF (I think) grammar for ASCII STEP files. It was downright trivial to translate it to a Perl 6 grammar — looking at the commits, it took me about two days to get it to read the first file. THen it sat there for a long time until I starting throwing files at it. Since then I’ve made a couple of simple changes. First, I replace C-style comments in the file with spaces before parsing; the spec’s grammar doesn’t mention them at all. Second, I discovered three of the basic sample files I was using were actually illegal, with newlines embedded in strings. I fretted about what to do for a while: should I expand the grammar to accept illegal but “standard” files, or keep the grammar strict so that I could use it to test files my own code exports for correctness? Then I realized I could have my cake and eat it too: grammar ISO_10303_21::LooseGrammar is ISO_10303_21::Grammar { token non_q_char { <special> | <digit> | <space> | <lower> | <upper> | \v } }  I just defined a new subclass, changing the non_q_char token to also accept vertical whitespaces, too. Now ISO_10303_21::Grammar is strict, and ISO_10303_21::LooseGrammar is loose. The next obstacle I’ve run into was speed. The 1.3 meg sample file took about five minutes to prase in Rakudo, and 10+ hours (!) in Niecza. Luckily jnthn++ was on the job, and sent me a patch about fifteen minutes after I sent him the profile of the parse today. (Rakudo’s got --profile built in.) That shaved off 1/6th of the running time. He also pointed out that tokens like token standard_keyword { <upper> [ <upper> | <digit> ]* }  were unnecessarily creating a Match object for each character in the keyword. Since all we want is the full text of the keyword, this could be changed to token standard_keyword { <.upper> [ <.upper> | <.digit> ]* }  and it would suppress those Match objects. That reduced the running time by another 10%. Another few days like this and it will be fast enough to do some useful work for me… ## Sums of Fourth Powers October 5, 2012 Another fun John Cook post has sent me off into playing with numbers and Perl 6. He gives a footnote saying that Euler discovered 635318657 = 158^4 + 59^4 = 134^4 + 133^4 and that this was the smallest number known to be the sum of two fourth powers in two ways. It seems odd now to think of such questions being unresolved. Today we’d ask Hardy “What do you mean 635318657 is the smallest known example? Why didn’t you write a little program to find out whether it really is the smallest?” Perl 6 has features that make that fun to explore, so let’s go to it! Here was my first attempt: my @fourth-powers = (1..*).map(* ** 4) ...^ * > 635318657; my %counts; for @fourth-powers X+ @fourth-powers ->$sum {
%counts{$sum}++; } for %counts.keys.grep({ %counts{$_} > 2 }) -> $sum { say "$sum has { %counts{$sum} / 2 } fourth power sums"; }  This works, and is quite fast in Niecza. (It’s much slower in Rakudo for some reason, though it still finishes in about ten seconds.) I love that first line, which constructs a lazy infinite list of fourth powers, then truncates it at 635318657. Then it uses X+ to build all the sums of two fourth powers less than (or equal to) 635318657. The rest of the code is pretty mundane, I fear. I decided to try to use classify to get rid of the explicit loops, resulting in this code: my @fourth-powers = (1..*).map(* ** 4) ...^ * > 635318657; my @results = (@fourth-powers X+ @fourth-powers).classify({$_ }).pairs.grep(*.value > 2);
say @results.map({ $_.key ~ " " ~$_.value.Int / 2 ~ " times" });


Unfortunately, this doesn’t work in Niecza. So now I’ve got some bugs to fix…

July 2, 2012

So, I’ve hit on something extremely annoying working with Niecza on my MacBook Pro. I upgraded to Mono 2.10.9 a few months back. Since then, Glib# has not worked. Even after I uninstalled Mono 2.10.9. It is incredibly frustrating.

Here’s the error I’m getting:

Unhandled exception: System.TypeInitializationException: An exception was thrown by the type initializer for GLib.GType ---> System.DllNotFoundException: glibsharpglue-2


Here’s the call:

constant $GLIB = "glib-sharp, Version=2.12.0.0, Culture=neutral, PublicKeyToken=35e10195dab3c99f"; constant$G_TYPE_STRING = CLR::("GLib.GType,$GLIB").String;  Here’s what gacutil says about glib-sharp: glib-sharp, Version=2.12.0.0, Culture=neutral, PublicKeyToken=35e10195dab3c99f  If anyone has any suggestions on how to fix this, I’m all ears. ## Handling Errors in ABC May 24, 2012 When I started thinking I should write yesterday’s post and started testing the Ed Reavy ABC file, I quickly ran into one of the module’s limitations that I thought I should fix before claiming ABC was a useful program. When I started trying the Reavy file, if the ABC file ran into an error in the ABC file, it would frequently just hang. Now, it was relatively easy to track down the erroneous ABC at that point, because the ABC Actions class prints all the title fields it finds as it finds them. You’d watch the list of tune names going by, and if it stopped the list before the tune file was done and just sat there, you knew something was wrong with that tune. That’s okay for a toy program that only I play with, but it’s unacceptable for a serious program other people can use. I had a theory about what caused it: when the grammar hit something it didn’t understand, it started backtracking to try to find a correct way of interpreting the file. Since there was no correct way, that resulted in a painfully long attempt to repeatedly re-interpret the bad data. So I asked how to stop backtracking grammar regexes on #perl6. They helpfully suggested changing the ABC grammar’s regex methods to token, because token suppresses backtracking. I tried just changing one method, and that didn’t help. I tried changing them all, and that failed some tests. I changed one method back, and suddenly all the tests passed, and the grammar no longer hung on ABC file errors! That was a huge improvement, but it was still only marginally helpful at figuring out what the problem in ABC file was. Once I was done with yesterday’s post, I tried to figure out how to get better information about where the parsing problem was. I hit upon a simple method. I added a $.current-tune attribute to the ABC::Actions class. When the action for a tune title fires, it clears this attribute and sets it to the tune title:

    method header_field($/) { if$<header_field_name> eq "T" {
$*ERR.say: "Parsing " ~$<header_field_data>;
$.current-tune =$<header_field_data> ~ "\n";
}

make ~$<header_field_name> => ~$<header_field_data>;
}


When the action method for the bar token fires, it adds the string of the bar to $.current-tune. When the action method for line_of_music fires, it adds a newline. And when a fatal error occurs parsing an ABC file, it prints out this attribute, resulting in error messages that look like this: Unhandled exception: Did not match ABC grammar: last tune understood: Silent The Lonely Glen g/2f/2|ed/2B/2 AG/2A/2|B/2d^c/2 dg/2f/2|ed/2B/2 AG/2A/2| B/2>A/2G/2  While this isn’t an exact reproduction of what’s in the tune, it’s close enough to easily show where the problem is: g/2f/2|ed/2B/2 AG/2A/2|B/2d^c/2 dg/2f/2|ed/2B/2 AG/2A/2| B/2>A/2G/2/F/2 eg/2f/2|ed/2B/2 ba/2g/2|g/2f/2e/2^d/2|  You just look at where the grammar stopped understanding the file, and the error pops right out if you know ABC: B/2>A/2G/2/F/2. Delete that duration not attached to a note, and the grammar will zip right through the tune. I’m sure there are more sophisticated ways to handle errors in parsing available in Perl 6, because the STD grammar uses them to give you great error messages. But this is a simple technique which is pretty handy. ## About ABC May 23, 2012 I’ve referenced the ABC module a number of times over the years on this blog, but I don’t think I’ve ever properly explained it. I think it’s particularly relevant with the recurring discussions going on about Perl 6’s “production readiness”, because at this point the abc2ly.pl script in the module is a productive tool I am using on a regular basis. One that would have been harder to write in almost any other programming language, I should add. ABC format is a designed to make it very easy to enter single-line music. It’s designed to be a terse ASCII representation of the music, and it’s absolutely terrific for notating jigs and reels and things like that. Here’s an example ABC hornpipe: X:20 T:Two Weeks To Wait M:4/4 L:1/8 C:Solomon Foster R:Hornpipe K:Edor GF|:E2GB AGFA|(3Bcd ed gedB|A2GB ABGA|BgdB A2BG| E2GB AGFA|(3Bcd ed gedB|A2GB ABGF|E2ED E2GF:| |:E2GB edef|g2fg edBG|A2GB ABGA|BgdB e2ef| g2fg edBG|~A3B AGEG|~A3B AGFG|E2ED E2GF:|  A tune considers of a header section with information about the tune and the basic glue to define the tune, followed by a list of notes, barlines, repeat signs, and other symbols. The ABC module’s abc2ly.pl script uses the ABC module’s ABC library to read the file, and outputs the music in the Lilypond format. Lilypond produces very nice output, but the file format is a pain. For instance, the (3Bcd triplet in the above file is translated to this: \times 2/3 { b'8[ cis''8 d''8] }. But the PDF Lilypond outputs from it is beautiful. I’ve been using this tool to handle my sheet music creation needs for about a year now, but in the last six weeks I’ve really stepped things up. I’ve added multi-note stems, chord names, text annotations, ties, slurs, and many other markings I needed, as well as fixing a number of bugs. As a test I threw the Ed Reavy ABC collection at it today. (After deleting the comment lines in the beginning, because I haven’t added them to the parser yet.) In the process I found four mistakes in the ABC file which threw the parser. Once they were fixed, it processed all 127 tunes in about 22 seconds. I haven’t examined the 44 pages of sheet music output by Lilypond closely yet, but it looks like all the problems I see with the output are actually errors (that parse) in the ABC file. In conclusion, while it doesn’t implement the entire ABC standard yet, this is a perfectly usable ABC to Lilypond translator, capable of handling fairly complex single-staff ABC files quickly and accurately. It’s definitely reached the point where it’s more than reasonable for other people to start using it. ## Optimizing abc2ly.pl May 21, 2012 Shortly after I wrote my last blog post, I started actively working on the ABC module again. First I had to enhance a few features so it could handle some whistle parts for a little musical I was asked to play for. Next I needed some enhancements to handle whistle / fiddle parts for the standard wedding music pieces for an upcoming gig. Finally getting obsessed with ABC updates, I decided to throw the code at a collection of 159 tunes I put together a few years ago. Suddenly I was running into hard limitations of the ABC module. It was always slow, but taking 40 seconds to process my file of three tunes wasn’t really a problem for me. I did add some progress notification messages, and was shocked to learn it wasn’t the parsing of the ABC files that was slow, it was processing the parsed result. But when I got things working well enough it was actually trying to process all 159 tunes, it would run for 15 or 20 minutes and then crash. (This is under Niecza, BTW.) I spent Friday and Saturday trying to optimize bits of code in there that I knew were terrible, with no noticeable improvement. Finally I did a profile run under mono. I found the results didn’t really mean anything to me, but I sent it to sorear++ to see what he thought. And he said, “Why do you need to use .eval in abc2ly.pl?” Well, I quickly checked, and found all the uses were intended to convert strings like "3/4" to Rats. You may say “Why were you doing that? Shouldn’t .Numeric work just as well?” And the answer is yes, it does. (In my defense, I’m pretty sure that didn’t work when I started working on the ABC module.) But now here is the tricky bit. Here’s a simple script to demonstrate the issue: my$a = 0;
for 1..400 {
$a += "1/$_";
}
say $a;  On my machine, that takes 1.3 seconds to run under Niecza. Now let’s make a simple change: my$a = 0;
for 1..400 {
$a += "1/$_".eval;
}
say $a;  This version, with just the .eval added, takes 39.0 seconds to run. Huge difference, eh? It turned out that even though the evals were happening per measure of music and not per note, they were so slow they were completely swamping the performance of my script. With the evals, processing the wedding.abc file took about 28 seconds. Without them, 3.7 seconds. And remember the big ABC file that would run for 15 minutes and then croak? Without the evals, the entire file processes in 31.8 seconds. That’s easily 30 times faster. Lesson? Don’t use eval unless you really, really need to. This advice holds for Rakudo as well: the above fraction-summing scripts took .3 and 8.5 seconds to run, respectively. (Yes, Rakudo is doing a lot better performance-wise on those scripts than Niecza did!) ## ABC module now works on Rakudo and Niecza April 19, 2012 I haven’t had many tuits to work on Perl 6 lately — except this week I needed to have three pages of transposed single-line sheet music ready to go for tonight’s rehearsal. The ABC module (particularly the abc2ly.pl script to convert ABC to Lilypond) was the obvious choice to use. However, to make it work well, I needed to add ties, slurs, multi-measure rests, fermatas, and text messages, as well as fixing the key change and meter change code. While I was at it, I went ahead and got all the tests passing again under Rakudo (version 2012.02-173-gb13c517, the latest won’t build on my Mac) and for the first time, Niecza (latest source from github only, I actually added a small patch in the process). It feels so good to have this working under both compilers! ## mjd’s magic z March 13, 2012 So, I plunged ahead and implemented mjd’s z function in Perl 6. This is the first simple step to full continued fraction arithmetic, allowing addition with a rational, multiplication by a rational, and taking the reciprocal of a continued fraction. (And oh! I just accidentally discovered that I’d missed a bunch of slides at the end of mjd’s talk which discuss implementing the full z function from HAKMEM!) The implementation turned out to be pretty straightforward, with only the edge cases causing trouble. z maintains its state using four variables, $a, $b, $c, and $d. There are two basic operations which modify this state: input, which takes the next value from the continued fraction provided, and output, which outputs the next value of the continued fraction result. How do you know which to do? Easy! If $a div $c ==$b div $d, then we’ve determined the next value which can be output, and should output it. If not, then we need to input another value. The tricky bits: first, it’s completely legal for $c and/or $d to be 0. That causes a division by zero error in Niecza. So I added a check for each, setting the result of the division to Inf when it needs to be. The second thing is what happens when the given continued fraction runs out of values. In that case, the next value is implicitly Inf. That’s well and good, but mjd somehow performs magic with it. I don’t understand what the heck he was doing there, but I can imitate the result. Here it all is in one function: sub z($a is copy, $b is copy,$c is copy, $d is copy, @x) { gather loop { my$a-div-c = $c ??$a div $c !! Inf; my$b-div-d = $d ??$b div $d !! Inf; last if$a-div-c == Inf && $b-div-d == Inf; if$a-div-c == $b-div-d { my$n = $a-div-c; ($a, $b,$c, $d) = ($c, $d,$a - $c *$n, $b -$d * $n); take$n;
} else {
if @x {
my $p = @x.shift; ($a, $b,$c, $d) = ($b, $a +$b * $p,$d, $c +$d * $p); } else { ($a, $b,$c, $d) = ($b, $b,$d, \$d); # WHY????
}
}
}
}


And there you have it! I’m still a bit worried about the edge cases (I haven’t tested negative numbers yet, for instance), but I think it clearly points the way onto the full HAKMEM z function.