Fixing Tags

November 9, 2011

So, in my last post I identified 3906 files in my MP3 collection with missing tags. This time I set out to fix some of them.

So, first I went through the list I generated with the last script and singled out all 2294 files which used a standard pattern of Artist / Album / Track Number - Track Name. Then I wrote this script:

my $for-real = Bool::True;

constant $TAGLIB  = "taglib-sharp,  Version=2.0.4.0, Culture=neutral, PublicKeyToken=db62eba44689b5b0";
constant TagLib-File    = CLR::("TagLib.File,$TAGLIB");
constant String-Array   = CLR::("System.String[]");

for lines() -> $filename {
    my @path-parts = $filename.split('/').map(&Scrub);
    my $number-and-title = @path-parts.pop;
    # Match "NN-Title.mp3"; the dot is escaped so it only matches a literal "."
    next unless $number-and-title ~~ m/(\d+) \- (.*) \.mp3/;
    my $track-number = ~$0;
    my $title = ~$1;
    my $album = @path-parts.pop;
    my $artist = @path-parts.pop;
    say "$artist: $album: $title (track $track-number)";

    if $for-real {
        my $file;
        try {
            $file = TagLib-File.Create($filename);
            CATCH { say "Error reading $filename" }
        }
        # Skip files TagLib# could not open; $file is still undefined here.
        next unless $file.defined;

        $file.Tag.Track = $track-number.Int;
        $file.Tag.Album = $album;
        $file.Tag.Title = $title;
        $file.Tag.Performers = MakeStringArray($artist);
        
        try {
            $file.Save;
            CATCH { say "Error saving changes to $filename" }
        }
    }
}

sub Scrub($a) {
    $a.subst('_', ' ', :global);
}

sub MakeStringArray(Str $a) {
    my $sa = String-Array.new(1);
    $sa.Set(0, $a);
    $sa;
}

For the main loop, the first half uses standard Perl techniques to extract the artist, album, and track info from the path. The second half sets the tags. Opening the file is the same as last time, and then setting Track, Album, and Title is as simple as could be. The Performers tag is a bit tricky, because it’s a string array (the others are simple strings or integers) and Niecza doesn’t know how to do the coercion automatically. MakeStringArray gets the job done nicely.
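For readers who want to try the path-splitting half without Niecza, here is a minimal sketch of the same idea in Python. It is not the script above: parse_track_path is a name made up for this example, and the sample path is the one quoted in the previous post.

```python
import re

def parse_track_path(path):
    """Split '.../Artist/Album/NN-Title.mp3' into tag fields, or return None."""
    # Same scrub step as the Perl 6 script: underscores become spaces.
    parts = [p.replace("_", " ") for p in path.split("/")]
    if len(parts) < 3:
        return None
    # Track number, a dash, then the title, then a literal ".mp3".
    m = re.fullmatch(r"(\d+)\s*-\s*(.*)\.mp3", parts[-1])
    if m is None:
        return None
    return {
        "artist": parts[-3],
        "album": parts[-2],
        "track": int(m.group(1)),
        "title": m.group(2),
    }

sample = "/Volumes/colomon/Albums/Dervish/Live_in_Palma/04-Slow_Reels.mp3"
print(parse_track_path(sample))
```

Files that do not follow the pattern come back as None, which plays the role of the next unless line in the loop above.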

So, if you’ve done this sort of thing in Perl 5 using the MP3 CPAN modules, there’s nothing at all revolutionary about this code. But it feels really good to be able to do it with Perl 6!

Examining MP3 Tags

November 6, 2011

I’ve been playing around with Niecza’s ability to handle CLR libraries. It’s actually kind of intoxicating; it’s the closest thing yet to having a CPAN for Perl 6. So I decided to see what I could do with the TagLib# library for dealing with media file tags.

Now, I’ve got an MP3 library with 23564 MP3 files in it, the majority of which were created by older ripping programs that didn’t do anything with the ID tags. Most of those have been updated to include tags, but every now and then I add one of the old directories to iTunes and get a bunch of “Unknown Artist” / “Unknown Album” tracks.

So I thought a nice first project would be figuring out which of the tracks still lacked proper tags. The first thing to do was to get TagLib# properly installed on my MacBook Pro. make install didn’t add the DLL to the main GAC; I ended up installing it there by hand, which was trivially easy once I knew what to do:

sudo gacutil -i taglib-sharp.dll
sudo gacutil -i policy.2.0.taglib-sharp.dll

Once I had that done, I experimented with it for a bit, and ended up with this script:

constant $TAGLIB  = "taglib-sharp,  Version=2.0.4.0, Culture=neutral, PublicKeyToken=db62eba44689b5b0";
constant TagLib-File    = CLR::("TagLib.File,$TAGLIB");

for lines() -> $filename {
    try {
        my $file = TagLib-File.Create($filename);
        unless $file.Tag.JoinedPerformers ~~ m/\S/ && $file.Tag.Title ~~ m/\S/ {
            say $filename;
        }
        CATCH { say "Error reading $filename" }
    }
}

The first line specifies the exact assembly we want to use; you can get the details from gacutil -l. The next line effectively imports the TagLib::File class into Niecza. I get my filenames from stdin, as that allows me to use find to generate the list of MP3 files.

This was my first use of exception handling in Perl 6. I needed it because TagLib-File.Create throws an exception when it isn’t happy with the MP3 file. When it is happy with it, $file is an object of type CLR::TagLib::Mpeg::AudioFile. $file.Tag.JoinedPerformers gives the list of performers (AKA artists) as a single string; $file.Tag.Title gives the title as a string. Unless we find a valid non-space character in both of them, we flag the file by printing it out.

Really, the only way it could be significantly simpler than this would be if the constant TagLib-File line were unnecessary!

End result: I have a list of 3906 files it flagged, 77 of which were read errors.

My next step is to write some code which translates the filenames (which are mostly of the form /Volumes/colomon/Albums/Dervish/Live_in_Palma/04-Slow_Reels.mp3) into artist, album, and track name fields, and then set those tags. Based on my initial experiments, I think it’s going to be incredibly easy…

Ease of FatRat construction

October 18, 2011

So, on #perl6 today someone tried using the numeric literal .3333333333333333333333333333333. (Warning: exact number of 3’s may not match original example.) By the spec (as I understand it), this is a Num, because a Rat isn’t accurate enough to represent it. (Not that a Num is, mind you!)

And that got me to thinking: What if you really wanted a FatRat, so you actually got that exact number? Well, if you’re using Niecza (the only p6 to implement FatRat so far), the answer is FatRat.new(3333333333333333333333333333333, 10000000000000000000000000000000). IMO, that’s ridiculously awkward.

The spec may imply you can do it with ".3333333333333333333333333333333".FatRat. That at least avoids the problem of counting the zeros, but it’s still on the ugly side. Likewise FatRat.new(".3333333333333333333333333333333") is awkward. Still, we should certainly support at least one of these options.

I would like to propose again adding an F suffix to indicate a numeric literal should be a FatRat. I don’t think this is something that can reasonably be done with a postfix operator, because if you treat .3333333333333333333333333333333 like a normal numeric value and then try to FatRat it, you will lose the precision you want.
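The precision loss is easy to see in any language with both floating point and exact rationals. Here is a small Python sketch, using fractions.Fraction as a stand-in for FatRat (that pairing is my assumption for illustration, not anything from the spec): parsing the digits directly keeps the exact value, while going through a float first does not.

```python
from fractions import Fraction

LITERAL = "0.3333333333333333333333333333333"  # 31 threes

# Parsed straight from the digits: exactly 333...3 / 10**31.
exact = Fraction(LITERAL)

# Rounded to a double first, then converted: the nearest binary fraction.
via_float = Fraction(float(LITERAL))

print(exact.denominator == 10**31)  # exact value survived
print(via_float == exact)           # the float detour changed the value
print(float(abs(via_float - exact)))
```

The second conversion is the moral equivalent of a postfix operator applied to an ordinary numeric value: by the time it runs, the extra digits are already gone.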

Just as a quick comparison, here’s a bit of the old endless pi code using the FatRat constructor:

sub unit() { LFT.new(q => FatRat.new(1, 1),
                     r => FatRat.new(0, 1),
                     s => FatRat.new(0, 1),
                     t => FatRat.new(1, 1)); }

I’m proposing we should be able to write that as

sub unit() { LFT.new(q => 1F,
                     r => 0F,
                     s => 0F,
                     t => 1F); }

Much shorter and much clearer. I think that’s a big win.

(Note: I’m in no way particularly attached to the letter “F” for this, that was just the first thing that came to mind.)

Complex Issues

August 31, 2011

Sorry for the long silence here, it’s been a busy summer with far too little Perl 6. But I did squeeze in some work on trig, both on nom and niecza. And I ran into a very interesting issue.

My local copy of niecza has S32-trig/sin.t almost working. A few needed skips, but all the core numeric types work. Except…

> is_approx(asin(0.785398163404734 + 2i), 0.341338918259482 + 1.49709293866352i)
# got:      2.80025373533031-1.49709293866352i
# expected: 0.341338918259482+1.49709293866352i

niecza> asin(0.785398163404734 + 2i)
2.80025373533031-1.49709293866352i

rakudo> asin(0.785398163404734 + 2i)
0.341338918259481 + 1.49709293866352i

Woah, what’s up with that? Well, it turns out both answers are right in some sense:

niecza> sin(asin(0.785398163404734 + 2i))
0.785398163404734+2i

rakudo> sin(asin(0.785398163404734 + 2i))
0.785398163404734 + 2i

The thing here is that sin is periodic; there are an infinite number of complex numbers it maps to the same result value. That means when you call asin, there are an infinite number of possible results for each input value, and you must somehow choose one of them.

But let’s take a step back from that and look at why I got different results, because I used the exact same formula for asin in both Rakudo and Niecza. That formula is -1i * log(($x)i + sqrt(1 - $x * $x)). Let’s look at the sqrt first:

niecza> my $x = 0.785398163404734 + 2i; sqrt(1 - $x * $x)
-2.21086930051619+0.710488099157523i

rakudo> my $x = 0.785398163404734 + 2i; sqrt(1 - $x * $x)
2.21086930051619 - 0.710488099157523i

As you can see, one answer is the negative of the other. Of course, when you square the results, that additional factor of -1 just goes away, so these are both valid results.
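To check that the choice of square root really is the whole story, here is a sketch of the same formula in Python, where the sign of the root can be picked explicitly. Python’s cmath is used only as an independent calculator; asin_via_log and root_sign are names invented for this example. cmath.sqrt returns the root with non-negative real part, which is the one Rakudo produced above.

```python
import cmath

def asin_via_log(x, root_sign=1):
    """asin(x) = -i * log(i*x + s), where s is one of the two roots of 1 - x*x."""
    s = root_sign * cmath.sqrt(1 - x * x)
    return -1j * cmath.log(1j * x + s)

x = 0.785398163404734 + 2j

principal = asin_via_log(x, +1)  # root with positive real part
other = asin_via_log(x, -1)      # the negated root

print(principal)
print(other)
# Both are genuine inverse sines of x, and they differ by a reflection about pi/2.
print(cmath.sin(principal), cmath.sin(other))
print(abs(other - (cmath.pi - principal)))
```

With the positive root this gives approximately 0.3413 + 1.4971i, and with the negated root approximately 2.8003 - 1.4971i, which is pi minus the first value. Those match the two results shown earlier.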

So this leads me to two questions:
1) Should we define one of these two answers as being correct, as far as Perl 6 is concerned? (Or should they both be considered valid results?)

2) If so, which one? And how do we coherently specify that branch?

I thought at first it might be as simple as saying “The branch where the complex result of sqrt for complex numbers with an imaginary value of 0 agrees with the real sqrt result.” But in fact both Rakudo and Niecza already seem to agree for the sqrts of real-valued Complex numbers.

Anyone else out there have a notion?

Perl 6 Numerics

June 29, 2011

I’m giving a talk on Numerics in Perl in two hours at YAPC::NA 2011. I’m posting the slides here. If you have questions, comments, or corrections, please comment on this post.

If you’re interested in general Perl 6 info, you can check out my previous talk.

Euler 5

June 23, 2011

(Wrote this post last week, but accidentally posted it to the wrong blog!)

So, Hacker News brought me this post this morning, and I wanted to see how Perl 6 stacked up to Java / Scala. The answer turned out to be pretty poorly speed-wise, at least in my initial attempts. But that’s a story for another post. Right now I’m just interested in finding a good solution to the problem itself: “What is the smallest positive number that is evenly divisible by all of the numbers from 1 to 20?”

So far, this is my favorite, I think:

sub divides-by-all-up-to($a, $b) {
    !(2..$b).grep($a !%% *);
}

my $N = [*] 2, 3, 5, 7, 11, 13, 17, 19;
my @attempts := $N, 2 * $N ... { divides-by-all-up-to($_, 20) };
say @attempts[*-1];

You can easily get this down to a two-liner, but only at the cost of a good bit of clarity, IMO.

So, what does it do? divides-by-all-up-to checks to see if $a is divisible by all the numbers from 2 to $b. This is one of those concepts for which there is very definitely more than one way to do it in Perl 6. For instance,

    ?all($a <<%%<< 2..$b)

or

    [&&] $a X%% 2..$b

and so on. I chose my approach because it is pretty straightforward, reasonably efficient, and works on both Rakudo and Niecza.

$N is my secret weapon. It’s the product of all the primes less than 20. All of those primes have to be factors of the answer, so we only consider multiples of $N, saving lots of time. To be precise, it means we call divides-by-all-up-to once for every 4,849,845 times the Scala version calls the equivalent function.

The next step is fun with sequences. We want to consider the multiples of $N, stopping when we hit one for which divides-by-all-up-to is true. This is easily expressible using the sequence operator:

    my @attempts := $N, 2 * $N ... { divides-by-all-up-to($_, 20) };

The initial $N, 2 * $N tells the sequence operator that we want the sequence of multiples of $N; the rest tells it when to stop. (We know that it has to stop eventually because 20! has all the correct properties and is a multiple of $N.) The end result is a sequence whose last value is the number we are looking for. And we finally get that last value with standard Perl 6ish @attempts[*-1].
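For comparison, here is a sketch of the same search in Python, with a generator standing in for the sequence operator. The names mirror the Perl 6 version but the code is only an illustration of the idea, not a port anyone benchmarked.

```python
from itertools import count

def divides_by_all_up_to(a, b):
    """True if a is divisible by every integer from 2 to b."""
    return all(a % d == 0 for d in range(2, b + 1))

# Product of the primes below 20; every answer must be a multiple of it.
N = 2 * 3 * 5 * 7 * 11 * 13 * 17 * 19

# Walk N, 2N, 3N, ... and stop at the first multiple that passes the test.
answer = next(m for m in (k * N for k in count(1))
              if divides_by_all_up_to(m, 20))

print(N)
print(answer, answer // N)
```

The search stops at 24 * $N, so the divisibility test only runs 24 times.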

Here’s a much more generalized approach. Not sure if it will work with Niecza or not (I’ve never figured out how to use modules with Niecza) but it does work with Rakudo.

use Math::Prime;

sub factors($n is copy) {
    my @primes := primes;
    my $prime = @primes.shift;
    my %factors;
    while $n > 1 {
        if $n %% $prime {
            %factors{$prime}++;
            $n div= $prime;
        } else {
            $prime = @primes.shift;
        }
    }
    return %factors;
}

sub product(%factors) {
    [*] %factors.map({ .key ** .value });
}

sub least-divisble-by-all(@values) {
    my %common-factors;
    for @values -> $i {
        my %factors = factors($i);
        for %factors -> $prime {
            if %common-factors.exists($prime.key) {
                %common-factors{$prime.key} max= $prime.value;
            } else {
                %common-factors{$prime.key} = $prime.value;
            }
        }
    }
    
    product(%common-factors);
}

This version uses the prime factorization of the numbers to smartly figure out the answer. It’s much more flexible and probably faster for big numbers, but… it’s just too long to make me love it.
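The idea is to take, for each prime, the largest exponent it has in any of the inputs, and multiply those prime powers together. Here is a compact Python sketch of that. One substitution to note: it uses plain trial division by 2, 3, 4, … instead of a prime list like Math::Prime, which still works because a composite candidate can never divide what is left once its prime factors have been removed.

```python
def factors(n):
    """Prime factorization of n as {prime: exponent}, by trial division."""
    result = {}
    p = 2
    while n > 1:
        if n % p == 0:
            result[p] = result.get(p, 0) + 1
            n //= p
        else:
            p += 1
    return result

def least_divisible_by_all(values):
    """Smallest positive integer divisible by every value."""
    common = {}
    for v in values:
        for prime, exponent in factors(v).items():
            # Keep the largest exponent seen for each prime.
            common[prime] = max(common.get(prime, 0), exponent)
    product = 1
    for prime, exponent in common.items():
        product *= prime ** exponent
    return product

print(factors(20))
print(least_divisible_by_all(range(1, 21)))
```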

The Pi is done

June 4, 2011

When last we left the endless Pi project, it was a “simple” matter of getting FatRats to work. Which in this case meant getting the Math::FatRat module to work. Which in turn meant getting Math::BigInt to work again. Which meant getting Zavolaj to work again. Frankly, I thought I might have a month’s worth of blog posts left in the project.

Enter Niecza. As of the latest release, it has baked-in FatRats. I was worried about it lacking lazy lists, but it turns out it has them too. I ran into four holes in its implementation of Perl 6: No floor function, no bless, no MAIN, and no sub-signatures. Luckily floor’s easy to write, MAIN is easily skipped, using the default new works, and I just reverted TimToady’s sub-signature suggestion. A bit of work, and I had this, a complete and fully functional Perl 6 spigot stream for pi:

sub floor(FatRat $n) {
    my $mod = $n.numerator % $n.denominator;
    ($n.numerator - $mod) div $n.denominator;
}

sub stream(&next, &safe, &prod, &cons, $z is copy, @x) {
    my $x-list = @x.iterator.list;
    gather loop {
        my $y = next($z);
        if safe($z, $y) {
            take $y;
            $z = prod($z, $y);
        } else {
            $z = cons($z, $x-list.shift);
        }
    }
}

sub convert($m, $n, @x) {
    stream(-> $u { floor($u.key * $u.value * $n); },
           -> $u, $y { $y == floor(($u.key + 1) * $u.value * $n); },
           -> $u, $y { $u.key - $y / ($u.value * $n) => $u.value * $n; },
           -> $u, $x { $x + $u.key * $m => $u.value / $m; },
           0/1 => 1/1,
           @x);
}

class LFT {
    has $.q;
    has $.r; 
    has $.s; 
    has $.t;
    # method new($q, $r, $s, $t) { self.bless(*, :$q, :$r, :$s, :$t); }
    method extr($x) { ($.q * $x + $.r) / ($.s * $x + $.t); }
}

sub unit() { LFT.new(q => FatRat.new(1, 1), 
                     r => FatRat.new(0, 1), 
                     s => FatRat.new(0, 1), 
                     t => FatRat.new(1, 1)); }    
sub comp($a, $b) { 
    LFT.new(q => $a.q * $b.q + $a.r * $b.s,
            r => $a.q * $b.r + $a.r * $b.t,
            s => $a.s * $b.q + $a.t * $b.s,
            t => $a.s * $b.r + $a.t * $b.t);
}

sub pi-stream() {
    stream(-> $z { floor($z.extr(3)); },
           -> $z, $n { $n == floor($z.extr(4)); },
           -> $z, $n { comp(LFT.new(q => 10, r => -10*$n, s => 0, t => 1), $z); },
           &comp,
           unit, 
           (1..*).map({ LFT.new(q => $_, r => 4 * $_ + 2, s => 0, t => 2 * $_ + 1) }));
}

my @pi := pi-stream;
say pi;
say @pi[0] ~ '.' ~ @pi[1..100].join('');

Certainly not as elegant as it might ideally be in the long run, but it works today, calculating 101 digits of pi in 5.7 seconds.

Just Don’t Rakudo It?!

May 31, 2011

After the announcement that Niecza v6 now supports full bigint Ints, I finally downloaded it and got it working on my MacBook Pro. (The tricky bit was getting mono; making niecza work was trivial after that.)

And with a couple of trivial modifications, I’ve got the old mandelbrot-color script ported to niecza. There is one major issue: Complex.abs doesn’t work properly. The resulting set is still correct, I think, but the coloring is off.

Here’s the crazy bit: it can calculate a 1001×1001 Mandelbrot set in under 5 minutes on my MBP. By comparison, a 145×145 Mandelbrot set in Rakudo runs in about 5 and a half minutes. That’s niecza clocking in at roughly 47 times faster than Rakudo!
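The arithmetic behind that figure, as a quick Python sketch. The timings are the rough ones quoted above, and treating “under 5 minutes” as exactly 5.0 is an assumption that makes the second number a lower bound at best.

```python
niecza_pixels = 1001 * 1001
rakudo_pixels = 145 * 145

# Work ratio alone, ignoring the difference in run time.
pixel_ratio = niecza_pixels / rakudo_pixels

# Pixels per minute, assuming 5.0 minutes for Niecza and 5.5 for Rakudo.
throughput_ratio = (niecza_pixels / 5.0) / (rakudo_pixels / 5.5)

print(round(pixel_ratio, 1))
print(round(throughput_ratio, 1))
```

So 47 is the ratio of the work done; once the shorter run time is counted, the speedup comes out a bit over 50.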

My initial impression is that Niecza still has a lot of rough edges and unimplemented features. But it seems to me that with this release, it’s gone from a cool up-and-coming Perl 6 to a real contender.

I guess the real question here is how Rakudo Nom will handle it. Getting rid of all the endless object creation stuff should be a big win for Complex-heavy math code.

Update: Additional quick benchmarks on string code show no advantage whatsoever for Niecza, even factoring out Niecza’s crazily slow startup time. I suppose I might have managed to choose the worst-possible comparison (from Rakudo’s point of view) for my first benchmark…

Perl 6 resources

April 30, 2011

I’m writing this post as a reference for the Perl 6 lecture I’m giving today at Penguicon, but of course it may prove generally useful as well.

Official Perl 6 website
#perl6 IRC
STD.pm6 (official Perl 6 grammar)
Perl 6 Spec
Rakudo website
Rakudo repository on github
masak’s History of Perl 6 (Only up to mid-2010, but still very interesting.)
ABC module (used as an example, interesting real-world grammar usage)
jnthn’s Perl 6 talks

An infinite stream of “Pi”

April 27, 2011

So, after TimToady’s help with my last problem, finishing this is trivial. You just convert the Haskell code without worrying about type safety.

type LFT = (Integer, Integer, Integer, Integer) 
extr :: LFT -> Integer -> Rational 
extr (q,r,s,t) x = ((fromInteger q) * x + (fromInteger r)) / 
                           ((fromInteger s) * x + (fromInteger t)) 
unit :: LFT 
unit = (1,0,0,1) 
comp :: LFT -> LFT -> LFT 
comp (q,r,s,t) (u,v,w,x) = (q*u+r*w,q*v+r*x,s*u+t*w,s*v+t*x) 

becomes

sub extr([$q, $r, $s, $t], $x) { 
    ($q * $x + $r) / ($s * $x + $t); 
}

my $unit = [1, 0, 0, 1];

sub comp([$q,$r,$s,$t], [$u,$v,$w,$x]) {
    [$q * $u + $r * $w, 
     $q * $v + $r * $x, 
     $s * $u + $t * $w, 
     $s * $v + $t * $x];
}

And then the final piece in the puzzle,

pi = stream next safe prod cons init lfts where 
  init = unit 
  lfts = [(k, 4*k+2, 0, 2*k+1) | k<-[1..]] 
  next z = floor (extr z 3) 
  safe z n = (n == floor (extr z 4)) 
  prod z n = comp (10, -10*n, 0, 1) z 
  cons z z' = comp z z'

becomes

sub pi-stream() {
    stream(-> $z { extr($z, 3).floor; },
           -> $z, $n { $n == extr($z, 4).floor; },
           -> $z, $n { comp([10, -10*$n, 0, 1], $z); },
           &comp,
           $unit, 
           (1..*).map({ [$_, 4 * $_ + 2, 0, 2 * $_ + 1] }));
}

It’s a very direct translation.

Does it work?

> my @pi := pi-stream;
> say @pi[^40].join('');
3141592653589793238468163213056056860170

Yay!

Except, according to the Joy of Pi, the first 40 digits of pi are

3.1415926535 8979323846 2643383279 502884197 # pi
3.1415926535 8979323846 8163213056 056860170 # ours

What’s going wrong? I haven’t empirically verified it yet, but I’m pretty sure the issue is Rakudo’s Ints and Rats overflowing. Which means our next post is going to have to dive back into Math::BigInt and Math::FatRat…
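One way to test the overflow theory without touching Rakudo is to run the same stream in a language whose integers never overflow. Here is a sketch in Python. It is not the code from this post: the names follow the Haskell, and the floor of extr is computed with exact integer floor division rather than through a rational type.

```python
from itertools import count, islice

def extr_floor(lft, x):
    """floor of the LFT (q, r, s, t) applied to x, in exact integer arithmetic."""
    q, r, s, t = lft
    return (q * x + r) // (s * x + t)

def comp(a, b):
    """Compose two LFTs (2x2 matrix product)."""
    q, r, s, t = a
    u, v, w, x = b
    return (q * u + r * w, q * v + r * x, s * u + t * w, s * v + t * x)

def pi_digits():
    """Yield the decimal digits of pi one at a time, forever."""
    z = (1, 0, 0, 1)  # unit
    lfts = ((k, 4 * k + 2, 0, 2 * k + 1) for k in count(1))
    while True:
        y = extr_floor(z, 3)
        if y == extr_floor(z, 4):
            # Safe: the next digit is settled, so emit it and rescale.
            yield y
            z = comp((10, -10 * y, 0, 1), z)
        else:
            # Not safe yet: absorb another term of the series.
            z = comp(z, next(lfts))

digits = "".join(str(d) for d in islice(pi_digits(), 40))
print(digits)
```

With arbitrary-precision integers the first 40 digits agree with the Joy of Pi line above, which is at least consistent with the algorithm being fine and the number types being the problem.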

