Simplifying Rationals

So, in my last post I came up with this code for adding two FatRats:

multi sub infix:<FR+>(Math::FatRat $a, Math::FatRat $b) is export(:DEFAULT) {
    my $gcd = gcd($a.denominator, $b.denominator);
    Math::FatRat.new($a.numerator * ($b.denominator div $gcd) + $b.numerator * ($a.denominator div $gcd),
                     ($a.denominator div $gcd) * $b.denominator);
}

Some of you may have noticed something funny here. Why is there a GCD calculation?

Well, this code was a cut-n-paste from the Rat code inside Rakudo. And since Rakudo has only finite Ints, it seemed like a good plan to use the GCD of the two Rats’ denominators to make the numbers as small as possible in the calculation, hoping to keep things inside the range of 32-bit integers.

So, since Math::BigInt has no range limitation and Math::FatRat.new is going to simplify the fraction anyway, it seems like calculating the GCD here is a waste of time. Or is it? I’m leaning toward it being a waste of time, but I hesitate to say that for sure without doing some timing tests, which I don’t want to go into now. Instead, I’m going to open an entirely different cartoon of worms in this same neighborhood.

Because the spec is actually kind of vague on when and even if you should — or can! — simplify fractions. I’ve found three areas in S02 which touch on this:

The limitation on Rat values is intended to be enforced only on user-visible types. Intermediate values used internally in calculation the values of Rat operators may exceed this precision, or represent negative denominators. That is, the temporaries used in calculating the new numerator and denominator are (at least in the abstract) of Int type. After a new numerator and denominator are determined, any sign is forced to be represented only by the numerator. Then if the denominator exceeds the storage size of the unsigned integer used, the fraction is reduced via gcd. If the resulting denominator is still larger than the storage size, then and only then may the precision be reduced to fit into a Rat or Num.

Rat addition and subtraction should attempt to preserve the denominator of the more precise argument if that denominator is an integral multiple of the less precise denominator. That is, in practical terms, adding a column of dollars and cents should generally end up with a result that has a denominator of 100, even if values like 42 and 3.5 were added in. With other operators, this guarantee cannot be made; in such cases, the user should probably be explicitly rounding to a particular denominator anyway.

Although most rational implementations normalize or “reduce” fractions to their smallest representation immediately through a gcd algorithm, Perl allows a rational datatype to do so lazily at need, such as whenever the denominator would run out of precision, but avoid the overhead otherwise. Hence, if you are adding a bunch of Rats that represent, say, dollars and cents, the denominator may stay 100 the entire way through. The .nu and .de methods will return these unreduced values. You can use $rat.=norm to normalize the fraction. (This also forces the sign on the denominator to be positive.) The .perl method will produce a decimal number if the denominator is a power of 10, or normalizable to a power of 10 (that is, having factors of only 2 and 5 (and -1)). Otherwise it will normalize and return a rational literal of the form -47/3.

I actually find these paragraphs somewhat bewildering. Let me try to sum up the points as I’m seeing them, in vaguely reversed order.

1. Rats are “allowed” to be “lazy” and never simplify, but must always return the simplified version when .perl is called. (I put lazy in quotes because Rats are immutable, so in fact the Rat object in question will always be unsimplified.) Which means that Rat.perl doesn’t provide a way of actually getting at actual value stored in a Rat; all you get is another Rat which has the same numeric value. That’s…. weird.

2. “Rat addition and subtraction should attempt to preserve the denominator…” That sounds like Rats are required to be lazy, at least in some circumstances.

3. Note that those circumstances are a bit weird. It’s pretty easy to find sets of four Rats which can have a different denominator based on the order you add them.

4. Note also that this property goes away as soon as .perl is involved.

It feels to me like there are two distinct ideas here, at odds with each other. In one paragraph, the developer is allowed to break some fundamental assumptions of Perl 6 to make rational math more efficient. In another, the developer is required to try to bend how rationals work make rational math more friendly or something. They are at odds because the efficiency version makes .nu and .de into something you probably don’t want to look at, whereas the only way to take advantage of the “friendly” version is to look at those exact same values! Not to mention the extra work needed to make the friendly version work probably kills most or all of the hypothetical efficiency improvements available in the other.

Why do I say that? Let’s look at implementations of the two separate approaches. Here’s the efficient one (remember that in this version, FatRat.new doesn’t do anything but store the numerator and denominator it is given):

multi sub infix:<FR+>(Math::FatRat $a, Math::FatRat $b) is export(:DEFAULT) {
    Math::FatRat.new($a.numerator * $b.denominator + $b.numerator * $a.denominator,
                     $a.denominator * $b.denominator);
}

You might want to consider adding an if check there to see if the two denominators are equal, as that can save you three multiplications. Whether or not that would help probably depends on the size of the Ints and whether or not you’re adding a lot of rationals with the same denominator.

On the other hand, the “friendly” version would have to be coded something like this:

multi sub infix:<FR+>(Math::FatRat $a, Math::FatRat $b) is export(:DEFAULT) {
    if ($a.denominator %% $b.denominator) {
        Math::FatRat.new($a.numerator + $b.numerator * $a.denominator div $b.denominator,
                     $a.denominator);
    } elsif ($b.denominator %% $a.denominator) {
        Math::FatRat.new($a.numerator * $b.denominator div $a.denminator + $b.numerator,
                     $b.denominator);
    } else {
        Math::FatRat.new($a.numerator * $b.denominator + $b.numerator * $a.denominator,
                     $a.denominator * $b.denominator);
    }
}

Note that this version adds two useless (and probably relatively expensive!) is-divisible-by tests to what I would consider to be the “normal” case. Even in the best-case scenario of hitting the first special case, you’ve just replaced an operation which required three multiplications and an addition with one that requires a divisible-by test, a division, a multiplication, and an addition. Unless that prevents the size of the denominator from growing very huge, that’s probably a pessimization. (And note that if you’re doing Rat arithmetic, it’s already required if the denominator is too big to fit in an int, the fraction is simplified if possible.)

On the other hand, who am I to argue against it on the basis of efficiency? My instinct is very strongly to always simplify. Though I note that a GMP reference says “In general, cancelling factors every time is the best approach since it minimizes the sizes for subsequent operations.”

What do I think? Well, I completely fail to understand the usage case for the “friendly” paragraph. If it’s just an optimization, then it shouldn’t be “required”. If it’s intended to make things easier on the user, it’s remarkably fragile and hard to use. Suppose you think you’re adding adding dollars and cents, and want to get the result $a in terms of the number of cents (about the only practical use case I can think of). Then you need to do something like

    $a += 0/100; # make sure the result is at least in hundredths.
    fail if $a.denominator != 100; # make sure the result really is hundredths
    $a.numerator;

Wouldn’t it make more sense to have a .numerator-if-denominator-was($n) method, which returns what the numerator would be if the denominator was $n? It would look something like this:

    method numerator-if-denominator-was(Int $n) {
        fail unless 100 %% $.denominator;
        $.numerator * 100 div $.denominator;
    }

It seems like this would cleaner for the user, while not putting any requirements on the internal structure of the rational type at all.

(Of course, if you really want to deal with dollars and cents, you should probably be creating a type specifically to handle that! It could easily be substantially more efficient than a generic rational class.)

My initial inclination with the other question is that I’m inclined to go one of two ways. If we really want to allow Rats which have not been simplified, then we should go whole hog and even .perl should return the unsimplified value. If you want it simplified, call .norm.perl.

Or perhaps Rats are always simplified if possible, but we add language to the spec which allows a Perl 6 implementation to maintain the unsimplified version in the midst of a series of calculations for efficiency purposes. Why would such language be needed? My notion (which I haven’t pinned down with hard numbers yet) is that there are some operations which would spill out of a Rat if you did them step by step, but might simplify to a value which can be held in a Rat once all the operations were done and the result simplified. The language would mean the optimized code wouldn’t have to track each operation to see if the partial solution would fit in a Rat.

Hmmm. I think more research and benchmarking are probably called for…

Leave a comment