Lately, I’ve been growing increasingly obsessed with this problem. While my solution is very fast (O(n) is pretty fast!), I’ve been concerned about a few possible issues with it:

- First, I wasn’t certain it was anything better than locally-optimal. It’s guaranteed not to produce a *bad* result, but will it produce a *good* result? I couldn’t be sure.
- Second, it relies on floating-point arithmetic, while most other solutions don’t.

It has the nice upside of being able to operate in constant space (not including the O(k) output space), and linear time, but those two caveats are potentially problematic. If it turned out to be a really *bad* solution, who’d use it? And the floating-point numbers felt too fuzzy to be safe. So I started poking at it again.

## Fixing the Floats

I started over the last few days with the floating-point number problem, and I realized that the DDA (digital differential analyzer) really doesn’t need floats to operate correctly. It doesn’t need infinite precision: The floats are really fractions, simple rational values, not arbitrary real numbers; and the denominator is small (`numPartitions * 2`). So if we were to scale the fractions up to integers, we’d only need integers large enough to hold the total sum times the number of partitions times two.
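To make the scaling concrete, here’s a toy illustration (my own sketch, not code from the comparison program) of a single boundary test: the fractional comparison and its integer equivalent, with both sides multiplied by `numPartitions * 2`, always agree on exact inputs, but the integer form can never suffer rounding error.

```c
/* Does adding 'value' carry the running sum past the midpoint of the
 * first partition boundary (total / numPartitions)?
 *
 * Fractional form:  sumSoFar + value/2  >=  total / numPartitions
 * Integer form (both sides scaled by numPartitions * 2):
 *     2*numPartitions*sumSoFar + numPartitions*value  >=  2*total
 *
 * (Toy illustration only; the real algorithm carries a residual so the
 * same test works for every boundary, not just the first.) */
int CrossesBoundaryFloat(int sumSoFar, int value, int total, int numPartitions)
{
    return sumSoFar + value / 2.0 >= (double)total / numPartitions;
}

int CrossesBoundaryInt(int sumSoFar, int value, int total, int numPartitions)
{
    return 2 * numPartitions * sumSoFar + numPartitions * value >= 2 * total;
}
```

The integer version needs no floating-point unit at all, and it’s exact for any input whose scaled sum fits in an `int`.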

So I coded up a simple replacement for the original code that solves it in pure integer math. I made the first version of that in C#, and then to show that there’s *nothing* special or magical about it, I coded up a pure C version that invokes no functions and uses nothing internally but integers and basic arithmetic (not even division!).

Nothing up my sleeve — presto!

```c
/*
 * Generate linear partitions of the numbers in 'values' of count 'numValues',
 * and write the start indices of each partition to 'partitions'.
 *
 * This uses a DDA approach to slice up the values so that there are
 * 'numPartitions' partitions of nearly equal size; the result is locally
 * optimal (but not necessarily globally optimal), requires only two total
 * passes over the data, requires only six total integer local variables as
 * storage, and uses only addition, subtraction, multiplication, if-statements,
 * and loops, which means this compiles to very efficient assembly language
 * and can run well even on *extremely* resource-constrained hardware.
 */
void LinearPartition(int *values, int numValues, int *partitions, int numPartitions)
{
    int i;
    int size, targetSize;
    int currentSum;
    int partitionIndex;
    int isFirst;

    targetSize = 0;
    for (i = 0; i < numValues; i++)
        targetSize += values[i];
    targetSize *= 2;

    currentSum = 0;
    partitionIndex = 0;
    isFirst = 1;
    partitions[partitionIndex++] = 0;

    for (i = 0; i < numValues; i++) {
        size = values[i] * numPartitions;
        if (currentSum + size < targetSize || isFirst) {
            currentSum += size * 2;
            isFirst = 0;
        } else {
            partitions[partitionIndex++] = i;
            currentSum = size * 2 - (targetSize - currentSum);
            isFirst = 1;
        }
    }
}
```

Two passes over arrays, and pure integers. I don’t think it’s possible to make a solution that runs *faster* than this does, and it produces exactly the same output as the previous version. (If you happen to already have the total sum of the values available, you can even skip the first pass over the data!)
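To make that last parenthetical concrete, here’s a sketch of a single-pass variant (my own naming and return value, not code from the original program) that accepts the precomputed total as a parameter:

```c
/* Same integer DDA as above, but the caller supplies the precomputed total
 * of all the values, so only one pass over the data is needed. Returns the
 * number of partitions actually produced. (This variant and its name are my
 * own sketch, not part of the original comparison program.) */
int LinearPartitionWithSum(int *values, int numValues, int *partitions,
                           int numPartitions, int totalSum)
{
    int i, size;
    int targetSize = totalSum * 2;
    int currentSum = 0;
    int partitionIndex = 0;
    int isFirst = 1;

    partitions[partitionIndex++] = 0;

    for (i = 0; i < numValues; i++) {
        size = values[i] * numPartitions;
        if (currentSum + size < targetSize || isFirst) {
            currentSum += size * 2;              /* Value joins the current partition. */
            isFirst = 0;
        } else {
            partitions[partitionIndex++] = i;    /* Value starts a new partition. */
            currentSum = size * 2 - (targetSize - currentSum);
            isFirst = 1;
        }
    }
    return partitionIndex;
}
```

For example, splitting { 9, 5, 6, 8, 7 } (total 35) into two partitions produces start indices { 0, 3 }, i.e., {9, 5, 6} and {8, 7}.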

## Statistical Analysis

But there’s still the question of *how good is it*? It’s “pretty good,” in that we know it makes locally-optimal decisions for every value to be added (see part 1 where I explained how it works), but is it *globally* optimal? Does it beat Dr. Skiena’s dynamic-programming solution, or at least match his solution for quality?

No.

That’s a shame, given how fast it is, but no, it doesn’t.

I implemented several different possible solutions against Jonathan Palardy’s data set, all in C#, and I ran them to see what they’d do. I made a nice little wrapper that generated statistics so I could see how my solution compared to the others. And while my solution is hands-down the fastest, a few of the other solutions edge it out very slightly for quality of the result.

Here’s output from a recent run:

```
Linear partition by identical count:
    [ 2213, 1604, 1341, 1892, 1996, 2035, 1791, 1201, 1473, 1259 ]
    Partitions:10 (vs 10) Total:16805 Mean:1,680.50 (vs 1,680.50) StdDev:337.26 Min:1201 Max:2213 Worst:±532.50

Linear partition by greedy cutoff:
    [ 1455, 1510, 1450, 1543, 1348, 1384, 1488, 1672, 1594, 1607, 1606, 148 ]
    Partitions:12 (vs 10) Total:16805 Mean:1,400.42 (vs 1,680.50) StdDev:388.71 Min:148 Max:1672 Worst:±1,252.42

Linear partition by float DDA:
    [ 1839, 1616, 1703, 1583, 1490, 1947, 1672, 1594, 1607, 1754 ]
    Partitions:10 (vs 10) Total:16805 Mean:1,680.50 (vs 1,680.50) StdDev:128.44 Min:1490 Max:1947 Worst:±266.50

Linear partition by integer DDA:
    [ 1839, 1616, 1703, 1583, 1490, 1947, 1672, 1594, 1607, 1754 ]
    Partitions:10 (vs 10) Total:16805 Mean:1,680.50 (vs 1,680.50) StdDev:128.44 Min:1490 Max:1947 Worst:±266.50

Linear partition by integer DDA, C-style:
    [ 1839, 1616, 1703, 1583, 1490, 1947, 1672, 1594, 1607, 1754 ]
    Partitions:10 (vs 10) Total:16805 Mean:1,680.50 (vs 1,680.50) StdDev:128.44 Min:1490 Max:1947 Worst:±266.50

Linear partition by Skiena's dynamic programming algorithm:
    [ 1839, 1616, 1703, 1892, 1640, 1488, 1672, 1594, 1607, 1754 ]
    Partitions:10 (vs 10) Total:16805 Mean:1,680.50 (vs 1,680.50) StdDev:114.71 Min:1488 Max:1892 Worst:±211.50
```

For each algorithm, the first row lists the sums of the values in each partition. The second row shows:

- How many partitions were produced (notice that “greedy cutoff” gets this wrong);
- The sum of all values (a simple safety check to make sure none were dropped);
- The mean of the partition sizes (again, notice that “greedy cutoff” gets this wrong);
- The standard deviation of the partition sizes;
- The minimum and maximum partition sizes;
- And the worst difference from the mean.

“Identical count” and “greedy cutoff” are the same first two solutions on Jonathan Palardy’s page, and perform identically to his results. They make a good baseline. They’re also pretty bad, as measured by their means and standard deviations.

Dr. Skiena’s dynamic-programming solution edges in with the best standard deviation and best worst-difference. One would expect this, since it’s a provably-optimal solution. But it’s also O(k*n^2), which is pretty awful for performance; imagine trying to linear-partition a million items — you’d never finish!
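For reference, the heart of that dynamic program looks something like this (a minimal sketch of the textbook recurrence from Skiena’s *Algorithm Design Manual*, computing only the optimal cost; his full version also records the split points, and this is my own reimplementation, not the C# code from the comparison program):

```c
#include <stdlib.h>

/* Sketch of the classic O(k*n^2) linear-partition dynamic program:
 * cost[i][j] is the smallest achievable largest-partition sum when the
 * first i values are split into j contiguous partitions, via
 *     cost[i][j] = min over x of max(cost[x][j-1], sum(values[x..i-1])).
 * Only two rows are kept, since row j depends only on row j-1.
 * Returns the optimal cost for all n values in k partitions. */
int OptimalPartitionCost(const int *values, int n, int k)
{
    int i, j, x, best, result;
    int *prefix = malloc((n + 1) * sizeof(int));
    int *prev = malloc((n + 1) * sizeof(int));
    int *cur = malloc((n + 1) * sizeof(int));

    prefix[0] = 0;
    for (i = 1; i <= n; i++)
        prefix[i] = prefix[i - 1] + values[i - 1];

    /* One partition: the cost is just the prefix sum. */
    for (i = 0; i <= n; i++)
        prev[i] = prefix[i];

    for (j = 2; j <= k; j++) {
        for (i = 0; i <= n; i++) {
            best = prefix[n];   /* Safe upper bound on any cost. */
            for (x = j - 1; x < i; x++) {
                int cost = prev[x];
                int last = prefix[i] - prefix[x];
                if (last > cost) cost = last;
                if (cost < best) best = cost;
            }
            cur[i] = best;
        }
        int *tmp = prev; prev = cur; cur = tmp;
    }

    result = prev[n];
    free(prefix); free(prev); free(cur);
    return result;
}
```

The triple nesting (k rows, n cells, up to n candidate split points per cell) is exactly where the O(k*n^2) comes from.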

Palardy’s water-flow technique, which I didn’t try to reimplement, has an impressive standard deviation of 120.9, but it uses simulated annealing to get there: Again, that works well enough for small data sets, but it probably doesn’t scale to anything huge.

## Still, I’ll take it

So while my algorithm doesn’t match a couple of the better ones for *quality*, it knocks everybody else out of the park for *performance*, and its output is still *very* good. It manages to produce the same exact result as Dr. Skiena’s optimal version for seven out of the ten partitions, which isn’t too shabby for how simple it is and how fast it runs. It could realistically be used on a 16-bit microcontroller, or could have been used for text layout on the original Macintosh or an SNES; not much of anything else is going to compete against it for constrained environments like that.

It also has an interesting side benefit, in that it’s very nearly an *online* or *streaming* algorithm, because it only looks at each value once after summing them: If you’re able to calculate the total sum of the values upfront, you could actually perform reasonably-efficient linear partitioning on data that’s too big to fit in memory!
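A streaming version might look something like this (my own sketch, with my own names; it just restates the integer DDA’s inner loop as a feed-one-value-at-a-time state machine, assuming the total is known upfront):

```c
/* Streaming form of the integer DDA: initialize with the precomputed total,
 * then feed values one at a time. DdaFeed() returns 1 when the value it was
 * just given starts a new partition (the first value implicitly starts
 * partition zero). These names are my own, not from the original code. */
typedef struct {
    int targetSize;     /* Total of all values, times two. */
    int currentSum;
    int numPartitions;
    int isFirst;
} DdaStream;

void DdaInit(DdaStream *s, int totalSum, int numPartitions)
{
    s->targetSize = totalSum * 2;
    s->currentSum = 0;
    s->numPartitions = numPartitions;
    s->isFirst = 1;
}

int DdaFeed(DdaStream *s, int value)
{
    int size = value * s->numPartitions;
    if (s->currentSum + size < s->targetSize || s->isFirst) {
        s->currentSum += size * 2;
        s->isFirst = 0;
        return 0;
    }
    s->currentSum = size * 2 - (s->targetSize - s->currentSum);
    s->isFirst = 1;
    return 1;
}
```

Only four integers of state persist between values, so the data itself could live anywhere: a file, a network stream, even a tape.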

So in some ways, I’m a little disappointed that I didn’t beat the best output; but at the same time, for practical, real-world use, my algorithm might actually be a winner: It’s trivial to implement, it’s faster than everything else out there, and it produces output that — while not optimal — is at least highly-competitive with the best solutions.

In spending a bit of time thinking about it, I don’t think it’s possible to produce a better solution that only looks at each value individually, in isolation — this algorithm may be the best answer that can be found in O(n).

But I’m still eyeing that nice output that Dr. Skiena’s solution produces, and that crazy O(k*n^2) runtime he has. I *think* there might be a solution that can match his results faster than his solution can; but unlike my current, elegant O(n) solution, the kinds of techniques I’m thinking about for that are big and require complex secondary data structures. I *think* it might be possible to produce a globally-optimal solution in O(n lg n).

That said, for real-world use, I’m likely to stick with my pretty O(n) solution: It might not produce the *absolute* optimal output, but it produces *really good* output, and it runs circles around everything else.

## Source code

The C# source code for the example comparison program above can be found in my GitHub account and is covered under the MIT open-source license. And like the previous blog posting, I hereby place all of the code in this blog post *into the public domain*, so use it as you see fit, no questions asked. Steal and enjoy 🙂