Being a computer scientist is a funny thing. You live on the edge of knowledge in a weird realm that’s not quite mathematics and not quite physical machinery. Like most of the sciences, of course, anything you “discover” was likely already discovered several decades before. But every great once in a while, you bump into a problem that seems to have received very little attention, and you see the existing solutions, and you think, “Huh, it seems like there must be a better way.”

I did that, once, a long time ago: I discovered a really neat technique for computing treemaps on-the-fly for arbitrarily large datasets, and that’s why SpaceMonger 2+ have such incredibly fast renderings. I don’t know for sure if it’s a *novel* solution, but I haven’t seen anyone else do it. One of these years, I need to properly publish a paper on how that works: I haven’t been *intentionally* keeping that algorithm private; I just haven’t gotten around to writing up a really good description of how it works.

But today, I’m going to talk about another algorithm entirely: linearly partitioning a sequence of numbers.

## The backstory

Recently, I was working on a piece of UI code, and I wanted to implement word wrapping, but not in the usual way: Instead of wrapping at *N* columns or pixels horizontally, I wanted to be able to take a given sentence and produce *N* lines of text, each as *equally-sized* as possible. Measure the size of each word in pixels, with the size of a trailing space, then find the breaks so that the lines have as nearly-identical totals as possible, as in the example below:

We hold these truths to be self-evident,

that all men are created equal, that they are

endowed by their Creator with inherent and

inalienable rights, that among these are

life, liberty, and the pursuit of happiness.

(*It’s easy to lay this out visually*!)

I was absolutely certain this had to be a solved problem — especially since the case of a single split for two lines is really easy (see below) — so I started Googling. Text has been around forever, so somebody *must* have a good technique for this, right? Right!?

…ehheh.

After enough searching, I found that this is known in the computer science literature as the *linear partitioning* problem: Dr. Steven Skiena published *The Algorithm Design Manual* back in 1998, and it has a solution that uses dynamic programming to calculate the partitions, minimizing the largest partition sum.

Dr. Skiena’s algorithm is a textbook case of dynamic programming. It runs in O(k*n^2) time, which is far better than the naïve solution of just trying every possible combination of splits, which takes exponential time. But O(k*n^2) time isn’t really that great, and the algorithm is complex and difficult to understand. I was surprised to see that the “best” solution ran in O(k*n^2) time, since for the special case of k=2 lines, the solution is so easy.
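For a concrete point of comparison, here’s a sketch of that textbook dynamic-programming approach as I understand it. This is my own illustration, not Dr. Skiena’s code: it fills a table of the best achievable “largest range sum” for every prefix and partition count, then walks a divider table backwards to recover the split points:

```python
def dp_partition(data, k):
    """Split data into k contiguous ranges, minimizing the largest range sum."""
    n = len(data)
    # Prefix sums, so any range sum is a subtraction: sum(data[x:i]) = prefix[i] - prefix[x].
    prefix = [0]
    for v in data:
        prefix.append(prefix[-1] + v)

    INF = float('inf')
    # M[i][j]: smallest possible largest-range sum when splitting the first
    # i values into j contiguous ranges.  D[i][j]: where the last range starts.
    M = [[INF] * (k + 1) for _ in range(n + 1)]
    D = [[0] * (k + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        M[i][1] = prefix[i]  # one range: its sum is the whole prefix
    for j in range(2, k + 1):
        for i in range(j, n + 1):
            for x in range(j - 1, i):  # the last range is data[x:i]
                cost = max(M[x][j - 1], prefix[i] - prefix[x])
                if cost < M[i][j]:
                    M[i][j] = cost
                    D[i][j] = x

    # Walk the divider table backwards to rebuild the actual ranges.
    ranges = []
    i, j = n, k
    while j > 1:
        x = D[i][j]
        ranges.append(data[x:i])
        i, j = x, j - 1
    ranges.append(data[:i])
    ranges.reverse()
    return ranges
```

The three nested loops are where the O(k*n^2) cost comes from: every partition count, times every prefix, times every possible last split point.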

There are a number of other solutions, too, involving exotic techniques like annealing and approximation, and I was convinced all of these were simply too hard for a problem that had to be easier than all this.

And so — I did it better.

Somewhat by accident, I found a much better algorithm, by believing something like my solution *had* to be the solution, before ever seeing the *actual* current-best algorithm and then being surprised that it wasn’t what I’d already come up with.

## A good first try

Let’s take a look at the problem in its simplest case. The goal is to take a sequence of numbers, like [9, 7, 2, 3, 10, 5, 8, 11, 6, 8, 4], and find a *split* somewhere in it such that the two subsequences sum to a number as close to equal as possible.

To divide it in half like this, there’s a really easy algorithm: Start with two empty subsequences. Read the input from *both* ends of the source at the same time. Whichever subsequence has the smaller sum receives the next value from its own end. Repeat until all the numbers are assigned.

So let’s walk through this for the sequence above:

- [], [4] # totals are 0, 4
- [9], [4] # 9, 4
- [9], [8, 4] # 9, 12
- [9, 7], [8, 4] # 16, 12
- [9, 7], [6, 8, 4] # 16, 18
- [9, 7, 2], [6, 8, 4] # 18, 18
- [9, 7, 2, 3], [6, 8, 4] # 21, 18
- [9, 7, 2, 3], [11, 6, 8, 4] # 21, 29
- [9, 7, 2, 3, 10], [11, 6, 8, 4] # 31, 29
- [9, 7, 2, 3, 10], [8, 11, 6, 8, 4] # 31, 37
- [9, 7, 2, 3, 10, 5], [8, 11, 6, 8, 4] # 36, 37

This runs in O(n) time, and it’s startlingly easy to implement. Can’t we just do something *like* it for the general case?
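The walkthrough above can be sketched in a few lines of Python. (This is my own minimal sketch of the two-ended greedy just described; it breaks ties toward the front list, so the intermediate steps differ slightly from the walkthrough, but the final split is the same.)

```python
def greedy_split(sequence):
    # Consume the input from both ends at once; whichever subsequence
    # currently has the smaller sum takes the next value from its own end.
    a, b = [], []
    a_sum = b_sum = 0
    i, j = 0, len(sequence) - 1
    while i <= j:
        if a_sum <= b_sum:
            a.append(sequence[i])
            a_sum += sequence[i]
            i += 1
        else:
            b.append(sequence[j])
            b_sum += sequence[j]
            j -= 1
    b.reverse()  # b was built back-to-front
    return a, b
```

One pass over the input, constant work per value: O(n) overall.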

## Towards a better solution

Let’s attack it from another perspective. Instead of an incremental solution like above where we try to balance the lists at every step, can’t we just take advantage of the fact that we *already* know what we want the result to look like? Before we even start, we know that the sum of the numbers is 73. And we want each list to contain half of the total, and half of 73 is 36.5. We know before we *start* that each partition needs to sum as close to 36.5 as possible.

So let’s try again. This time around, let’s just start with what we know for sure, and put in every number from both ends as long as the total of each partition is under 36.5. We *can’t* be wrong doing that, since the goal is to get as close to 36.5 as possible. This jumps us almost right to step eleven above, with just the number “8” leftover in the middle and unassigned:

- [9, 7, 2, 3, 10, 5], 8, [11, 6, 8, 4] # 36, 29

The 8 is the only value that’s interesting, the only value where we have to make an actual decision. I call it a *choice point*: The rest of the algorithm is trivial if we can intelligently figure out where to place the 8.

As it turns out, the decision for the *choice point* is easy: We just put it into the list with the smallest sum, and we’re done!

- [9, 7, 2, 3, 10, 5], [8, 11, 6, 8, 4] # 36, 37

This two-step technique arrives at the correct answer, and it requires almost no effort. Here’s the same thing, implemented as real code, in Python (Python’s a great language for explaining things):

```python
def linear_partition_2(sequence):
    # Calculate the total sum of the sequence.
    total = sum(sequence)
    halfway = total / 2

    # Quickly fill in partition 'a' with everything that *must* go in it,
    # going simply from the start toward the end.
    a = []
    a_sum = 0
    i = 0
    while i < len(sequence):
        if a_sum + sequence[i] > halfway:
            break
        a.append(sequence[i])
        a_sum += sequence[i]
        i += 1

    # Quickly fill in partition 'b' with everything that *must* go in it,
    # going backwards from the end toward the beginning.  We'll create
    # 'b' backwards too, but we can reverse it in O(n) time.
    b = []
    b_sum = 0
    j = len(sequence) - 1
    while j >= 0:
        if b_sum + sequence[j] > halfway:
            break
        b.append(sequence[j])  # We can reverse b after we're done
        b_sum += sequence[j]
        j -= 1

    # Now make a decision at the choice point, if there's anything left.
    # And rather than comparing i to j here to see if there's a leftover
    # value, we test if a_sum has reached the halfway point instead, since
    # if it has, then both lists must be done already.
    if a_sum < halfway:
        if a_sum <= b_sum:
            a.append(sequence[i])
        else:
            b.append(sequence[i])

    # We built b backwards, so we need to fix it.
    b.reverse()
    return a, b
```

Three trivial loops (the initial `sum()` counts as a loop), followed by an if-statement comparing the sums. The question is: Can we *generalize* this technique to cases larger than two subsets?

## A critical tweak

Before we really discuss how to generalize it, let’s make one slight change to how we handle the if-statement at the end. Instead of comparing `a_sum` to `b_sum`, we can actually make the decision *without knowing* `b_sum`!

Instead of picking the shortest list, let’s think about it in terms of *error*. We want the *error* in the sums to be as small as possible — that is, we want each of the final sums to be as close to `total/2` as possible — i.e., we want `abs(a_sum - total/2)` to be minimized, and the same for `b_sum`. Importantly, regardless of which list we put `choice_point` in, the *error* in both lists will be the same: If `a_sum` is larger than `total/2` by 5, `b_sum` will be *smaller* than `total/2` by 5! Since `abs(a_error) = abs(b_error)`, we can simply ignore `b` entirely, and just focus on `a` — we know that if `abs(a_error)` is at a minimum, `abs(b_error)` will be too.

So where is the error the smallest? `a` and `b` both have a hole at the end between them and `total/2`. Those holes are either equal in size, or one is bigger than the other. If we want to minimize the error, we should put `choice_point` into the biggest hole, so that the smallest amount of `choice_point` “sticks out” from the hole. This is effectively what we did in the previous algorithm, but we can attack it a different way.

If the holes are equal in size, then they’re exactly equal to `choice_point/2`, since they must sum up to be equal to `choice_point`. But if they’re unequal in size, `choice_point/2` will always be smaller than one of them, and larger than the other. This means that we can make the decision knowing how `choice_point/2` compares to just one of the remaining holes. So let’s rewrite the final `if`-statement to compare `choice_point/2` against `total/2 - a_sum`, or, more accurately, comparing `a_sum + choice_point/2` against `total/2`:

```python
    # Now make a decision at the choice point, if there is a choice point.
    if a_sum < halfway:
        if a_sum + sequence[i] / 2 <= halfway:
            a.append(sequence[i])
        else:
            b.append(sequence[i])
```

Now, here’s the real kicker. Since we don’t need to use `b_sum` in that final comparison anymore, we don’t even need to compute `b` at all! `b` is simply whatever is leftover after we’ve computed `a`!

```python
def linear_partition_2(sequence):
    # Calculate the total sum of the sequence.
    total = sum(sequence)
    halfway = total / 2

    # Quickly fill in partition 'a' with everything that *must* go in it,
    # going simply from the start toward the end.
    a = []
    a_sum = 0
    i = 0
    while i < len(sequence):
        if a_sum + sequence[i] > halfway:
            break
        a.append(sequence[i])
        a_sum += sequence[i]
        i += 1

    # Now make a decision at the choice point, if there is a choice point.
    if a_sum < halfway and a_sum + sequence[i] / 2 <= halfway:
        a.append(sequence[i])

    return a
```

Study that a bit, and you’ll convince yourself it must be correct.

The most interesting part about the code above is that it doesn’t reference `b` at all! We can make the decision about which values to put into the first partition *entirely* by just knowing what half the total is. Can we use that fact to generalize it from two partitions to `N`?

## Paging Mr. Floyd and Mr. Steinberg

I’ve done a lot of graphics over the years, and there’s a graphics technique called Floyd-Steinberg dithering that is used for converting high-precision images to lower-precision images. Wikipedia’s article on it includes a dithered black-and-white photo of Michelangelo’s *David* as an example.

In this example, the original data was presumably a high-precision grayscale: Each pixel could be represented by a real number from 0.0 to 1.0. The resulting image after dithering, though, can only represent pure black-and-white: So the Floyd-Steinberg algorithm uses error-diffusion to spread out the error from being unable to represent the original shades of gray as pure black-and-white. If you look near David’s ear, you’ll see alternating pixels of black and white — 0 1 0 1 0 1 — to represent what was presumably something close to 0.5 0.5 0.5 0.5 0.5 0.5 in the original. The sum over the range is the same, but the *values* used to represent it are different.
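The core trick is easiest to see in one dimension. Here’s a minimal sketch of 1-D error diffusion, which is my own simplification: the real Floyd-Steinberg algorithm works in 2-D and spreads each pixel’s error across four neighbors, but the principle is the same.

```python
def diffuse_1d(values):
    # Quantize each value in [0.0, 1.0] to 0 or 1, pushing the rounding
    # error onto the next value so that sums over a range are preserved.
    out = []
    error = 0.0
    for v in values:
        v += error            # absorb the error carried from the left
        q = 1 if v >= 0.5 else 0
        error = v - q         # what we failed to represent this time
        out.append(q)
    return out
```

Feed it a run of 0.5s and it produces exactly the alternating 0 1 0 1 0 1 pattern described above: the sum over the range matches the input, even though no individual value does.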

How does this help us in the linear partitioning problem? It helps a lot!

We already demonstrated above that we can calculate each partition independently as long as we can figure out the right thing to do with the `choice_point`. So with `N` partitions of numbers, we want to *minimize the error* between each partition and the target value of `total/N` — we want to place each `choice_point` value so that the *overall error* is as small as possible. And we can use error diffusion to spread out the error between each partition, so that a partition that’s a little too long will have a subsequent partition that’s a little too short to make up for it. We can keep a running tally of the error, and we can ensure that every time we finish a partition, its error is as small as we can make it — or that subsequent partitions make up for it.

Error diffusion ensures that each `choice_point` is placed in a way that produces a local minimum for that partition’s error: *It* may not have perfect error individually, but its error combined with that of its immediate neighbors will always approach 0! And if every partition’s error locally approaches 0, then the total error will as well.

## The solution

The resulting algorithm is surprisingly simple code. Here’s a full, working implementation in Python:

```python
def linear_partition(data, num_partitions):
    total = sum(data)
    target_size = total / num_partitions
    result = []
    partition = []
    partition_sum = 0
    for value in data:
        if (partition_sum + value / 2 <= target_size
                or len(result) >= num_partitions - 1):
            partition.append(value)
            partition_sum += value
        else:
            error = target_size - partition_sum
            result.append(partition)
            partition = [ value ]
            partition_sum = value - error
    result.append(partition)
    return result
```

Here, `partition_sum` tracks not just the sum of the values in the current partition, but also how much `error` has been contributed to that partition from the previous one. The resulting code is trivial: The `if`-statement decides between whether we’re filling in an “easy” value or a choice point (the else side). The condition directly tests `value/2` against `target_size`, which is enough to answer both whether the value belongs “easily” in this partition or whether it’s a “choice point” value that we’re putting in this partition instead of the next one.

(The `or len(result) >= num_partitions - 1` test is added to account for possible float imprecision, since this depends on being able to do math accurately. In real code where I’m using this right now, I also have one more clause here to handle really bizarrely large values that are greater than `total/N`. The algorithm will “recover” from these by following up the subsequent partitions with almost nothing.)

And that’s literally *it*: No dynamic programming, no annealing, no fancy business. Two simple Θ(n) loops (the first just to compute the sum), and an if-statement, and you’re done.
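A quick sanity check on the example sequence from earlier (the function is repeated verbatim here so the snippet runs on its own):

```python
def linear_partition(data, num_partitions):
    total = sum(data)
    target_size = total / num_partitions
    result = []
    partition = []
    partition_sum = 0
    for value in data:
        if (partition_sum + value / 2 <= target_size
                or len(result) >= num_partitions - 1):
            partition.append(value)
            partition_sum += value
        else:
            error = target_size - partition_sum
            result.append(partition)
            partition = [ value ]
            partition_sum = value - error
    result.append(partition)
    return result

print(linear_partition([9, 7, 2, 3, 10, 5, 8, 11, 6, 8, 4], 2))
# [[9, 7, 2, 3, 10, 5], [8, 11, 6, 8, 4]]  -- sums 36 and 37
print(linear_partition([9, 7, 2, 3, 10, 5, 8, 11, 6, 8, 4], 3))
# [[9, 7, 2, 3], [10, 5, 8], [11, 6, 8, 4]]  -- sums 21, 23, and 29
```

The two-way split matches the answer we derived by hand above, and the three-way split falls out of the exact same loop with no extra machinery.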

Here’s the whole thing annotated with comments for the educational value:

```python
def linear_partition(data, num_partitions):
    # First, sum up the values in the data.  O(n).
    total = sum(data)

    # Calculate (in floating point) the ideal average size of every
    # partition, if it were possible to perfectly assign every value.
    target_size = total / num_partitions

    # We will collect the resulting set of partitions in 'result'.
    result = []

    # These hold the current partition and its statistics as we process
    # each value.
    partition = []
    partition_sum = 0

    # Main loop: We spin over each value one at a time, and decide whether
    # it belongs in the current partition or a successive one.  O(n).
    for value in data:
        # If we can fit half or more of the next value, add it to the
        # current partition.  (The or-clause here exists for float
        # imprecision, and for a boundary condition that can occur on the
        # very last value of the data.)
        if (partition_sum + value / 2 <= target_size
                or len(result) >= num_partitions - 1):
            # There's enough room in this partition, so just append the value.
            partition.append(value)
            partition_sum += value
        else:
            # The previous partition is too full to fit the next value, so
            # start a new partition.  But we also calculate 'error', the
            # amount we *should* have distributed to the previous partition
            # but couldn't, because we couldn't split the value.
            error = target_size - partition_sum

            # The previous partition is finished, so keep it as-is.
            result.append(partition)

            # Start a new partition that contains the new value.
            partition = [ value ]

            # The sum of this new partition is already overlarge by 'error',
            # so distribute that error into the partition's current sum.
            partition_sum = value - error

    # Append the final partition to the result.
    result.append(partition)
    return result
```

Pretty neat. And pretty simple, too. I’m now using this in the UI code I was working on, and it works beautifully. I also tested it on the dataset here, and it works really well on that too, needing no special magic or annealing or anything unusual to get a great result immediately. I did that test in JavaScript, and you can find that implementation here, along with a comment that indicates how surprised I was when it actually worked for the first time 😉

Feel free to use this technique, or this source code, if you want to. I hereby place all of the code in this blog post and in that JSFiddle *into the public domain*, so use it as you see fit, no questions asked.

## Addenda

I e-mailed Dr. Skiena about this, but I haven’t heard back from him. If he sends back a reply, I’ll be sure to post a followup here.

I also e-mailed Jonathan Palardy, who has one of the better available references on this topic on the web, and he added a link on it back to this article.