Bin Packing Algorithms.
----------------------
1. A classical problem, with long and interesting history.
One of the early problems shown to be intractable. Lends to simple
algorithms that require clever analysis.
2. You are given N items, of sizes s1, s2, ..., sN. All sizes are such
that 0 < si <= 1. You have an infinite supply of unit size bins.
Goal is to pack the items in as few bins as possible.
EXAMPLE: 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8
3. Many many applications:
placing data on multiple disks; job scheduling;
packing advertisements in fixed length radio/TV station breaks;
or storing a large collection of music onto tapes/CD's, etc.
4. Two versions.
Online---items arrive one at a time (in unknown order), each must
be put in a bin, before considering the next item.
Offline--all items given upfront.
The online problem would seem more difficult. In fact, it's easy to
convince ourselves that a ONLINE algorithm cannot always get the
optimal solution. Consider the following input:
M "small" items of size 1/2 - e, followed by M "large" items of
size 1/2 + e, for any 0 < e < 0.001.
The optimal solution is to pack them in pairs (one small, one large);
this requires M bins.
Now, the ONLINE algorithm doesn't know what's coming down the pipe,
or even how long the pipe is. So, for instance, what should it do
with the first M small items. If it packs 2 of them in each bin,
then it will be stuck when the second half arrives, with M large
items. On the other hand, if it puts one small items in each bin
in the first half, then we can just stop the input right there, in
which case the algorithm would have used twice as many bins as needed.
5. This ad hoc argument is not a proof. But we can turn this into a
formal proof, and show the following LOWER BOUND.
There exist inputs that can force ANY online bin-packing algorithm
to use at least 4/3 times the optimal number of bins.
PROOF.
An important observation is that because we (the adversary) can
truncate the input whenever we like, the algorithm must maintain
its guaranteed ratio AT ALL points during its course.
Consider the input sequence:
I1, sequence of M small items of size (1/2 - e), followed by
I2, sequence of M large items of size (1/2 + e).
Let's consider the state of the online algorithm after it has processed
I1. Suppose it has used b number of bins. At this point, the optimal
solution uses M/2 bins, so if the online algorithm beats 4/3 ratio,
it must satisfy:
b/(M/2) < 4/3 ==> b/M < 2/3. (*)
Now consider the state of the online algorithm after all items have
been processed. Since all new items have size > 1/2, every NEW bin
created after the first b bins will have exactly one item put in it.
(Some items may go into the first b bins.)
Since only the first b bins can have 2 items, and the remaining bins
have 1 item each, we see that packing 2M items will require at least
(2M - b) bins. Again, since the optimal at this stage is M bins, the
online algorithm must guarantee that (2M - b) < 4M/3, which simplifies to
b/M > 2/3. (**)
But now we have a contracdiction, (*) and (**). Thus, NO online
algorithm can beat the 4/3 ratio.
We now show 3 very simple online algorithms that each uses at most
twice the optimal bins.
6. Next Fit.
When processing the next item, see if it fits in the same bin
as the last item. Start a new bin only if it does not. Incredibly simple
to implement (linear time.)
Example:
empty empty empty empty empty
0.5 0.1
0.2 0.4 0.7 0.3 0.8
Next Fit also has a simple worst-case analysis.
Theorem: If M is the number of bins in the optimal solution, then
Next Fit never uses more than 2M bins.
There exist sequences that force Next Fit to use 2M-2 bins.
Proof.
Consider any two adjancent bins. The sum of items in these
two bins must be > 1; otherwise, NextFit would have put all
the items of second bin into the first. Thus, total occupied
space in (B1 + B2) is > 1. The same holds for B3+B4 etc....
Thus, at most half the space is wasted, and so Next Fit uses
at most 2M bins.
For the lower bound, consider the sequence in which si = 0.5 for
i odd, and si = 2/N for i even. (Suppose N is divisible by 4.)
Then, the optimal puts all 0.5 items in pairs, using N/4 bins.
All small items fit in a single bin, so the opt is N/4 + 1.
Next Fit will put 1 large, 1 small in each bin, requiring N/2 bins.
Lower Bound:
0.5 0.5 ... 0.5 2/N
0.5 0.5 ... 0.5 2/N
...
B1 B2 B_{N/4} B_{N/4 + 1}
empty empty ... empty empty
2/N 2/N 2/N 2/N
0.5 0.5 0.5 0.5
B1 B2 B_{N/2}
7. First Fit.
Next Fit can be easily improved: rather than checking just the
last bin, we check all previous bins to see if the next item will
fit. Start a new bin, only when it does not.
Example:
empty empty empty empty
0.1
0.5 0.3
0.2 0.4 0.7 0.8
First Fit easy to implement in O(N^2) time. With proper data structures,
it can be implemented in O(N log N) time.
Theorem: First Fit never uses more than 2M bins, if M is the optimal.
Proof. At most one bin can be more than half empty: otherwise the
contents of the second half-full bin would be placed in the first.
Theorem: If M is the optimal number of bins, then First Fit never
uses more than 1.7M bins. On the other hand, there are
sequences that force it to use at least 17/10 (M-1) bins.
The upper bound proof is quite complicated.
We show an example that forces First Fit to use 10/6 times optimal.
Consider the sequence: 6M items of size 1/7 + e; followed by 6M
items of size 1/3 + e; followed by 6M items of size 1/2 + e.
Optimal strategy is to pack each bin with one from each group,
requiring 6M bins.
When First Fit is run, it packs all small items first, in 1 bin.
It then packs all medium items, but requires 6M/2 = 3M bins.
(Only 2 per bin fit.) It then requires 6M bins for the large items.
Thus, in total First Fit uses 10M bins.
empty empty empty
1/7
1/7
...
1/7 1/3
1/7
1/7
...
1/7 1/3 + e
1/7 1/3 + e 1/2 + e
9. Best Fit.
The third strategy places the next item in the *tightest* spot.
That is, put it in the bin so that smallest empty space is left.
Example. 0.2, 0.5, 0.4, 0.7, 0.1, 0.3, 0.8
empty empty empty
0.1
0.5 0.3
0.2 0.4 0.7 0.8
Also easy to implement in O(N log N) time.
Unfortunately, the generic bad cases for First Fit etc. apply
to Best Fit also. Best Fit never uses more than 1.7 times optimal.
Complicated analysis, omitted.
10. Offline Algorithms.
If we can view the entire sequence upfront, we should expect to
do better. With exhaustive enumeration, of course, we can find the
optimum. But even offline bin packing is not easy *if* we have only
a polynomial amount of time. (NP-Complete.)
A trouble with online algorithms is that packing large items is
difficult, especially if they occur late in the sequence. We can
circumvent this by *sorting* the input sequence, and placing the
large items first. With sorting, we get First Fit Decreasing and
Best Fit Decreasing, as offline analogs of online FF and BF.
With sorting, the input sequence becomes:
0.8, 0.7, 0.5, 0.4, 0.3, 0.2, 0.1
Applying First Fit Decreasing, we get an optimal.
0.1
0.2 0.3 0.4
0.8 0.7 0.5
Note that the bad cases that require 10M bins as opposed to 6M
also do not apply here. In fact, we show the following theorem.
THEOREM: First Fit Decreasing uses at most (4M + 1)/3 bins
if the optimal is M.
11. First Fit Decreasing.
The proof of FFD's performance depends on two technical observations.
1. Suppose the N items have been sorted in descending order of size;
s1 > s2 > ... > sN. If the optimal packing uses M bins, then
all bins in the FFD after M have items of size <= 1/3.
2. The number of items FFD puts in bins after M is at most M-1.
Proof of 1. By contradtiction. Suppose si is the first item to
be put in bin M+1, and si > 1/3. Therefore, we also have that
s1, s2, ..., si-1 > 1/3. From this, it follows that each of the
first M bins has at most 2 items each.
Claim. The state of FFD just before si was placed is the following:
the first few bins have exactly 1 item, remaining have 2 items.
If not, then there must be two bins Bx, By, with x < y, such
that Bx has two items x1, x2, and By has 1 item y1.
Since x1 was put in earlier bin, x1 >= y1.
Since x2 was put in before si, x2 >= si.
Thus, x1 + x2 >= y1 + si. But this implies that si could have
fit in By, which contradicts our assumption. Thus, if si > 1/3,
then the first M bins must be arranged so that first j have
1 item; the next M-j have two items.
To finish the proof, we now argue that there is no way put all the
items in M bins, contradicting the assumption of optimality.
No two items from s1, s2, ..., sj can be put in a single bin;
if so, FFD would have done it. Because FFD faild to put any
of the items s_{j+1}, ..., s_{i-1} into first j bins, in any
solution (including optimal), there must be j bins that do
not contain any item from s_{j+1}, ..., s_{i-1}. Thus, all these
items must be contained in the remaining M-j bins. Further,
there are 2(M-j) such items (because in FFD each of these
M-j bins had 2 items).
Now, if si > 1/3, then there is no way for si to be placed
in any of these M bins: it can't fit in the first j because
otherwise FFD would have done it; it can't go in the remaining
M-j because each of them already has two items of sizes > 1/3.
Thus, the optimal would require at least M+1 bins, which is
a contradiciton!
So, it must be that si <= 1/3.
Proof of 2.
Suppose that there are at least M objects put in the extra bins.
Since all items fit in M bins, we have sum_{i=1}^N si <= M.
Suppose bin j is filled with total weight Wj. Suppose the
first M extra objects have sizes x1, x2, ..., xM.
Because the items packed by FFD in first M bins plus the
first M extra are subset of total, we have
\sum_{i=1}^N si >= \sum_{j=1}^M Wj + \sum+{j=1}^M xj
>= \sum_{j=1}^M (Wj + xj)
Now, Wj + xj > 1, for each j; otherwise FFD would put xj in Bj.
Thus, \sum_{i=1}^N si > \sum_{j=1}^M 1 > M
But that's impossible because all si fit in M bins.
So, there must be only M-1 items in the extra bins.
Proof of Theorem.
There are M-1 extra items, each of size <= 1/3. Thus, there can
be at most (M-1)/3 extra bins. Thus, the total number of bins
needed by FFD is (4M+1)/3.
12. More complicated Theorem.
If M is the optimal number of bins, then FFD never uses
more than 11M/9 + 4 bins. There are sequences for which
FFD uses 11M/9 bins.
when