Sorting

1. One of the most basic and frequently used operations in computing. We begin by assuming that the numbers to be sorted are integers; all the algorithms work for general inputs too. These comparison-based algorithms sort only by comparing numbers, not by using their absolute values. (Algorithms that use values include bucket sorting; we only touch on them briefly at the end.)

2. Many simple O(n^2) algorithms exist. Insertion sort is a classic example. Shell sort is o(n^2) with good increment sequences and runs very fast in practice, but its exact analysis is difficult and still unresolved. Then there are many O(n log n) algorithms, which are theoretically optimal.

3. Insertion sort. One of the first algorithms. Suppose locations in the array are indexed 0 through n-1. The algorithm makes n-1 passes; after pass i, the first i+1 numbers in the array are sorted. Strategy: in pass i, move the ith item left to its proper place.

   Insertion Sort:
     for (i = 1; i < n; i++) {
         tmp = a[i];                                  // item to insert
         for (j = i; j > 0 && tmp < a[j-1]; j--)
             a[j] = a[j-1];                           // shift larger items right
         a[j] = tmp;
     }

4. Example: (^ marks the item to be inserted in the next pass)

   original    34   8  64  51  32  21
                    ^
   after i=1    8  34  64  51  32  21
                        ^
   after i=2    8  34  64  51  32  21
                            ^
   after i=3    8  34  51  64  32  21
                                ^
   after i=4    8  32  34  51  64  21
                                    ^
   after i=5    8  21  32  34  51  64

5. Analysis: When the pth item is moved, it can move at most p positions, so the upper bound is (1 + 2 + 3 + ... + (n-1)) = O(n^2). This is also a lower bound in the worst case, e.g. for reverse-sorted input.

6. A LOWER BOUND: An inversion is any pair of positions (i,j) where i < j but a[i] > a[j]. For example, in the list above the values (34, 8) form an inversion, as do (64, 32). Altogether there are 9 inversions in that list. The number of inversions in a sorted list is zero. Each insertion sort swap removes exactly one inversion. Thus, the worst-case number of swaps equals the worst-case number of inversions. A reverse-sorted list has n(n-1)/2 inversions. In fact, in a random list, the average number of inversions is n(n-1)/4: for each of the n(n-1)/2 pairs, the probability of inversion is 1/2.
(Or, for any list L, also consider its reverse Lr; every pair is inverted in exactly one of L and Lr.) But this does not imply that EVERY sorting algorithm must take n^2 time: an algorithm may swap any two elements in the array, which can fix many inversions in one shot.

7. Shell Sort. Invented by Don Shell; the first algorithm to break the n^2 barrier, though its analysis was done much later. It's like insertion sort, but it moves elements across non-adjacent positions. Strategy: pick an increment sequence h1 < h2 < h3 < ... < ht, where h1 = 1. After the phase that uses increment hk, a[i] <= a[i + hk] for all i. (Notice that insertion sort uses the single increment h1 = 1.)

8. Shell Sort:
     for (k = t; k > 0; k--) {                        // gaps ht down to h1
         for (i = hk; i < n; i++) {
             tmp = a[i];
             for (j = i; j >= hk && tmp < a[j - hk]; j -= hk)
                 a[j] = a[j - hk];
             a[j] = tmp;
         }
     }

   So, after the iteration of the outer loop that uses hk, a[j] <= a[j + hk]; in the end, h1 = 1, and so a[j] <= a[j+1] for all j. Done. There are many choices for the increment sequence. One example: n/2, n/4, ..., 1. This gives about log n rounds.

   Shellsort Example:

   index   0  1  2  3  4  5  6  7  8  9 10 11 12
   input  81 94 11 96 12 35 17 95 28 58 41 75 15
   aft 5  35 17 11 28 12 41 75 15 96 58 81 94 95
   aft 3  28 12 11 35 15 41 58 17 94 75 81 96 95
   aft 1  11 12 15 17 28 35 41 58 75 81 94 95 96

   In the first pass, a[5] = 35 swaps with a[0] = 81; a[6] = 17 swaps with a[1] = 94; etc. In the second pass, a[3] = 28 swaps with a[0] = 35; etc. In each pass, we actually run Insertion Sort, so e.g. in the third pass, when i=1, 12 swaps with 28. When i=2, 11 uses insertion sort to find its proper place at position 0.

9. Analysis: Worst-case time for Shellsort with the increments n/2, n/4, ..., 1 is O(n^2). (See textbook: when the increment is hk, the cost is roughly that of hk insertion sorts, each on a set of size n/hk.) Worst-case time using Hibbard's increments is O(n^1.5). Hibbard's increments: 1, 3, 7, ..., 2^k - 1 (consecutive increments are relatively prime). Sedgewick's increments give an O(n^(4/3)) upper bound. The best known sequence: {1, 5, 19, 41, 109, ....}; its terms are either 9(4^i) - 9(2^i) + 1 or 4^i - 3(2^i) + 1.
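To make the role of the increment sequence concrete, here is a minimal C sketch of Shellsort driven by Hibbard's increments. The function names h_sort and shellsort_hibbard are made up for this illustration; the gapped insertion sort is the same inner loop as in item 8, and only the way the gaps are generated is specific to Hibbard's choice.

```c
#include <assert.h>
#include <stddef.h>

/* One Shellsort phase: a gapped insertion sort.
   Afterwards a[i] <= a[i + h] for every valid i. */
void h_sort(int a[], size_t n, size_t h)
{
    assert(h >= 1);
    for (size_t i = h; i < n; i++) {
        int tmp = a[i];
        size_t j = i;
        while (j >= h && tmp < a[j - h]) {
            a[j] = a[j - h];   /* shift the larger item h places right */
            j -= h;
        }
        a[j] = tmp;
    }
}

/* Shellsort with Hibbard's increments 2^k - 1, largest first
   (hypothetical helper, not taken from the notes). */
void shellsort_hibbard(int a[], size_t n)
{
    size_t h = 1;
    while (2 * h + 1 < n)      /* largest increment 2^k - 1 below n */
        h = 2 * h + 1;
    for (; h >= 1; h /= 2)     /* integer division maps 2^k - 1 to 2^(k-1) - 1 */
        h_sort(a, n, h);
}
```

Calling h_sort(a, 13, 5) on the 13-element input of the example reproduces the "aft 5" row; shellsort_hibbard would instead use the gaps 7, 3, 1 on that input.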
Shellsort is used in practice because it's very simple to code and quite fast; it is often better than some O(n log n) algorithms.

10. Heapsort. Build a heap on the n elements: O(n) time. Do n deleteMins: O(n log n) total. The sequence output by the deleteMins is sorted. One drawback is that you need double the memory: one array for the heap, one for the output. In contrast, Shellsort and insertion sort are "in place" and use one array. However, a minor trick makes heapsort work with one array. When a deleteMin is performed, the key from the last leaf (last array cell) is moved to the root and percolated down, which frees the last cell. Thus, we can put the element just pulled from the root in that last cell. To get the numbers in ascending order this way, use a maxHeap (with deleteMax).

11. Example. Max-heap in array:
   97 53 59 26 41 58 31
   After deleteMax, the array is rearranged to
   59 53 58 26 41 31 -
   Put 97 in that empty cell and continue. The result of the next deleteMax will go where 31 is.

   Analysis: The worst-case number of comparisons for heapsort is about 2n log n; the ith deleteMax requires roughly 2 log (n-i) comparisons, since at each level a node is compared against its two children. Heapsort is pretty steady: its best and worst times are about the same. In practice, Shellsort with some specific increment sequences does better than heapsort, even though the latter has better worst-case performance.

12. Merge Sort. A classic example of divide-and-conquer. Divide the array into left and right halves. Recursively sort the two halves; when the array reaches a small size (say, 2 or 4 elements), use insertion sort. Having sorted the left and right halves, we then MERGE these to form the final sorted sequence. That is,
   A -> A_l and A_r -> Sorted(A_l) and Sorted(A_r) -> Merged
   The key, and only non-trivial, operation is to merge two sorted lists. We assume that a new array is used for the output. Put ptrs at the first elements of the L and R arrays. Copy the smaller element to the output array C, and advance the ptr of the smaller element. Repeat until one ptr reaches the end, then copy the rest of the other array over.

13. Merge Example:
   1 13 24 26  +  2 15 27 38  =  1 2 13 15 24 26 27 38
   Time to merge two lists is LINEAR in their total size, since every step advances one ptr.

14. Analysis. The runtime of merge sort is best analyzed through a recurrence:
   T(n) = 1                  if n = 1
   T(n) = 2 T(n/2) + n       otherwise.
   Solve the recurrence by unraveling it:
   T(n) = 2 T(n/2) + n
        = 2 (2 T(n/4) + n/2) + n
        = 2^2 T(n/2^2) + 2n
        ....
        = 2^i T(n/2^i) + in
   When i = log n, n/2^i = 1, so T(n) = 2^(log n) T(1) + n log n = n log n + n.

   While theoretically optimal, it's rarely used in practice; in part because of the extra memory needed in merging, and in part because easier-to-code algorithms with the same or better performance exist (e.g. quicksort).

15. Quicksort. One of the fastest sorting algorithms in practice. Its expected running time is O(n log n), though the worst case is O(n^2); however, the worst case can be made exponentially unlikely with little extra effort. Its practical performance is due to a very tight and optimized inner loop.

   General Strategy:
   1. If |S| = 0 or 1, return.
   2. Pick any element v in S; call it the pivot.
   3. Partition S - {v} (the remaining elements of S) into two disjoint groups S1 (elements <= v) and S2 (elements >= v).
   4. Return (quicksort(S1) + v + quicksort(S2)).

   Example: S = {13, 81, 65, 92, 43, 31, 57, 26, 75, 0}
   Pick v = 65
   S1 = {13, 0, 43, 31, 26, 57} and S2 = {81, 92, 75}
   Output = (0, 13, 26, 31, 43, 57, 65, 75, 81, 92)

16. Looks a lot like mergesort. While mergesort always divides into equal-sized subarrays, quicksort's subarray sizes are determined by the pivot. A bad choice of pivot can create unbalanced subproblems, which is quite bad. The reason for quicksort's speed is that (1) there are good strategies for choosing pivots, and (2) the PARTITION step can be done in place and really fast.

17. Pivot Choices:
   First or last array element. Good if the array is random, but BAD if the array is partly sorted. Not recommended.
   Random pivot. Generally works very well. Recommended.
   Median of 3: Pick 3 random elements and choose their median. Or, take the median of the left, right, and middle elements. E.g. in the array (8, 1, 4, 9, 6, 3, 5, 2, 7, 0), left=8, right=0, center=6, so the median is 6.

18. Partition.
   1. Swap the pivot with the last array item.
   2. Set i = 0 and j = n-2.
   3. while i < j do
        move i right as long as the element is <= pivot
        move j left as long as the element is >= pivot
        // now i is pointing at an element > pivot and
        // j is pointing at an element < pivot
        if i < j, swap these elements

19. Example: after the pivot (6) and the last element are swapped,

   8 1 4 9 0 3 5 2 7 6
   i               j
   8 1 4 9 0 3 5 2 7 6     advance i, j
   i             j
   2 1 4 9 0 3 5 8 7 6     swapped
   i             j
   2 1 4 9 0 3 5 8 7 6     advance i, j
         i     j
   2 1 4 5 0 3 9 8 7 6     swap and advance
             j i           now j < i; stop

   Now swap the pivot element with the element at position i:
   2 1 4 5 0 3 6 8 7 9     done.

   Here we assumed that all elements were distinct. What if there are elements equal to the pivot? One suggested solution is to stop both i and j when an element equal to the pivot is encountered, swap, and continue.

20. Analysis of QuickSort. Analyze using a random pivot. Suppose the pivot splits S into two subarrays of sizes i and n-i-1. Then the running time can be expressed as:
   T(n) = T(i) + T(n-i-1) + cn
   (The cn is the time for partitioning and choosing the pivot.)

   Worst-Case Analysis. The pivot is always the smallest (or largest) element.
   T(n) = T(0) + T(n-1) + cn
   Ignoring the constant T(0) and iterating, we get
   T(n-1) = T(n-2) + c(n-1)
   T(n-2) = T(n-3) + c(n-2)
   ....
   T(2) = T(1) + c*2
   Thus, T(n) = T(1) + c \sum_{i=2}^{n} i = O(n^2).

   Best-Case Analysis. The pivot evenly splits the array every time during the recursion.
   T(n) = 2 T(n/2) + cn
   This is the same as mergesort, so T(n) = O(n log n).

   Another method: T(n) = 2 T(n/2) + cn. Divide both sides by n; we get
   T(n)/n = T(n/2)/(n/2) + c; now reapply:
   T(n/2)/(n/2) = T(n/4)/(n/4) + c;
   T(n/4)/(n/4) = T(n/8)/(n/8) + c;
   ....
   T(2)/2 = T(1)/1 + c;        there are log n of these
   Add up all the equations; the sum telescopes, and all intermediate terms cancel out:
   T(n)/n = T(1)/1 + c log n ==> T(n) = cn log n + n = O(n log n).

21. Expected Case (average). This is more complicated to analyze. Since the pivot is chosen randomly, the size of the S1 subproblem can be i, for i = 0,1,...,n-1, with equal probability 1/n. The size of S2 is the complement, also with equal probability. Thus, the average value of T(i) (and also of T(n-i-1)) is (1/n) (T(0) + T(1) + ... + T(n-1)). Thus, our recurrence becomes:
   T(n) = (2/n) [\sum_{j=0}^{n-1} T(j)] + cn
   By multiplying with n, we get:
   a. n T(n) = 2 [\sum_{j=0}^{n-1} T(j)] + cn^2
   b. (n-1) T(n-1) = 2 [\sum_{j=0}^{n-2} T(j)] + c(n-1)^2       // same equation for n-1
   Subtracting b from a, we get
   n T(n) - (n-1) T(n-1) = 2 T(n-1) + 2cn - c
   Rearranging and dropping the insignificant -c, we get
   n T(n) = (n+1) T(n-1) + 2cn
   Divide by n(n+1):
   T(n)/(n+1) = T(n-1)/n + 2c/(n+1)
   T(n-1)/n = T(n-2)/(n-1) + 2c/n
   T(n-2)/(n-1) = T(n-3)/(n-2) + 2c/(n-1)
   .....
   T(2)/3 = T(1)/2 + 2c/3
   Adding and telescoping, we get
   T(n)/(n+1) = T(1)/2 + 2c \sum_{i=3}^{n+1} (1/i)
   The last sum is approximately log n. Thus, T(n) = O(n log n).

22. Lower Bound for Sorting. We have improved sorting algorithms from O(n^2) to O(n log n). Can we go one step further and achieve O(n)? In other words, can an algorithm guarantee to sort any possible sequence with only O(n) comparisons? We will prove this is not possible!

   Decision Trees. A decision tree is an abstraction used for proving lower bounds. It captures the essence of the state of a program (algorithm), where comparisons are the key operations. In this abstraction, we assume that a program can do all other operations (add, subtract, access, etc.) for FREE; the only charge is for comparing two numbers. We will show that even in this idealized model, sorting cannot be accomplished with fewer than about n log n comparisons in the WORST CASE; in fact, not even on average.

23. Every comparison-based algorithm's executions can be drawn as a decision tree. Each execution corresponds to a path, starting at the root and ending at a leaf.
   (1) compare a, b
       a < b: (2) compare a, c
              a < c: (4) compare b, c
                     b < c: (8)  [a,b,c]
                     b > c: (9)  [a,c,b]
              a > c: (5)  [c,a,b]
       a > b: (3) compare b, c
              b < c: (6) compare a, c
                     a < c: (10) [b,a,c]
                     c < a: (11) [b,c,a]
              b > c: (7)  [c,b,a]

   Fig shows a decision tree corresponding to an algorithm that sorts 3 elements, a, b, c. The root corresponds to the state before the algorithm begins: no information about the ordering yet. The first comparison by the algorithm is between a and b. There are two outcomes: a < b, and a > b. (Assume distinct elements.) Based on the outcome, the algorithm branches to the left or right. On the left, it next compares a and c. (Other algorithms may do some other comparison, and they will have a different decision tree.) And so on. A leaf corresponds to a state where the algorithm has figured out the sorted order, based on its comparisons. We have 6 leaves. (It's possible to have more leaves; multiple leaves may correspond to the same ordering.)

   The worst-case running time of an algorithm is the DEPTH of the decision tree, i.e. the longest path. In this case the depth is 3; the algorithm needs 3 comparisons when the ordering is [a,b,c], or [a,c,b], or... In some cases, only 2 comparisons suffice. But we want the worst case!

24. To prove the lower bound, we will show that any sorting algorithm's decision tree has at least one long path! (Below, log is base 2 and [x] denotes x rounded up.)

   Fact 1. A binary tree of depth d has at most 2^d leaves.
   Pf. By induction. The subtrees rooted at the children of the root have depth at most d-1, and so at most 2^{d-1} leaves each.

   Fact 2. A binary tree with L leaves must have depth at least [log L].
   Pf. Apply Fact 1.

   Theorem. Any sorting algorithm that uses only comparisons between elements requires at least [log (n!)] = Omega(n log n) comparisons in the worst case.
   Pf. Look at the decision tree for this algorithm. It must have at least n! different leaves, one for each permutation of the n elements. No two permutations can lead to the same leaf; otherwise, the alg is wrong on at least one of them. Thus, by Fact 2, the depth is at least [log (n!)].
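As a quick sanity check of the theorem, the bound [log (n!)] can be evaluated exactly for small n. The sketch below is only an illustration (the function name min_comparisons is made up here); it finds the smallest d with 2^d >= n! using integer arithmetic, and it assumes n <= 20 so that n! fits in 64 bits.

```c
#include <assert.h>

/* Smallest depth d such that a binary tree of depth d can have n! leaves,
   i.e. the ceiling of log2(n!). Assumes n <= 20 so n! fits in 64 bits. */
unsigned min_comparisons(unsigned n)
{
    assert(n <= 20);
    unsigned long long fact = 1;
    for (unsigned i = 2; i <= n; i++)
        fact *= i;                 /* fact = n! */

    unsigned d = 0;
    unsigned long long leaves = 1; /* a tree of depth d has at most 2^d leaves */
    while (leaves < fact) {
        leaves *= 2;
        d++;
    }
    return d;
}
```

For n = 3 this returns 3, which matches the depth of the three-element decision tree in item 23; for n = 4 and n = 5 it returns 5 and 7.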
   There are several ways to argue that this bound is Omega(n log n). First,
   log (n!) = log (n*(n-1)*(n-2)*...*2*1)
            = log n + log (n-1) + log (n-2) + ... + log 2
            >= (n/2) log (n/2)          (keep only the n/2 largest terms, each at least log (n/2))
            = (n/2) log n - n/2
            = Omega(n log n).
   Second, use Stirling's formula: n! is approximately \sqrt{2 pi n} (n/e)^n, and this approximation is in fact a lower bound on n!. So
   log (n!) >= log (n/e)^n + (1/2) log (2 pi n)
            >= log n^n - log e^n
            = n log n - n log e
            = Omega(n log n).

25. A Non-Comparison-Based Sorting Algorithm. Suppose we have n integers in the range 1 to m, where m = O(n). Initialize an array C of size m (indexed 1 to m) with all zeros. Read the input array of numbers to be sorted, and if the number is i, increment C[i]. Finally, go over the array C, and print out C[i] copies of i. The algorithm takes O(n + m) = O(n) time. How does it avoid the decision tree lower bound? (By directly going to the location C[i], in some sense we are performing an m-way comparison.) Algorithms like bucket sort are useful in specialized applications, where elements are known to be from a small universe.
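The counting procedure of item 25 can be sketched in C as follows. This is a minimal illustration under the stated assumption that every input value lies in 1..m; the name counting_sort and the choice to write the result back into the input array (rather than print it) are ours, not part of the notes.

```c
#include <assert.h>
#include <stdlib.h>

/* Sorts a[0..n-1] in place, assuming every value lies in the range 1..m.
   Returns 0 on success, -1 if the count array cannot be allocated. */
int counting_sort(int a[], size_t n, int m)
{
    assert(m >= 1);
    /* count[v] = number of occurrences of v; index 0 is unused */
    size_t *count = calloc((size_t)m + 1, sizeof *count);
    if (count == NULL)
        return -1;

    for (size_t k = 0; k < n; k++) {
        assert(a[k] >= 1 && a[k] <= m);   /* the range assumption */
        count[a[k]]++;
    }

    /* Write count[v] copies of v, for v = 1..m, back into a. */
    size_t out = 0;
    for (int v = 1; v <= m; v++)
        for (size_t c = 0; c < count[v]; c++)
            a[out++] = v;

    free(count);
    return 0;
}
```

The two passes cost O(n) and O(n + m) respectively, and no two input elements are ever compared with each other, which is why the decision tree argument does not apply.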