Administrative Details.
-----------------------
The course web page is: www.cs.ucsb.edu/~suri/cs130b/cs130b.html
My office hours: TBA.
TA: Ceren Budak (Email: cbudak@cs.ucsb.edu)
TA hours:
Grading: Divided into 4 parts:
    Practice Problems: 20 %
    Quizzes: 20 %
    Programming assignments: 30 %
    Exams: 30 %

** AWAY on April 22-24, and May 8. **

What the course is all about:

1. Theme: techniques for designing efficient algorithms; the inherent complexity (hardness) of problems; what is the best algorithm for a given problem?

Three things you will learn: how to design a good algorithm, how to analyze it, and when to stop (lower bounds).

The word "algorithm" is derived from the name of Mohammed Al-Khowarizmi, a 9th-century Persian mathematician who formalized pencil-and-paper methods for arithmetic. Algorithms can be thought of as recipes: detailed and precise, and ones where elegance, efficiency, simplicity, etc. matter.

2. Some Famous Algorithms: Constructions of Euclid, Newton's root finding, Fast Fourier Transform, Data Compression (Huffman, Lempel-Ziv, GIF, MPEG), Data Encryption (DES, RSA), Simplex algorithm for linear programming, Shortest Path Algorithms (Dijkstra, Bellman-Ford), Dynamic Programming, Error-correcting codes (CDs, DVDs), TCP congestion control, IP routing, Pattern matching (Genomics), Delaunay Triangulation (FEM, Simulation).

Sometimes key algorithms are invisible to the outside world: they are components in larger systems, such as design automation (layout or wire routing in circuit boards and chips), code optimization, clustering in document retrieval or data mining, etc. At other times, algorithms are the defining character of a system or service: search engines, combinatorial auctions, IP routing, encryption, etc.

3. Topics Covered:
    Models, Asymptotic notation, Worst-case analysis of algorithms, Recurrences.
    Greedy paradigm (Activity Selection, Huffman Coding, Shortest Paths (Dijkstra), Min Spanning Trees (Kruskal, Prim)).
    Divide and conquer (Multiplying Long Numbers, Matrix Multiplication, Quicksort Algorithm, Selection, Convex Hulls).
    Dynamic Programming (Matrix Chain Product, Longest Common Subsequence, Knapsack).
    NP-Completeness.
    Approximation Algorithms.
    Lower Bounds.

4. Why bother investing in algorithms? Computers are so fast; why bother with algorithms?

TSP (Traveling Salesman Problem): find the shortest route that visits all N cities. An important optimization problem. For N = 100, there are 100! ≈ 9.3 × 10^{157} possible tours (far more than 30^{100}). A supercomputer checking 100 billion tours per second examines only about 3 × 10^{18} tours in 1 year! It would still need more than 10^{100} years.

Factoring: fast number-factoring algorithms can break encryption schemes. The straightforward method needs millions of years to factor a 300-digit number. A state-of-the-art algorithm can factor 100-digit numbers in a few days using a network of workstations. Algorithms research determines which key lengths are considered safe.

5. Tables of running times.
1 century = 3.1556926 × 10^9 seconds, so pi seconds ≈ 1 nanocentury.
[TABLES]
Asymptotic estimates are good enough to make a high-level distinction between practical and impractical algorithms. More refined analysis and/or simulations are used to choose among multiple competing "theoretically equivalent" algorithms.

6. Review: asymptotic notation, basic summations, and elementary recurrences (Chapter 2). We will study recurrences more systematically later in the course.

7. Illustration of the Algorithm Design Process: Closest Pair of Points in 2D.

Given a set of N points in 2D, determine the closest pair under Euclidean distance.
The Euclidean distance between points p_i = (x_i, y_i) and p_j = (x_j, y_j) is

    d(p_i, p_j) = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2}

This is a fundamental problem in many applications: cluster formation, graphics and robotics (collision detection), statistics, etc.

Naive Algorithm: compare every pair of distinct points.

    minDist = INFINITY;                      // no pair examined yet
    for (i = 1; i <= N; i++)
        for (j = 1; j <= N; j++)
            if (i != j && d(p[i], p[j]) < minDist)
                minDist = d(p[i], p[j]);     // found a closer pair
    return minDist;

Analysis:
    Correctness: obvious, since every pair is examined.
    Running time: Theta(N^2).

8. An improvement: examine each unordered pair only once.

    minDist = INFINITY;
    for (i = 1; i <= N; i++)
        for (j = i+1; j <= N; j++)           // j > i, so i != j automatically
            if (d(p[i], p[j]) < minDist)
                minDist = d(p[i], p[j]);
    return minDist;

Correctness: again obvious.
Running time: N(N-1)/2 distance computations, about half as many, but still Theta(N^2).

This will not scale to, say, 10^9 points, as needed for large-scale simulations: that would be about 5 × 10^17 distance computations. Can we beat the N^2 bound? Fundamentally, how can we avoid looking at all pairs? After all, any pair could be the closest... Ideas?

9. The points are input in arbitrary order. We can certainly sort them in O(N log N) time. But using WHAT KEY?

We could try sorting by x-coordinate. Must the closest pair then be ADJACENT in the sorted order? We could also sort by y and check adjacent pairs... Or we could sort by x + y and check adjacent pairs... Can it be that the closest pair is NOT adjacent in any of these sorted orders?

COUNTER EXAMPLE: Yes. Take the five points (0, 0), (1, 1), (0.5, 100), (100, 0.5), (50, -49). The closest pair is (0, 0) and (1, 1), at distance \sqrt{2}, yet (0.5, 100) lies between them in x-order, (100, 0.5) lies between them in y-order, and (50, -49) lies between them in (x + y)-order.

(There must be some direction in which the closest pair is adjacent, but determining that direction may be even harder...)

10. Try Divide and Conquer (an algorithmic paradigm).

Consider an imaginary vertical line that divides the points into two equal-sized halves. Suppose (p*, q*) is the closest pair. There are 3 possibilities:
    (1) p* and q* are both in the left half;
    (2) p* and q* are both in the right half;
    (3) p* and q* are in different halves.
In the first two cases, we can solve the problem by making a recursive call. The third case is where the meat (and the bulk of the computation) is.

Suppose the closest pair in the left half has distance h1, and the closest pair in the right half has distance h2.
Let h = min(h1, h2).

INSIGHT 1: If we are in case 3, then p* and q* must both lie inside the h-strip: the vertical strip of width 2h that extends distance h on each side of the cutting line. WHY?

Now, optimistically, we would like to believe that the problem remaining in this h-strip is quite small. But in the worst case, all of the input can lie in this strip, and we have saved nothing!

11. But do not despair! We have gained a lot. With a little more thought, structure reveals itself in the h-strip problem.

INSIGHT 2: Consider a point q in the right half of the strip. If q is to form the closest pair (CP) with some point p in the left half, then p must lie inside the h × 2h rectangle that is flush against the cutting line, spans the left half of the strip, and extends vertically from y(q) - h to y(q) + h. Why?

INSIGHT 3: How many candidate points p can lie inside one such box? At most 6! Any two points of the left half are at least h1 ≥ h apart; dividing the box into six cells of size h/2 × 2h/3, each of diameter 5h/6 < h, shows that each cell contains at most one of them.

So, here is how we solve the h-strip problem. Sort the points of the h-strip by y-coordinate (i.e., project them onto the vertical axis, ignoring their x-coordinates). Traverse this list from bottom to top: for each point q, compute its true 2D distance to each point that follows it in the list and whose y-coordinate is within h of q's. By the packing argument of INSIGHT 3, only a constant number of such points exist for each q. Call the smallest of these distances h3. The closest-pair distance is min(h, h3).

Correctness: Follows from INSIGHTS 1-3.

Time complexity: O(n log n) for sorting the points by x and by y once, initially, PLUS T(n) = 2T(n/2) + O(n) for the recursion. The O(n) term requires that each call obtain its y-sorted h-strip from the presorted list in linear time, rather than re-sorting. This recurrence solves to O(n log n). So, we can solve the closest pair problem in total time O(n log n).
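The following Python sketch puts sections 10 and 11 together. It is a minimal illustration under stated assumptions, not the course's reference code. The names dist, closest_pair_brute, closest_pair and _closest are hypothetical. Points are assumed to be (x, y) tuples, at least two points are required, and, like minDist in the pseudocode, only the distance is returned. As the time analysis assumes, it sorts by x and by y once and then splits the y-sorted list in linear time at every level.

```python
import math


def dist(p, q):
    """Euclidean distance between points p and q (uses only p[0], p[1])."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


def closest_pair_brute(points):
    """Improved naive algorithm of section 8: each unordered pair once, Theta(N^2)."""
    best = math.inf
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            best = min(best, dist(points[i], points[j]))
    return best


def closest_pair(points):
    """Divide-and-conquer closest-pair distance in O(n log n) time."""
    if len(points) < 2:
        raise ValueError("need at least two points")
    # Tag each point with its input index so duplicate points stay distinct.
    tagged = [(x, y, i) for i, (x, y) in enumerate(points)]
    px = sorted(tagged)                                     # by x (ties: y, index)
    py = sorted(tagged, key=lambda p: (p[1], p[0], p[2]))   # by y
    return _closest(px, py)


def _closest(px, py):
    """px and py hold the same points, sorted by x and by y respectively."""
    n = len(px)
    if n <= 3:                                   # tiny input: brute force
        return closest_pair_brute(px)
    mid = n // 2
    x_line = px[mid][0]                          # the vertical cutting line
    left_ids = {p[2] for p in px[:mid]}
    # Split the y-sorted list in linear time, keeping each half y-sorted.
    left_y = [p for p in py if p[2] in left_ids]
    right_y = [p for p in py if p[2] not in left_ids]
    # Cases 1 and 2: recursive calls on the two halves.
    h = min(_closest(px[:mid], left_y), _closest(px[mid:], right_y))

    # Case 3: only points within horizontal distance h of the line matter
    # (INSIGHT 1); filtering py keeps the h-strip sorted by y.
    strip = [p for p in py if abs(p[0] - x_line) < h]
    for i in range(len(strip)):
        j = i + 1
        # Compare only with later points less than h higher (O(1) of them).
        while j < len(strip) and strip[j][1] - strip[i][1] < h:
            h = min(h, dist(strip[i], strip[j]))
            j += 1
    return h
```

For example, closest_pair([(0, 0), (3, 4), (10, 10)]) returns 5.0. Tagging each point with its input index is a design choice that keeps the linear-time split correct when points share an x-coordinate or are exact duplicates. Comparing closest_pair against closest_pair_brute on many inputs is a cheap way to check the sketch.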