Administrative Details.
-----------------------

The course web page is: www.cs.ucsb.edu/~suri/cs130b/cs130b.html

My office hours: TBA.
TA: Pegah Kamousi (Email: pegah@cs.ucsb.edu)
TA hours for Pegah:

Textbook: Data Structures and Algorithm Analysis (in C++, or any other edition)
          Mark Allen Weiss

A significant portion of the lecture material, however, will be my own notes,
not from the text. A plain ascii version of my notes is available from the web
page (no figures etc).

Other recommended books (not required):
    Introduction to Algorithms, Cormen, Leiserson, Rivest, Stein

Grading: Divided into 4 parts:
    Homework assignments:    30%
    Programming assignment:  20%
    Quizzes:                 10%
    2 Exams:                 40%

Homework assignments and Exams will not have any programming in them. They
will be theoretical: concerned with design and analysis of algorithms.

I will be away on Jan 29. *** Midterm 1, proctored by the TA. ***

ETHICS POLICY: I expect everyone to do and write their own homeworks and
programming assignment. You can discuss problems with each other BUT WHEN IT
COMES TO WRITING THE SOLUTIONS, you should do that on your own. A litmus test:
you should be able to come explain the solutions to me in my office.
Remember: written homeworks count for only 30%. The EXAMS and quizzes together
determine half of the grade. If you didn't understand the material yourself,
it will hurt you.

What the course is all about:

1. Theme: Techniques for designing efficient algorithms.
   Inherent complexity/hardness of problems.

1a. Introduction

Algorithms and algorithmic thinking are pervasive. In fact they go well beyond
just computer science: Increasingly, you hear about algorithmic and
computational biology; none of the recent breakthroughs in genetic decoding,
synthetic drug design etc would have been possible without algorithms.

There is even an emerging relationship with economics.
Computational thinking has become important to economists as mechanisms like
auctions are being adopted for allocation of resources such as wireless
spectrum, goods and services, keyword ads on web pages etc. At the same time,
much of computer science can be thought of, at its core, as a problem of
"resource allocation": computation, memory, communication, cache etc. As
systems grow to global scale, these interactions sound a lot like an "economy"
and therefore many classical ideas from economics are finding a key role in
CS: game theory, mechanism design etc.

Finally, of course, there is the Internet, which has raised algorithmic
thinking to a new level by exposing problems that can only be managed through
computational means. Whether we want our web connections to work seamlessly,
or the search engines to find the most relevant documents, or to participate
in virtual social groups, or to play massive online games, the speed and scale
at which information must be exchanged, managed, and processed is nothing
short of a miracle, and we often forget those silent algorithms working behind
the internet cloud that make this possible.

1b. Algorithmic Enterprise

Algorithmic problems form the core of computer science, but they *rarely*
arrive as cleanly packaged, mathematically precise questions. Rather,
real-world problems almost always come bundled with a lot of messy, confusing,
application-specific detail, some of which is essential, much of it extraneous
and distracting, that only clouds the mind.

As a result, the algorithmic enterprise consists of two fundamental, and
equally important, parts:

    o. extracting the mathematically clean core of the problem, and
    o. designing an appropriate algorithm to solve the core problem.
The skills needed are interrelated: the more comfortable one is with
algorithmic design techniques, the easier it is to recognize the clean
abstractions and formulations that lie within the messy problems out in the
world; and the more adept one is at sniffing out the right mathematical
formulation, the easier it often is to make progress on the algorithm design
question.

1c. Course goals:

What is the best algorithm for a given problem?
Three things you will learn:
    Design a good algorithm,
    Analyze it,
    Know when to stop (lower bounds).

The word "algorithm" is derived from Mohammed Al-Khowarizmi, a 9th century
Persian mathematician. He formalized pencil-paper methods for arithmetic.
Algorithms can be thought of as recipes: detailed and precise, where elegance,
efficiency, simplicity etc matter.

2. Some Famous Algorithms

    Constructions of Euclid,
    Newton's root finding,
    Fast Fourier Transform,
    Data Compression (Huffman, Lempel-Ziv, GIF, MPEG),
    Data encryption (DES, RSA),
    Simplex algorithm for linear programming,
    Shortest Path Algorithms (Dijkstra, Bellman-Ford),
    Dynamic programming,
    Error correcting codes (CDs, DVDs),
    TCP congestion control,
    IP routing,
    Pattern matching (Genomics),
    Delaunay Triangulation (FEM, Simulation).

3. Topics Covered:

    Models, Asymptotic notation, Worst-case analysis of algorithms,
    Recurrences.
    Greedy paradigm (Activity Selection, Huffman Coding, Shortest Paths
    (Dijkstra), Min Spanning Trees (Kruskal, Prim)).
    Divide and conquer (Multiplying Long Numbers, Matrix Multiplication,
    Quicksort Algorithm, Selection, Convex Hulls).
    Dynamic Programming (Matrix Chain Product, Longest Common Subsequence,
    Knapsack).
    NP-Completeness.
    Approximation Algorithms.
    Lower Bounds.

4. Why bother about investing in algorithms?

Computers are so fast, why bother with algorithms?

TSP: Shortest route to visit N cities. An important optimization problem.
For N = 100, we have 100!
(more than 30^{100}; in fact about 9.3 x 10^{157}) possible tours.
A supercomputer checking 100 billion (10^{11}) tours per sec gets through only
about 3 x 10^{18} tours in 1 year! It still needs > 10^{100} years.

Factoring: Fast number factoring algorithms can break encryption schemes.
The straightforward method needs millions of years to factor a 300-digit
number. State of the art algorithms can factor 100-digit numbers in a few days
using a network of workstations. Algorithms research determines what is
considered a safe code length.

5. Tables of running times.

TABLES.

Asymptotic estimates are good enough to make a high level distinction between
practical and impractical algorithms. More refined analysis and/or simulations
are used to choose among multiple competing "theoretically equivalent"
algorithms.

6. Review Asymptotic Notation; basic summations, and elementary recurrences.
   (Chapter 2)

We will do a more systematic study of recurrences later in the course.

7. Illustration of Algorithm Design Process: Closest Pair of Points in 2D

Given a set of N points in 2D, determine the closest pair by Euclidean
distance. The Euclidean distance between points pi = (xi, yi) and
pj = (xj, yj) is

    d(pi, pj) = \sqrt{ (xi - xj)^2 + (yi - yj)^2 }

A fundamental problem in many applications: cluster formation, graphics and
robotics (collision detection), statistics etc.

Naive Algorithm:

    minDist = infinity;
    for (i = 1; i <= N; i++)
        for (j = 1; j <= N; j++)
            if ( i != j && d(pi, pj) < minDist )
                minDist = d(pi, pj);
    return minDist;

(minDist must start at infinity, not 0; starting at 0 the test never succeeds
and the algorithm always returns 0.)

Analysis:
    Correctness: obvious, since every pair is examined.
    Running time: Theta(N^2).

8. An improvement:

    minDist = infinity;
    for (i = 1; i <= N; i++)
        for (j = i+1; j <= N; j++)
            if ( d(pi, pj) < minDist )
                minDist = d(pi, pj);
    return minDist;

Each unordered pair is now examined once, so the i != j test is no longer
needed and the number of distance computations drops to N(N-1)/2.

    Correctness: again obvious
    Running time: Still Theta(N^2)

Can we beat the N^2 bound??? Fundamentally, how can we avoid looking at all
pairs? After all, any pair can be the closest...

Ideas!!!

9. The points are input in arbitrary order. We can certainly sort them in
O(N log N) time. But using WHAT KEY?
We could try sorting by X coordinates. Must it be the case that the closest
pair of points are ADJACENT in the sorted order??
We could also sort by Y, and check adjacencies...
Or, we could sort by X+Y and check adjacencies...
Can it be that the closest pair is NOT adjacent in any of these sorted orders?

COUNTER EXAMPLE:

(There must be some direction in which the closest pair is adjacent, but
determining that direction may be even harder...)

10. Try Divide and Conquer (an algorithmic paradigm)

Consider an imaginary vertical line that divides the points into two
equal-sized halves. Suppose (p*, q*) is the closest pair. There are
3 possibilities:

    1. p*, q* both in left half
    2. p*, q* both in right half
    3. p* and q* in different halves.

In the first two cases, we could solve the problem by making a recursive call.
The third case is where the meat (and bulk of computation) is.

Suppose the closest pair in the left half has distance h1, and the closest
pair in the right half has distance h2. Let h = min (h1, h2).

INSIGHT 1: If we are in case 3, then p*, q* must both lie within distance h of
the vertical cutting line (call this region the h-strip). WHY? The two points
are on opposite sides of the line, so if either were farther than h from it,
their x-coordinates alone would differ by more than h.

Now, optimistically, we would like to believe that the size of the problem
that needs to be solved in this h-strip is quite small. But, in the
worst-case, it can be that all of the input lies in this strip, and we have
saved nothing!!!

11. But do not despair! We have gained a lot. With a little more thought,
structure reveals itself in the h-strip problem.

INSIGHT 2: Consider a point q in the right half of the strip. If q is to form
a closest pair with some point p in the left half, then p must lie inside the
h x 2h rectangle on the left side of the cutting line, centered vertically
at q. Why? Otherwise p differs from q by at least h in x or in y, so
d(p, q) >= h.

INSIGHT 3: How many such candidate points p can be inside one such box?
At most 6!! (Points in the left half are at least h apart. Cut the box into
six cells of size (h/2) x (2h/3); each cell has diameter 5h/6 < h, so it can
hold at most one point.)

So, here is how we solve the h-strip problem. Project all the points of the
h-strip onto the vertical axis, i.e. ignore their x-coordinates.
Traverse this list in sorted order from bottom to top: for each point q in the
list, check its (true 2D) distance to every other point in the list whose
y-coordinate is less than h away from q's. By INSIGHT 3, only a constant
number of points qualify for each q. Find the smallest of these distances.
Call it h3. The closest pair distance is min(h, h3).

Correctness: Follows from INSIGHTS 1-3.

Time complexity: O(N log N) for sorting initially (by x for the splits, by y
for the strip scans) PLUS

    T(N) = 2T(N/2) + O(N).

This recurrence solves to O(N log N): the recursion has about log_2 N levels,
and the work across each level adds up to O(N). So, we can solve the closest
pair problem in total time O(N log N).
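The steps of items 10-11 can be put together roughly as follows. This is only
a sketch in Python, not code from the course. The names closest_pair,
brute_force and _solve are made up for illustration, and points are assumed
to be (x, y) tuples with at least two points in the input. One design choice
differs from the notes: instead of sorting by y once up front, each recursive
call hands back its points in y-order and the two halves are merged, which
keeps the combine step at O(N).

```python
import heapq
import math
from itertools import combinations


def brute_force(points):
    """Theta(N^2) baseline: examine every unordered pair exactly once."""
    return min(math.dist(p, q) for p, q in combinations(points, 2))


def closest_pair(points):
    """Closest-pair distance by divide and conquer (needs >= 2 points)."""
    h, _ = _solve(sorted(points))  # sort by x once: O(N log N)
    return h


def _solve(px):
    """px is sorted by x. Returns (closest distance, same points sorted by y)."""
    if len(px) <= 3:
        # Small base case: solve directly.
        return brute_force(px), sorted(px, key=lambda p: p[1])

    mid = len(px) // 2
    x_cut = px[mid][0]  # x-coordinate of the vertical cutting line
    h1, left_y = _solve(px[:mid])   # case 1: both points in the left half
    h2, right_y = _solve(px[mid:])  # case 2: both points in the right half
    h = min(h1, h2)

    # Merge the two y-sorted halves in linear time.
    by_y = list(heapq.merge(left_y, right_y, key=lambda p: p[1]))

    # Case 3: only points within distance h of the cutting line matter.
    strip = [p for p in by_y if abs(p[0] - x_cut) < h]

    # Scan bottom to top; compare each point only with the points above it
    # whose y-coordinate is less than h away (a constant number of them).
    for i in range(len(strip)):
        j = i + 1
        while j < len(strip) and strip[j][1] - strip[i][1] < h:
            h = min(h, math.dist(strip[i], strip[j]))
            j += 1
    return h, by_y
```

For example, closest_pair([(0, 0), (5, 5), (1, 1), (9, 0), (5, 6)]) returns
1.0, the distance between (5, 5) and (5, 6). The variable h plays the role of
both h and h3 from the notes, since it is lowered in place whenever the strip
scan finds a closer pair.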