CS 290N: Parallel Computer Architectures


Final Project Reports due Friday 12/15

Time: WF 1-2:50
Room: Phelps 1401
Instructor: Prof. Fred Chong; office hours by appointment; Eng I 5163

As the computing industry struggles to meet performance expectations set by Moore's Law, many-core architectures have become the wave of the future. This course will examine the foundations of parallel architecture and the basic mechanisms to support efficient parallel computation. Topics may include: multiprocessor synchronization, speculative locks, transactional memory, interconnect design, application partitioning, race detection, and reliability issues.


  • Problem Sets 25%
  • Discussion Topics 20%
  • Project Proposals and Drafts 15%
  • Project Final Report 40%

    Discussion Information

  • Assign a paper for the class to read, one week before your discussion day.
  • Present the paper and supplemental material on your assigned day. Lead the discussion on the subject, with my help.

    Problem Set Information

    For each assigned paper, write up the following and hand in a hard copy to me at the beginning of class:
  • A summary of the main points of the paper.
  • A critique of any shortcomings of the paper.
  • Any ideas on how one would extend the ideas in the paper or address its shortcomings.

    Project Information

    Here is an example project paper. The project has two goals:
  • A critique of three related research papers. This is not a book report. Do not just summarize what is in the papers. Point out shortcomings and possible areas for extension.
  • Extension of the area. Address shortcomings or extend the work in the papers. Come up with some ideas and test them with a short project. This can be in the form of some simple analysis, study of application attributes, small machine simulations, or implementation on parallel machines. Remember to pick something that will fit in a quarter.

    Ideally, a project would address both goals well. Since we only have a quarter, however, you may emphasize one or the other.


  • Lecture 1 (10/3/07): Introduction and Organizational Meeting

    Reading for next time: How to get good performance on the CM5 Data Network [Brewer and Kuszmaul 94].

    Additional References (optional): The CM5 Data Network [Leiserson et al 95].

  • Lecture 2 (10/5/07): Interconnect

  • Lecture 3 (10/10/07): The CM5 and Programming for Network Performance

    Reading for next time: Active Messages [von Eicken et al 92].

  • Lecture 4 (10/12/07): Active Messages, Polling, and Interrupts

    Reading for next time: Reactive Synchronization Algorithms [Lim and Agarwal 94].

    Optional reading for next time (no writeup required):

    "Evaluating MapReduce for Multi-core and Multiprocessor Systems," Ranger et al. (presented by Tierui Chen)

    "Comparing Memory Systems for Chip Multiprocessors," Leverich et al. (presented by Susmit Biswas)

  • Lecture 5 (10/17/07): Synchronization

    Reading for next time: SafetyNet: Improving the Availability of Shared Memory Multiprocessors with Global Checkpoint/Recovery [Sorin et al 02].

    Optional reading for next time (no writeup required):

    Piranha: A Scalable Architecture Based on Single-Chip Multiprocessing [Barroso et al 00]. (presented by Ayswarya Sundaram)

    "Fast Checkpoint/Recovery to Support Kilo-Instruction Speculation and Hardware Fault Tolerance," Sorin et al (presented by Alan Savage).

  • Lecture 6 (10/19/07): Shared Memory Protocols and System Reliability

    Reading for next time: "Heterogeneous Chip Multiprocessors," Kumar et al (presented by Shriram Rajagopalan)

    Optional reading for next time (no writeup required):

    Memory consistency tutorial (presented by Mohit Tiwari)

  • Lecture 7 (10/24/07): Cache Consistency and Heterogeneous Multiprocessors

    Reading for next time:
    MIT RAW (presented by Christo Wilson)

    Optional reading for next time (no writeup required): SIMD DSP Compiler (presented by Taylor Ettema)

  • Lecture 8 (10/26/07): Tile processors and SIMD

    Reading for next time: The Impact of Performance Asymmetry in Emerging Multicore Architectures (presented by Pavan Kumar Thirunagari)

    Optional reading for next time (no writeup required): On the Design and Analysis of Irregular Algorithms on the Cell Processor: A Case Study of List Ranking (presented by Vikramjeet Singh Sehmi)

  • Lecture 9 (10/31/07): Asymmetric Processors

    Reading for next time:
    Smart Memories (presented by Vlasia Anagnostopoulou)

    Optional reading for next time (no writeup required): Berkeley Parallel Computing Report (presented by Chris Grzegorczyk)

  • Lecture 10 (11/2/07): Reconfigurable tiles and parallel computing

    Reading for next time:
    Scientific Applications on Cell (presented by Nelson Ijih)

    Optional reading for next time (no writeup required): Optimizing Compiler for the Cell (presented by Chris Bunch)

  • No Class (11/7/07)

  • Lecture 11 (11/9/07): The Cell processor

    Reading for next time:
    Data prefetch mechanisms (presented by Varun Radhakrishnan)

    Optional reading for next time (no writeup required):
    Hydra ISCA03
    Hydra Micro 03 (presented by Hassan Wassel)

  • Lecture 12 (11/14/07): Data prefetching and Transactions

    Reading for next time:
    Transactional Memory Cache Coherence and Consistency (presented by Shravan Samindla)

    Optional reading for next time (no writeup required): An Effective Hybrid Transactional Memory System with Strong Isolation Guarantees (presented by Nagender R Paduru)

  • Lecture 13 (11/16/07): Transactions

    Reading for next time:
    802.11a implementation (presented by Shravan Mettu)

    Optional reading for next time (no writeup required): Architecture for Software Radio (presented by Ramya Raghavendra)

  • No Class (11/21/07)

  • Lecture 14 (11/28/07): Wireless Applications

    Reading for next time:
    Increasing power efficiency of multi-core network processors through data filtering (presented by Amit Jardosh)

    Optional reading for next time (no writeup required):

    Paravirtualization for HPC
    Evaluating the Performance Impact of Xen on MPI and Process Execution for HPC Systems
    (presented by Lamia Youseff)

    (presented by Arda Atah)

  • Lecture 15 (11/30/07): Network Processors and VMs

    Reading for next time:
    GPU Cluster (presented by Liubov Kovaleva)

    Optional reading for next time (no writeup required): Using Modern Graphics Architectures for General-Purpose Computing (presented by Ceren Budak)

  • Lecture 16 (12/5/07): GPU Clusters and GPGPU

    Reading for next time:
    Scan Primitives for GPU Computing (presented by Aydin Buluc)

    Optional reading for next time (no writeup required): GPGPU Survey (presented by Fenglin Liao)

  • Lecture 17 (12/7/07): GPGPU

    Last updated November 2007