Lecture and Paper Schedule

(Note: Lecture notes and slides will be posted on the Piazza group.)

Please use the Piazza group to claim papers.

Date       Topic                                     Papers                    Presenters
Tu: 1/7    Introduction and Projects                 -                         -
Th: 1/9    MapReduce, Google File System             DG04, GGL03               -
Tu: 1/14   Algorithms on MapReduce                   -                         Di Ma
Th: 1/16   Graphs on MapReduce                       C09, KTF09                Arvind, Lin Zhou
Tu: 1/21   Alternatives to MR on Graphs              -                         Qingyun Liu
Th: 1/23   -                                         LGK+12, KBG12             Daniel Kudrow, Mai ElSherief
Tu: 1/28   Dryad                                     IBY07, YIF+08             Dibyendu Nath, Victor Zakhary
Th: 1/30   MR++: FlumeJava and MapReduce Online      CRP+10, CCA+10            Bolun Wang, Michael Nekrasov
Tu: 2/4    MR++: Mesos, Dolly                        HKZ+11, AGS+13            Erdinc Korpeoglu, Yibo Zhu
Th: 2/6    Higher Level Languages, Pig/Hive          ORS+08, TSJ+10            Agnethe Soeraa, Michael Agun
Tu: 2/11   Mid-quarter Project Status presentations  -                         -
Th: 2/13   NoSQL Systems, Part I                     CDG+06, DHJ+07            Tianyi Wang, Ana Nika
Tu: 2/18   NoSQL Systems, Part II, NoSQL vs DBs      D08, DG10, SAD+10         Maxwell Hinson, Yanglei Li, Asad Ismail
Th: 2/20   Data Privacy and Anonymity                BDK07, SZW+11             Divya Sambasivan, Ruoyu Wang
Tu: 2/25   Data Mining and Security Threats          ZZT05, HAH+13             Xiaohan Zhao, Gang Wang
Th: 2/27   Machine Learning I                        GKP+11, CKL+06            Sindre Haneset Nygaard, Karthik Puthraya
Tu: 3/4    Machine Learning II                       ZCD+12, MQ09              Nevena Golubovic, Nitharshaan Thevarajah
Th: 3/6    Computational Biology                     T10, S09                  Morgan Virgil, Zihao Song
Tu: 3/11   Computational Biology II                  MBS11 (Optional: QEG+13)  Rolf Erik Heggem Lekang
Th: 3/13   Random Topics/All Questions Answered      None                      -
Wed: 3/19  Final Project Presentations, 12-3pm       -                         -

Paper List (Still under construction...)

Effective Straggler Mitigation: Attack of the Clones, G. Ananthanarayanan, A. Ghodsi, S. Shenker, I. Stoica, NSDI 2013.
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, Lars Backstrom, Cynthia Dwork, Jon Kleinberg, WWW'07.
Graph Twiddling in a MapReduce World, J. Cohen, Computing in Science & Engineering, 2009.
MapReduce Online, T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, R. Sears, NSDI 2010.
Bigtable: A Distributed Storage System for Structured Data, F. Chang et al., OSDI 2006.
Map-Reduce for Machine Learning on Multicore, C. Chu, S. K. Kim, Y. A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, and K. Olukotun, NIPS 2006.
FlumeJava: easy, efficient data-parallel pipelines, C. Chambers, A. Raniwala, F. Perry, S. Adams, R. R. Henry, R. Bradshaw, N. Weizenbaum, PLDI 2010.
MapReduce: A major step backwards, David DeWitt and Michael Stonebraker, Vertica Blog entry, January 2008.
MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, OSDI'04.
MapReduce: A Flexible Data Processing Tool, J. Dean, S. Ghemawat, CACM, 2010.
Dynamo: Amazon's Highly Available Key-value Store, G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall and W. Vogels, SOSP'07.
The Google File System, Sanjay Ghemawat, Howard Gobioff, Shun-Tak Leung, SOSP, October 2003.
SystemML: Declarative machine learning on MapReduce, A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, V. Sindhwani, S. Tatikonda, Y. Tian and S. Vaithyanathan, ICDE 2011.
Addressing the Concerns of the Lacks Family: Quantification of Kin Genomic Privacy, M. Humbert, E. Ayday, J.-P. Hubaux, A. Telenti, CCS 2013.
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, I. Stoica, NSDI 2011.
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks, Michael Isard, Mihai Budiu, Yuan Yu, Andrew Birrell, and Dennis Fetterly, EuroSys'07.
GraphChi: Large-Scale Graph Computation on Just a PC, A. Kyrola, G. Blelloch, C. Guestrin, OSDI 2012.
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations, U Kang, C. E. Tsourakakis and C. Faloutsos, ICDM 2009.
Pairwise Element Computation with MapReduce, Kiefer, Volk, and Lehner, HPDC 2010.
Data-Intensive Text Processing with MapReduce, Jimmy Lin and Chris Dyer.
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin and J. M. Hellerstein, PVLDB 2012.
Pregel: A System for Large-Scale Graph Processing, G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, G. Czajkowski, SIGMOD 2010.
Rapid Parallel Genome Indexing with MapReduce, R. K. Menon, G. P. Bhat, M. C. Schatz, MapReduce 2011.
GFS: Evolution on Fast-forward, Marshall Kirk McKusick, Sean Quinlan, ACM Queue, Aug. 2009.
Pig Latin: a not-so-foreign language for data processing, C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, SIGMOD 2008.
Data Intensive Computing for Bioinformatics, Qiu et al., Chapter 16 in Bioinformatics: Concepts, Methodologies, Tools, and Applications, March 2013.
CloudBurst: highly sensitive read mapping with MapReduce, M. Schatz, Bioinformatics, Vol. 25, No. 11, pp. 1363-1369, 2009.
MapReduce and Parallel DBMSs: Friends or Foes?, Stonebraker, Abadi, DeWitt, Madden, Paulson, Pavlo, Rasin, CACM 2010.
Sharing Graphs using Differentially Private Graph Models, A. Sala, X. Zhao, C. Wilson, H. Zheng, B. Y. Zhao, IMC 2011.
An overview of the Hadoop/MapReduce/HBase framework and its current applications in bioinformatics, R. Taylor, BMC Bioinformatics 2010.
Hive - a petabyte scale data warehouse using Hadoop, A. Thusoo, J. S. Sarma, N. Jain, Z. Shao, P. Chakka, N. Zhang, S. Antony, H. Liu and R. Murthy, ICDE 2010.
Designing Good MapReduce Algorithms, Jeffrey D. Ullman, XRDS, Vol. 19, No. 1, 2012.
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Yuan Yu, Michael Isard, Dennis Fetterly, Mihai Budiu, Ulfar Erlingsson, Pradeep Kumar Gunda, and Jon Currey, OSDI'08.
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, I. Stoica, NSDI 2012.
Keyboard Acoustic Emanations Revisited, L. Zhuang, F. Zhou, D. Tygar, CCS 2005.