Lecture and Paper Schedule

Note: Lecture notes and slides will be posted on the Piazza group.

Please use the Piazza group to claim papers.

Date | Topic | Reading | Presenter(s)
Tu: 1/7 | Introduction and Projects | - | -
Th: 1/9 | MapReduce, Google File System | DG04, GGL03 | -
Tu: 1/14 | Algorithms on MapReduce | KVL10 | Di Ma
Th: 1/16 | Graphs on MapReduce | C09, KTF09 | Arvind, Lin Zhou
Tu: 1/21 | Alternatives to MR on Graphs | MAB+10 | Qingyun Liu
Th: 1/23 | GraphLab | LGK+12, KBG12 | Daniel Kudrow, Mai ElSherief
Tu: 1/28 | Dryad | IBY+07, YIF+08 | Dibyendu Nath, Victor Zakhary
Th: 1/30 | MR++: FlumeJava and MapReduce Online | CRP+10, CCA+10 | Bolun Wang, Michael Nekrasov
Tu: 2/4 | MR++: Mesos, Dolly | HKZ+11, AGS+13 | Erdinc Korpeoglu, Yibo Zhu
Th: 2/6 | Higher-Level Languages: Pig/Hive | ORS+08, TSJ+10 | Agnethe Soeraa, Michael Agun
Tu: 2/11 | Mid-quarter Project Status Presentations | - | -
Th: 2/13 | NoSQL Systems, Part I | CDG+06, DHJ+07 | Tianyi Wang, Ana Nika
Tu: 2/18 | NoSQL Systems, Part II: NoSQL vs. DBs | D08, DG10, SAD+10 | Maxwell Hinson, Yanglei Li, Asad Ismail
Th: 2/20 | Data Privacy and Anonymity | BDK07, SZW+11 | Divya Sambasivan, Ruoyu Wang
Tu: 2/25 | Data Mining and Security Threats | ZZT05, HAH+13 | Xiaohan Zhao, Gang Wang
Th: 2/27 | Machine Learning I | GKP+11, CKL+06 | Sindre Haneset Nygaard, Karthik Puthraya
Tu: 3/4 | Machine Learning II | ZCD+12, MQ09 | Nevena Golubovic, Nitharshaan Thevarajah
Th: 3/6 | Computational Biology | T10, S09 | Morgan Virgil, Zihao Song
Tu: 3/11 | Computational Biology II | MBS11 (optional: QEG+13) | Rolf Erik Heggem Lekang
Th: 3/13 | Random Topics/All Questions Answered | None | -
Wed: 3/19 | Final Project Presentations, 12-3 pm | - | -



Paper List (still under construction)

AGS+13
Effective Straggler Mitigation: Attack of the Clones, G. Ananthanarayanan, A. Ghodsi, S. Shenker, I. Stoica, NSDI 2013.
BDK07
Wherefore Art Thou R3579X? Anonymized Social Networks, Hidden Patterns, and Structural Steganography, L. Backstrom, C. Dwork, J. Kleinberg, WWW 2007.
C09
Graph Twiddling in a MapReduce World, J. Cohen, Computing in Science & Engineering, 2009.
CCA+10
MapReduce Online, T. Condie, N. Conway, P. Alvaro, J. M. Hellerstein, K. Elmeleegy, R. Sears, NSDI 2010.
CDG+06
Bigtable: A Distributed Storage System for Structured Data, F. Chang et al., OSDI 2006.
CKL+06
Map-Reduce for Machine Learning on Multicore, C. Chu, S. K. Kim, Y. A. Lin, Y. Yu, G. R. Bradski, A. Y. Ng, K. Olukotun, NIPS 2006.
CRP+10
FlumeJava: Easy, Efficient Data-Parallel Pipelines, C. Chambers, A. Raniwala, F. Perry, S. Adams, R. R. Henry, R. Bradshaw, N. Weizenbaum, PLDI 2010.
D08
MapReduce: A Major Step Backwards, D. DeWitt, M. Stonebraker, Vertica blog entry, January 2008.
DG04
MapReduce: Simplified Data Processing on Large Clusters, J. Dean, S. Ghemawat, OSDI 2004.
DG10
MapReduce: A Flexible Data Processing Tool, J. Dean, S. Ghemawat, CACM 2010.
DHJ+07
Dynamo: Amazon's Highly Available Key-value Store, G. DeCandia, D. Hastorun, M. Jampani, G. Kakulapati, A. Lakshman, A. Pilchin, S. Sivasubramanian, P. Vosshall, W. Vogels, SOSP 2007.
GGL03
The Google File System, S. Ghemawat, H. Gobioff, S.-T. Leung, SOSP, October 2003.
GKP+11
SystemML: Declarative Machine Learning on MapReduce, A. Ghoting, R. Krishnamurthy, E. Pednault, B. Reinwald, V. Sindhwani, S. Tatikonda, Y. Tian, S. Vaithyanathan, ICDE 2011.
HAH+13
Addressing the Concerns of the Lacks Family: Quantification of Kin Genomic Privacy, M. Humbert, E. Ayday, J.-P. Hubaux, A. Telenti, CCS 2013.
HKZ+11
Mesos: A Platform for Fine-Grained Resource Sharing in the Data Center, B. Hindman, A. Konwinski, M. Zaharia, A. Ghodsi, A. D. Joseph, R. Katz, S. Shenker, I. Stoica, NSDI 2011.
IBY+07
Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks, M. Isard, M. Budiu, Y. Yu, A. Birrell, D. Fetterly, EuroSys 2007.
KBG12
GraphChi: Large-Scale Graph Computation on Just a PC, A. Kyrola, G. Blelloch, C. Guestrin, OSDI 2012.
KTF09
PEGASUS: A Peta-Scale Graph Mining System - Implementation and Observations, U Kang, C. E. Tsourakakis, C. Faloutsos, ICDM 2009.
KVL10
Pairwise Element Computation with MapReduce, Kiefer, Volk, Lehner, HPDC 2010.
LD13
Data-Intensive Text Processing with MapReduce, J. Lin, C. Dyer.
LGK+12
Distributed GraphLab: A Framework for Machine Learning and Data Mining in the Cloud, Y. Low, J. Gonzalez, A. Kyrola, D. Bickson, C. Guestrin, J. M. Hellerstein, PVLDB 2012.
MAB+10
Pregel: A System for Large-Scale Graph Processing, G. Malewicz, M. H. Austern, A. J. C. Bik, J. C. Dehnert, I. Horn, N. Leiser, G. Czajkowski, SIGMOD 2010.
MBS11
Rapid Parallel Genome Indexing with MapReduce, R. K. Menon, G. P. Bhat, M. C. Schatz, MapReduce 2011.
MQ09
GFS: Evolution on Fast-forward, M. K. McKusick, S. Quinlan, ACM Queue, August 2009.
ORS+08
Pig Latin: A Not-So-Foreign Language for Data Processing, C. Olston, B. Reed, U. Srivastava, R. Kumar, A. Tomkins, SIGMOD 2008.
QEG+13
Data Intensive Computing for Bioinformatics, Qiu et al., Chapter 16 in Bioinformatics: Concepts, Methodologies, Tools, and Applications, March 2013.
S09
CloudBurst: Highly Sensitive Read Mapping with MapReduce, M. Schatz, Bioinformatics, Vol. 25, No. 11, pp. 1363-1369, 2009.
SAD+10
MapReduce and Parallel DBMSs: Friends or Foes?, Stonebraker, Abadi, DeWitt, Madden, Paulson, Pavlo, Rasin, CACM 2010.
SZW+11
Sharing Graphs Using Differentially Private Graph Models, A. Sala, X. Zhao, C. Wilson, H. Zheng, B. Y. Zhao, IMC 2011.
T10
An Overview of the Hadoop/MapReduce/HBase Framework and Its Current Applications in Bioinformatics, R. Taylor, BMC Bioinformatics 2010.
TSJ+10
Hive - A Petabyte Scale Data Warehouse Using Hadoop, A. Thusoo, J. S. Sarma, N. Jain, S. Zheng, P. Chakka, N. Zhang, S. Antony, H. Liu, R. Murthy, ICDE 2010.
U12
Designing Good MapReduce Algorithms, J. D. Ullman, XRDS, Vol. 19, No. 1, 2012.
YIF+08
DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language, Y. Yu, M. Isard, D. Fetterly, M. Budiu, U. Erlingsson, P. K. Gunda, J. Currey, OSDI 2008.
ZCD+12
Resilient Distributed Datasets: A Fault-Tolerant Abstraction for In-Memory Cluster Computing, M. Zaharia, M. Chowdhury, T. Das, A. Dave, J. Ma, M. McCauley, M. J. Franklin, S. Shenker, I. Stoica, NSDI 2012.
ZZT05
Keyboard Acoustic Emanations Revisited, L. Zhuang, F. Zhou, J. D. Tygar, CCS 2005.