MS Project Defense: Kevin Malta, "Efficient Job Completion Time Prediction for Big Data Workloads"

Thursday, June 8, 2017 - 11:00am
1132 Harold Frank Hall
Efficient Job Completion Time Prediction for Big Data Workloads
Kevin Malta
Chandra Krintz (Co-Chair) and Rich Wolski (Co-Chair)


As the datasets that need to be processed grow in size, companies and researchers have turned to cloud computing infrastructure to host their computational workloads. To choose an optimal cloud resource configuration for a given workload, one must accurately measure or predict the job completion time of each candidate configuration. However, these measurements can be expensive in both time and money.

In this work, we train a logistic regression classifier implemented with Spark's MLlib in order to model job completion time across various datasets. Using this methodology, we have built a platform that lets users profile how many epochs a particular cluster configuration can complete on their dataset in one hour. The platform is built with modern web development technologies and keeps costs as low as possible by purchasing machines on the AWS spot market.

Everyone welcome!