MS Project Defense - Jiyu Chen

Date: Friday, February 24, 2017 - 4:00pm
Location: 1152 Harold Frank Hall
Title: Software Support for Versioned Data Search
Speaker: Jiyu Chen
Committee: Tao Yang (Chair), Yuan-Fang Wang

Abstract

Organizations and companies archive many versions of their digital documents and multimedia data for preservation, electronic discovery, and regulatory compliance. Developing the integrated archival and search support this requires presents both research challenges and opportunities. Since versioned datasets contain highly repetitive content, deduplication can reduce the storage demand by an order of magnitude or more; however, such an optimization is resource-intensive. The two-phase search method seeks a cost tradeoff: Phase 1 searches representatives to quickly narrow the search scope using clustering, and Phase 2 re-ranks the top document versions using a fragment-based index for each cluster. This project will study a low-cost method for deduplication and indexing, and will deliver a ready-to-use software package for versioned data search.
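
As a rough illustration of the two-phase idea described above, the following minimal sketch scores one representative per cluster first, then re-ranks the versions of the best cluster through a per-cluster fragment table. It is not the project's implementation: the data layout, the term-count scoring, and all names (CLUSTERS, score, two_phase_search, top_clusters, top_k) are assumptions made for this example.

```python
from collections import Counter

# Hypothetical toy data: each cluster holds one representative text,
# a table of shared (deduplicated) fragments, and versions that
# reference those fragments by id.
CLUSTERS = [
    {
        "representative": "quarterly budget report draft",
        "fragments": {
            "f1": "quarterly budget report",
            "f2": "draft figures",
            "f3": "final audited budget figures",
        },
        "versions": {"report_v1": ["f1", "f2"], "report_v2": ["f1", "f3"]},
    },
    {
        "representative": "team offsite travel plan",
        "fragments": {"g1": "team offsite travel plan"},
        "versions": {"plan_v1": ["g1"]},
    },
]


def score(terms, text):
    """Count how often the query terms occur in the text (a stand-in scorer)."""
    counts = Counter(text.lower().split())
    return sum(counts[t] for t in terms)


def two_phase_search(query, clusters, top_clusters=1, top_k=3):
    terms = query.lower().split()

    # Phase 1: look only at cluster representatives to narrow the scope.
    ranked = sorted(
        clusters, key=lambda c: score(terms, c["representative"]), reverse=True
    )
    candidates = [
        c for c in ranked[:top_clusters] if score(terms, c["representative"]) > 0
    ]

    # Phase 2: re-rank the versions of the selected clusters using
    # each cluster's fragment table.
    results = []
    for cluster in candidates:
        for version_id, fragment_ids in cluster["versions"].items():
            total = sum(score(terms, cluster["fragments"][f]) for f in fragment_ids)
            results.append((total, version_id))

    # Highest score first; ties broken by version id for a stable order.
    results.sort(key=lambda r: (-r[0], r[1]))
    return [version_id for _, version_id in results[:top_k]]


if __name__ == "__main__":
    print(two_phase_search("budget figures", CLUSTERS))
```

In this toy setup, fragment f1 is stored once and shared by both report versions, which is where deduplication saves space, and only the versions in the cluster chosen in Phase 1 are scored in Phase 2.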

Everyone welcome!