Title: Source-Side Collaborative Deduplication for Cluster-based Virtual Machine Backup in the Cloud
Periodic backup of virtual machine snapshots is important for data retention and fault recovery, but it incurs high storage cost and heavy network traffic. Source-side deduplication can significantly reduce the amount of data transmitted over the network, but its resource demands can be intensive and can interfere with other co-located services. In this project, we design and implement an asynchronous, low-profile backup scheme with distributed fingerprint comparison and duplicate detection in a cloud cluster. The key idea is to detect duplicates collaboratively among machines before the actual backup, compare fingerprints independently partition by partition, and exchange information among machines asynchronously. The main advantage of this approach is that it handles skewed workloads and tolerates the failure or slowness of some machines while keeping resource usage low. This talk presents the design and implementation details, along with an evaluation that validates the proposed approach.
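The partition-by-partition idea can be illustrated with a minimal single-process sketch. This is not the authors' implementation. It assumes that fingerprints are SHA-1 digests of fixed blocks, that each fingerprint is routed to a partition by a hash prefix, and that the set of fingerprints already held in backup storage is known. A thread pool stands in for separate machines. The names `scatter`, `detect_partition`, `detect_all`, and `NUM_PARTITIONS` are hypothetical.

```python
import hashlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor, as_completed

NUM_PARTITIONS = 4  # hypothetical partition count (assumption)


def fingerprint(block: bytes) -> str:
    # Content fingerprint of one block (SHA-1 assumed for illustration).
    return hashlib.sha1(block).hexdigest()


def partition_of(fp: str, num_partitions: int = NUM_PARTITIONS) -> int:
    # Route a fingerprint to a partition by its leading hex digits, so
    # identical fingerprints from different machines meet in one partition.
    return int(fp[:8], 16) % num_partitions


def scatter(machine_blocks):
    """Phase 1: every machine fingerprints its blocks and sends each
    (fingerprint, machine, block_id) record to the owning partition."""
    partitions = defaultdict(list)
    for machine, blocks in machine_blocks.items():
        for block_id, data in enumerate(blocks):
            fp = fingerprint(data)
            partitions[partition_of(fp)].append((fp, machine, block_id))
    return partitions


def detect_partition(records, stored_fps):
    """Phase 2: compare fingerprints within ONE partition, independently of
    all other partitions. Returns (machine, block_id) pairs that need not be
    uploaded because the data is already stored or another copy is kept."""
    duplicates = set()
    kept = set()
    # Sorting makes the choice of which copy to upload deterministic.
    for fp, machine, block_id in sorted(records):
        if fp in stored_fps or fp in kept:
            duplicates.add((machine, block_id))
        else:
            kept.add(fp)
    return duplicates


def detect_all(machine_blocks, stored_fps):
    # Partitions are processed in any order and results are merged as they
    # finish, so a slow partition does not block progress on the others.
    partitions = scatter(machine_blocks)
    duplicates = set()
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(detect_partition, recs, stored_fps)
                   for recs in partitions.values()]
        for fut in as_completed(futures):
            duplicates |= fut.result()
    return duplicates
```

For example, with `vms = {"m1": [b"a", b"b", b"a"], "m2": [b"b", b"c"]}` and `stored = {fingerprint(b"c")}`, `detect_all(vms, stored)` marks the second `a` on `m1`, the copy of `b` on `m2`, and the already stored `c` on `m2` as duplicates. Only one copy each of `a` and `b` is uploaded. Because a given fingerprint always lands in the same partition, no partition needs to consult another, which is what allows independent, asynchronous processing.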