EFFICIENT CLOUD BACKUP USING CHUNKING OF DATA
Authors: Amal Thankachan | R Sujitha
Volume 01 Issue 04 Year 2014 ISSN No: 2349-3828 Page no: 10-14
Deduplication has become a widely deployed technology in cloud data centers to improve the efficiency of IT resources. However, traditional techniques struggle in big data deduplication to strike a sensible tradeoff between the conflicting goals of scalable deduplication throughput and a high duplicate elimination ratio. We propose AppDedupe, an application-aware, scalable, inline distributed deduplication framework for cloud environments, which meets this challenge by exploiting application awareness, data similarity, and locality to optimize distributed deduplication with inter-node two-tiered data routing and intra-node application-aware deduplication. It first dispatches application data at the file level with application-aware routing to preserve application locality, then assigns similar application data to the same storage node at super-chunk granularity using a handprinting-based stateful data routing scheme, which maintains high global deduplication efficiency while balancing the workload across nodes. AppDedupe builds application-aware similarity indices from super-chunk handprints to speed up the intra-node deduplication process. Our experimental evaluation of AppDedupe against state-of-the-art schemes, driven by real-world datasets, demonstrates that AppDedupe achieves the highest global deduplication efficiency, with higher global deduplication effectiveness than the high-overhead and poorly scalable traditional scheme, at an overhead only slightly higher than that of the scalable but low duplicate-elimination-ratio approaches.
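The super-chunk tier of the routing described above can be illustrated with a minimal Python sketch. It is not the AppDedupe implementation. It assumes fixed-size chunking, SHA-1 fingerprints, a handprint defined as the k smallest chunk fingerprints of a super-chunk, and a simple tie-break on stored chunk count as a stand-in for load balancing. The names `Node`, `handprint`, `route`, and `backup` are hypothetical, and the file-level application-aware tier is omitted.

```python
import hashlib
from typing import List, Set, Tuple


def chunk_fingerprints(data: bytes, chunk_size: int = 4096) -> List[str]:
    """Split a super-chunk into fixed-size chunks and fingerprint each one.

    Fixed-size chunking is an assumption made for brevity.
    """
    return [
        hashlib.sha1(data[i:i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def handprint(fingerprints: List[str], k: int = 4) -> Set[str]:
    """Representative fingerprints: the k smallest distinct fingerprints (assumed definition)."""
    return set(sorted(set(fingerprints))[:k])


class Node:
    """A storage node with a small similarity index and a full chunk index."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.similarity_index: Set[str] = set()  # handprint fingerprints seen so far
        self.chunk_index: Set[str] = set()       # fingerprints of all stored chunks

    def store(self, fingerprints: List[str], hp: Set[str]) -> int:
        """Deduplicate within the node; return how many chunks were actually new."""
        new = set(fingerprints) - self.chunk_index
        self.chunk_index |= new
        self.similarity_index |= hp
        return len(new)


def route(hp: Set[str], nodes: List[Node]) -> Node:
    """Stateful routing: prefer the node whose similarity index overlaps the
    handprint most; break ties in favor of the node holding fewer chunks."""
    return max(nodes, key=lambda n: (len(hp & n.similarity_index), -len(n.chunk_index)))


def backup(data: bytes, nodes: List[Node], chunk_size: int = 4096, k: int = 4) -> Tuple[str, int]:
    """Route one super-chunk and store it; return (node name, new chunk count)."""
    fps = chunk_fingerprints(data, chunk_size)
    hp = handprint(fps, k)
    target = route(hp, nodes)
    return target.name, target.store(fps, hp)
```

With two empty nodes, a super-chunk that shares most of its chunks with an earlier one is sent to the same node and adds only its new chunks, while unrelated data goes to the less loaded node. Only the handprint, rather than every chunk fingerprint, is compared during routing, which is what keeps the routing step cheap in this sketch.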
Big Data Deduplication, Application Awareness, Data Routing, Handprinting, Similarity Index