IJRCS – Volume 4 Issue 2 Paper 5


Authors: S Tamil Selvan | J Kokilavani

Volume 04 Issue 02  Year 2017  ISSN No:  2349-3828  Page no: 16-18



Big data refers to large-scale distributed data processing applications. Google’s MapReduce and Apache Hadoop, its open-source implementation, are frameworks that process very large amounts of data. MapReduce generates a large amount of intermediate data. This abundant information is discarded after the tasks finish, because MapReduce is unable to reuse it. To improve the efficiency of MapReduce, we propose a data-aware prefetcher framework for big-data applications. In this framework, tasks submit their intermediate results to the prefetcher, and a task queries the prefetcher before executing its actual computation. A novel prefetch description scheme and a prefetch request and reply protocol are designed. Experimental results show that the prefetcher significantly reduces the completion time of Hadoop MapReduce jobs.
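The submit-and-query interaction described above can be illustrated with a minimal in-memory sketch. It is not the authors' implementation: the `Prefetcher` class, the `describe`, `query`, `submit`, and `run_task` names, and the choice of an operation name plus a SHA-256 digest of the input as the description of a result are all assumptions made for illustration. The abstract does not specify the description scheme or the wire protocol.

```python
import hashlib
from typing import Callable, Dict, Tuple

Key = Tuple[str, str]


class Prefetcher:
    """Hypothetical in-memory store for intermediate task results."""

    def __init__(self) -> None:
        self._store: Dict[Key, object] = {}
        self.hits = 0
        self.misses = 0

    @staticmethod
    def describe(operation: str, input_data: bytes) -> Key:
        # Assumed description scheme: operation name plus a digest of the input.
        return (operation, hashlib.sha256(input_data).hexdigest())

    def query(self, key: Key) -> Tuple[bool, object]:
        # Request/reply: report whether a matching result is already stored.
        if key in self._store:
            self.hits += 1
            return True, self._store[key]
        self.misses += 1
        return False, None

    def submit(self, key: Key, result: object) -> None:
        # Tasks hand their intermediate result to the prefetcher when done.
        self._store[key] = result


def run_task(prefetcher: Prefetcher, operation: str, input_data: bytes,
             compute: Callable[[bytes], object]) -> object:
    """Query the prefetcher first; compute and submit only on a miss."""
    key = Prefetcher.describe(operation, input_data)
    found, cached = prefetcher.query(key)
    if found:
        return cached
    result = compute(input_data)
    prefetcher.submit(key, result)
    return result


def word_count(chunk: bytes) -> Dict[str, int]:
    # Stand-in for a map task's computation over one input split.
    counts: Dict[str, int] = {}
    for word in chunk.decode("utf-8").split():
        counts[word] = counts.get(word, 0) + 1
    return counts


if __name__ == "__main__":
    prefetcher = Prefetcher()
    split = b"big data big results"
    first = run_task(prefetcher, "word_count", split, word_count)
    second = run_task(prefetcher, "word_count", split, word_count)
    print(first == second, prefetcher.hits, prefetcher.misses)
```

In this sketch, the second call with the same operation and input is answered from the store without recomputation, which is the kind of saving the framework aims for. A real deployment would also need to handle storage limits and stale results.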


Big-data, MapReduce, Hadoop, Prefetcher, Intermediate results

