IJRCS – Volume 5 Issue 3 Paper 1


Author’s Name : S Tamil Selvan | Dr P Balamurugan

Volume 05, Issue 02, Year 2018. ISSN No: 2349-3828. Page no: 1-6



Big Data is a term that describes the techniques and technologies used to capture, store, distribute, manage, and analyze large datasets that arrive at high velocity and in varied structures. Big data can be structured, unstructured, or semi-structured, which makes conventional data management methods inadequate. Data is generated by many different sources and can arrive in the system at varying rates. Parallelism is used to process these large volumes of data inexpensively and efficiently. Big Data is data whose scale, diversity, and complexity require new architectures, techniques, algorithms, and analytics to manage it and to extract value and hidden knowledge from it. Hadoop is an open-source software project that enables the distributed processing of large datasets across clusters of commodity servers. Hadoop is the core platform for structuring Big Data, and it solves the problem of making that data useful for analytics. It is designed to scale from a single server to thousands of machines, with a very high degree of fault tolerance.


Big Data, Hadoop, MapReduce, HDFS, Hadoop Components

