Course Outline
Upon completing this course, learners will be able to meet these overall objectives:

  • Gain an overview of approaches that facilitate data analytics on very large datasets. Strategies covered include sampling to make classical analytics tools amenable to big datasets; analytics tools that can be applied in the batch or speed layer of a lambda architecture; stream analytics; and commercial attempts to make big data manageable in massively distributed or in-memory databases. Learners will be able to realistically assess big data analytics technologies for different usage scenarios and start their own experiments.
  • Introduction to HDFS
  • The Role of HDFS in Hadoop
  • Hadoop Daemons and Their Functionality
  • Anatomy of a File Read & Write
  • Parallel Copying using DistCp
  • Data Organization
  • Storing Data in HDFS
  • Reading Data from HDFS
  • Rack Awareness
  • Blocks and Replication
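
The blocks-and-replication topic above boils down to simple arithmetic. The following sketch (not part of the course materials) assumes the common HDFS defaults of a 128 MB block size and a replication factor of 3, which correspond to the `dfs.blocksize` and `dfs.replication` settings:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # assumed default HDFS block size: 128 MB
REPLICATION = 3                  # assumed default replication factor

def block_layout(file_size_bytes):
    """Return (number of HDFS blocks, total bytes stored across all replicas)."""
    blocks = max(1, math.ceil(file_size_bytes / BLOCK_SIZE))
    return blocks, file_size_bytes * REPLICATION

# A 300 MB file occupies 3 blocks (128 + 128 + 44 MB); with 3 replicas,
# the cluster stores 900 MB in total for it.
blocks, stored = block_layout(300 * 1024 * 1024)
print(blocks, stored // (1024 * 1024))
```

Note that the last block of a file may be smaller than the configured block size; HDFS only consumes the actual bytes written, not a full block, for that final block.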