Course Outline
Upon completing this course, learners will be able to meet the following overall objectives:

  • An overview of approaches that facilitate data analytics on huge datasets. Several strategies are presented, including sampling to make classical analytics tools usable on big datasets, analytics tools that can be applied in the batch layer or the speed layer of a lambda architecture, stream analytics, and commercial attempts to make big data manageable in massively distributed or in-memory databases. Learners will be able to realistically assess how well big data analytics technologies suit different usage scenarios and begin their own experiments.
  • Introduction to Hive
  • Hive Metastore
  • Hive Architecture
  • Tables in Hive
  • Hive Data Types
  • Joining Datasets in Apache Hive
  • Understanding Partitioning and Skew
  • Analyzing Big Data with Apache Hive
  • Computing N-grams
  • Hive UDFs and UDAFs with Example Programs