Learn all the buzzwords! And install the Hortonworks Data Platform Sandbox.
  • Udemy 101: Getting the Most From This Course
  • Tips for Using This Course
  • If you have trouble downloading Hortonworks Data Platform...
  • Warning for Apple M1 users
  • Installing Hadoop [Step by Step]
  • The Hortonworks and Cloudera merger, and how it affects this course
  • Hadoop Overview and History
  • Overview of the Hadoop Ecosystem
Using Hadoop's Core: HDFS and MapReduce
  • HDFS: What it is, and how it works
  • Alternate MovieLens download location
  • Installing the MovieLens Dataset
  • [Activity] Install the MovieLens dataset into HDFS using the command line
  • MapReduce: What it is, and how it works
  • How MapReduce distributes processing
  • MapReduce example: Break down movie ratings by rating score
  • Troubleshooting tips: installing pip and mrjob
  • [Activity] Installing Python, MRJob, and nano
  • [Activity] Code up the ratings histogram MapReduce job and run it (see the sketch after this section)
  • [Exercise] Rank movies by their popularity
  • [Activity] Check your results against mine!
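As a preview of the ratings-histogram activity above, here is a minimal MRJob sketch. It assumes the MovieLens u.data file (tab-separated userID, movieID, rating, timestamp); the class name and field handling are illustrative rather than the course's exact script.

```python
from mrjob.job import MRJob

class RatingsBreakdown(MRJob):
    def mapper(self, _, line):
        # u.data fields: userID, movieID, rating, timestamp (tab-separated)
        (userID, movieID, rating, timestamp) = line.split('\t')
        yield rating, 1

    def reducer(self, rating, counts):
        # Sum up how many times each rating value was seen
        yield rating, sum(counts)

if __name__ == '__main__':
    RatingsBreakdown.run()
```

Run it locally with "python RatingsBreakdown.py u.data", or add "-r hadoop" to run it on the cluster.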
Programming Hadoop with Pig
  • Introducing Ambari
  • Introducing Pig
  • Example: Find the oldest movie with a 5-star rating using Pig
  • [Activity] Find old 5-star movies with Pig
  • More Pig Latin
  • [Exercise] Find the most-rated one-star movie
  • Pig Challenge: Compare Your Results to Mine!
Programming Hadoop with Spark
  • Why Spark?
  • The Resilient Distributed Dataset (RDD)
  • [Activity] Find the movie with the lowest average rating - with RDDs
  • Datasets and Spark 2.0
  • [Activity] Find the movie with the lowest average rating - with DataFrames (see the sketch after this section)
  • [Activity] Movie recommendations with MLlib
  • [Exercise] Filter the lowest-rated movies by number of ratings
  • [Activity] Check your results against mine!
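As a preview of the lowest-average-rating activity, here is a minimal PySpark DataFrame sketch. The HDFS path, column names, and app name are assumptions based on the MovieLens u.data layout, not the course's exact code.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("WorstMovie").getOrCreate()

# u.data layout: userID, movieID, rating, timestamp (tab-separated);
# the HDFS path below is an assumption about where the dataset was loaded
lines = spark.sparkContext.textFile("hdfs:///user/maria_dev/ml-100k/u.data")
rows = lines.map(lambda l: l.split()).map(lambda f: (int(f[1]), float(f[2])))
ratings = spark.createDataFrame(rows, ["movieID", "rating"])

worst = (ratings.groupBy("movieID")
                .agg(F.avg("rating").alias("avgRating"))
                .orderBy("avgRating")
                .first())
print(worst)
spark.stop()
```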
Using relational data stores with Hadoop
  • What is Hive?
  • [Activity] Use Hive to find the most popular movie (see the sketch after this section)
  • How Hive works
  • [Exercise] Use Hive to find the movie with the highest average rating
  • Compare your solution to mine
  • Integrating MySQL with Hadoop
  • [Activity] Install MySQL and import our movie data
  • [Activity] Use Sqoop to import data from MySQL to HDFS/Hive
  • [Activity] Use Sqoop to export data from Hadoop to MySQL
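As a preview of the Hive activity above, here is a sketch of the kind of HiveQL query involved (most popular = most rated), submitted from Python via the PyHive client. PyHive, the host and port, and the table and column names are all assumptions; they are not part of the course setup.

```python
from pyhive import hive  # assumption: PyHive is installed (pip install pyhive)

conn = hive.Connection(host='127.0.0.1', port=10000, username='maria_dev')
cur = conn.cursor()

# "Most popular" = the movie with the most ratings; table and column
# names here are illustrative placeholders
cur.execute("""
    SELECT movie_id, COUNT(*) AS rating_count
    FROM ratings
    GROUP BY movie_id
    ORDER BY rating_count DESC
    LIMIT 1
""")
print(cur.fetchall())
```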
Using non-relational data stores with Hadoop
  • Why NoSQL?
  • What is HBase?
  • [Activity] Import movie ratings into HBase
  • [Activity] Use HBase with Pig to import data at scale
  • Cassandra overview
  • If you have trouble installing Cassandra...
  • [Activity] Installing Cassandra
  • [Activity] Write Spark output into Cassandra
  • MongoDB overview
  • [Activity] Install MongoDB, and integrate Spark with MongoDB
  • [Activity] Using the MongoDB shell (see the sketch after this section)
  • Choosing a database technology
  • [Exercise] Choose a database for a given problem
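As a preview of the MongoDB shell activity, here is a minimal pymongo sketch that mirrors insertOne and findOne from the shell. The package, connection string, and document fields are assumptions rather than the course's exact steps.

```python
from pymongo import MongoClient  # assumption: pymongo is installed

# Connection string is an assumption about the sandbox setup
client = MongoClient('mongodb://127.0.0.1:27017')
db = client['movielens']

# Equivalent to db.users.insertOne(...) and db.users.findOne(...) in the shell
db.users.insert_one({'user_id': 0, 'age': 33, 'occupation': 'engineer'})
print(db.users.find_one({'user_id': 0}))
```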
Querying your Data Interactively
  • Overview of Drill
  • [Activity] Setting up Drill
  • [Activity] Querying across multiple databases with Drill
  • Overview of Phoenix
  • [Activity] Install Phoenix and query HBase with it
  • [Activity] Integrate Phoenix with Pig
  • Overview of Presto
  • [Activity] Install Presto and query Hive with it (see the sketch after this section)
  • [Activity] Query both Cassandra and Hive using Presto
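As a preview of querying Hive through Presto, here is a sketch using the presto-python-client package. The package, host, port, user, and table name are all assumptions and not part of the course setup.

```python
import prestodb  # assumption: the presto-python-client package is installed

conn = prestodb.dbapi.connect(
    host='127.0.0.1',
    port=8080,          # Presto's default HTTP port; your setup may differ
    user='maria_dev',
    catalog='hive',
    schema='default',
)
cur = conn.cursor()
cur.execute('SELECT COUNT(*) FROM ratings')  # table name is illustrative
print(cur.fetchall())
```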
Managing your Cluster
  • YARN explained
  • Tez explained
  • [Activity] Use Hive on Tez and measure the performance benefit
  • Mesos explained
  • ZooKeeper explained
  • [Activity] Simulating a failing master with ZooKeeper (see the sketch after this section)
  • Oozie explained
  • [Activity] Set up a simple Oozie workflow
  • Zeppelin overview
  • [Activity] Use Zeppelin to analyze movie ratings, part 1
  • [Activity] Use Zeppelin to analyze movie ratings, part 2
  • Hue overview
  • Other technologies worth mentioning
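As a preview of the failing-master simulation with ZooKeeper, here is a minimal sketch using the kazoo Python client: each master candidate registers an ephemeral, sequential znode, and that znode disappears automatically if its owner dies. kazoo and the ensemble address are assumptions, not the course's tooling.

```python
from kazoo.client import KazooClient  # assumption: the kazoo package is installed

zk = KazooClient(hosts='127.0.0.1:2181')  # ensemble address is an assumption
zk.start()

zk.ensure_path('/masters')
# An ephemeral, sequential znode: it vanishes automatically if this client
# dies, which is how the other nodes notice a failed master
zk.create('/masters/candidate-', value=b'node-1', ephemeral=True, sequence=True)

# By convention, the candidate with the lowest sequence number is the master
print('Current master:', sorted(zk.get_children('/masters'))[0])

zk.stop()
```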
Feeding Data to your Cluster
  • Kafka explained
  • [Activity] Setting up Kafka and publishing some data (see the sketch after this section)
  • [Activity] Publishing web logs with Kafka
  • Flume explained
  • [Activity] Set up Flume and publish logs with it
  • [Activity] Set up Flume to monitor a directory and store its data in HDFS
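As a preview of publishing data to Kafka, here is a minimal kafka-python sketch with one producer and one consumer on the same topic. The kafka-python package, broker address, and topic name are assumptions and may differ from the sandbox setup.

```python
from kafka import KafkaProducer, KafkaConsumer  # assumption: kafka-python is installed

BROKER = 'localhost:9092'   # broker address/port is an assumption
TOPIC = 'fred'              # any topic name works

producer = KafkaProducer(bootstrap_servers=BROKER)
producer.send(TOPIC, b'a line of log data')
producer.flush()

consumer = KafkaConsumer(TOPIC,
                         bootstrap_servers=BROKER,
                         auto_offset_reset='earliest',
                         consumer_timeout_ms=5000)
for message in consumer:
    print(message.value.decode('utf-8'))
```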
Analyzing Streams of Data
  • Spark Streaming: Introduction
  • [Activity] Analyze web logs published with Flume using Spark Streaming
  • [Exercise] Monitor Flume-published logs for errors in real time
  • Exercise solution: Aggregating HTTP access codes with Spark Streaming
  • Apache Storm: Introduction