Getting Started
  • Udemy 101: Getting the Most From This Course
  • Alternate Download Link for the ml-100k Dataset
  • Introduction, and installing the course materials, IntelliJ, and Scala
  • Introduction to Apache Spark
  • Spark Basics
  • What's New in Spark 3?
Scala Crash Course [Optional]
  • [Activity] Scala Basics
  • [Exercise] Flow Control in Scala
  • [Exercise] Functions in Scala
  • [Exercise] Data Structures in Scala
Using Resilient Distributed Datasets (RDDs)
  • The Resilient Distributed Dataset
  • Ratings Histogram Example
  • Spark Internals
  • Key/Value RDDs, and the Average Friends by Age Example
  • [Activity] Running the Average Friends by Age Example
  • Filtering RDDs, and the Minimum Temperature by Location Example
  • [Activity] Running the Minimum Temperature Example, and Modifying It for Maximum
  • [Activity] Counting Word Occurrences Using flatMap()
  • [Activity] Improving the Word Count Script with Regular Expressions
  • [Activity] Sorting the Word Count Results
  • [Exercise] Find the Total Amount Spent by Customer
  • [Exercise] Check Your Results, and Sort Them by Total Amount Spent
  • Check Your Results and Implementation Against Mine
Spark SQL, DataFrames, and Datasets
  • Introduction to Spark SQL
  • [Activity] Using Spark SQL
  • [Activity] Using Datasets
  • [Exercise] Implement the "Friends by Age" Example Using Datasets
  • Exercise Solution: Friends by Age, with Datasets
  • [Activity] Word Count example, using Datasets
  • [Activity] Revisiting the Minimum Temperature example, with Datasets
  • [Exercise] Implement the "Total Spent by Customer" problem with Datasets
  • Exercise Solution: Total Spent by Customer with Datasets
Advanced Examples of Spark Programs
  • [Activity] Find the Most Popular Movie
  • [Activity] Use Broadcast Variables to Display Movie Names
  • [Activity] Find the Most Popular Superhero in a Social Graph
  • [Exercise] Find the Most Obscure Superheroes
  • Exercise Solution: Find the Most Obscure Superheroes
  • Superhero Degrees of Separation: Introducing Breadth-First Search
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
  • [Activity] Superhero Degrees of Separation: Review the code, and run it!
  • Item-Based Collaborative Filtering in Spark, cache(), and persist()
  • [Activity] Running the Similar Movies Script using Spark's Cluster Manager
  • [Exercise] Improve the Quality of Similar Movies
Running Spark on a Cluster
  • [Activity] Using spark-submit to run Spark driver scripts
  • [Activity] Packaging driver scripts with SBT
  • [Exercise] Package a Script with SBT and Run It Locally with spark-submit
  • Exercise Solution: Using SBT and spark-submit
  • Introducing Amazon Elastic MapReduce
  • Creating Similar Movies from One Million Ratings on EMR
  • Partitioning
  • Best Practices for Running on a Cluster
  • Troubleshooting, and Managing Dependencies
Machine Learning with Spark ML
  • Introducing MLlib
  • [Activity] Using MLlib to Produce Movie Recommendations
  • Linear Regression with MLlib
  • [Activity] Running a Linear Regression with Spark
  • [Exercise] Predict Real Estate Values with Decision Trees in Spark
  • Exercise Solution: Predicting Real Estate with Decision Trees in Spark
Intro to Spark Streaming
  • The DStream API for Spark Streaming
  • [Activity] Real-time Monitoring of the Most Popular Hashtags on Twitter
  • Structured Streaming
  • [Activity] Using Structured Streaming for real-time log analysis
  • [Exercise] Windowed Operations with Structured Streaming
  • Exercise Solution: Top URLs in a 30-Second Window
Intro to GraphX
  • GraphX, Pregel, and Breadth-First Search with Pregel
  • Using the Pregel API with Spark GraphX
  • [Activity] Superhero Degrees of Separation using GraphX
You Made It! Where to Go from Here.
  • Learning More, and Career Tips
  • Bonus Lecture: More courses to explore!