Getting Started with Spark
  • Introduction
  • How to Use This Course
  • Udemy 101: Getting the Most From This Course
  • [Activity] Getting Set Up: Installing Python, a JDK, Spark, and its Dependencies.
  • Alternate MovieLens download location
  • [Activity] Installing the MovieLens Movie Rating Dataset
  • [Activity] Run your first Spark program! Ratings histogram example.
Spark Basics and the RDD Interface
  • What's new in Spark 3?
  • Introduction to Spark
  • The Resilient Distributed Dataset (RDD)
  • Ratings Histogram Walkthrough
  • Key/Value RDDs, and the Average Friends by Age Example
  • [Activity] Running the Average Friends by Age Example
  • Filtering RDDs, and the Minimum Temperature by Location Example
  • [Activity] Running the Minimum Temperature Example, and Modifying it for Maximums
  • [Activity] Running the Maximum Temperature by Location Example
  • [Activity] Counting Word Occurrences using flatMap()
  • [Activity] Improving the Word Count Script with Regular Expressions
  • [Activity] Sorting the Word Count Results
  • [Exercise] Find the Total Amount Spent by Customer
  • [Exercise] Check your Results, and Now Sort them by Total Amount Spent.
  • Check Your Sorted Implementation and Results Against Mine.
SparkSQL, DataFrames, and Datasets
  • Introducing SparkSQL
  • [Activity] Executing SQL commands and SQL-style functions on a DataFrame
  • Using DataFrames instead of RDDs
  • [Exercise] Friends by Age, with DataFrames
  • Exercise Solution: Friends by Age, with DataFrames
  • [Activity] Word Count, with DataFrames
  • [Activity] Minimum Temperature, with DataFrames (using a custom schema)
  • [Exercise] Implement Total Spent by Customer with DataFrames
  • Exercise Solution: Total Spent by Customer, with DataFrames
Advanced Examples of Spark Programs
  • [Activity] Find the Most Popular Movie
  • [Activity] Use Broadcast Variables to Display Movie Names Instead of ID Numbers
  • Find the Most Popular Superhero in a Social Graph
  • [Activity] Run the Script - Discover Who the Most Popular Superhero is!
  • [Exercise] Find the Most Obscure Superheroes
  • Exercise Solution: Most Obscure Superheroes
  • Superhero Degrees of Separation: Introducing Breadth-First Search
  • Superhero Degrees of Separation: Accumulators, and Implementing BFS in Spark
  • [Activity] Superhero Degrees of Separation: Review the Code and Run it
  • Item-Based Collaborative Filtering in Spark, cache(), and persist()
  • [Activity] Running the Similar Movies Script using Spark's Cluster Manager
  • [Exercise] Improve the Quality of Similar Movies
Running Spark on a Cluster
  • Introducing Elastic MapReduce
  • [Activity] Setting up your AWS / Elastic MapReduce Account and Setting Up PuTTY
  • Partitioning
  • Create Similar Movies from One Million Ratings - Part 1
  • [Activity] Create Similar Movies from One Million Ratings - Part 2
  • Create Similar Movies from One Million Ratings - Part 3
  • Troubleshooting Spark on a Cluster
  • More Troubleshooting, and Managing Dependencies
Machine Learning with Spark ML
  • Introducing MLlib
  • [Activity] Using Spark ML to Produce Movie Recommendations
  • Analyzing the ALS Recommendations Results
  • [Activity] Linear Regression with Spark ML
  • [Exercise] Using Decision Trees in Spark ML to Predict Real Estate Prices
  • Exercise Solution: Decision Trees with Spark
Spark Streaming, Structured Streaming, and GraphX
  • Spark Streaming
  • [Activity] Structured Streaming in Python
  • [Exercise] Use Windows with Structured Streaming to Track Most-Viewed URLs
  • Exercise Solution: Using Structured Streaming with Windows
  • GraphX
You Made It! Where to Go from Here.
  • Learning More about Spark and Data Science
  • Bonus Lecture: More courses to explore!