Get Started with Apache Spark
  • Course Overview
  • How to Take this Course and How to Get Support
  • Text Lecture: How to Take this Course and How to Get Support
  • Introduction to Spark
  • Slides
  • Java 9 Warning
  • Install Java and Git
  • Source Code
  • Set up Spark project with IntelliJ IDEA
  • Set up Spark project with Eclipse
  • Text Lecture: Set up Spark project with Eclipse
  • Run our first Spark job
  • Troubleshooting: Running Hadoop on Windows
RDD
  • RDD Basics
  • Create RDDs
  • Text Lecture: Create RDDs
  • Map and Filter Transformation
  • Solution to Airports by Latitude Problem
  • FlatMap Transformation
  • Text Lecture: FlatMap Transformation
  • Set Operation
  • Sampling With Replacement and Sampling Without Replacement
  • Solution for the Same Hosts Problem
  • Actions
  • Solution to Sum of Numbers Problem
  • Important Aspects about RDD
  • Summary of RDD Operations
  • Caching and Persistence
Spark Architecture and Components
  • Spark Architecture
  • Spark Components
Pair RDD
  • Introduction to Pair RDD
  • Create Pair RDDs
  • Filter and MapValues Transformations on Pair RDD
  • Reduce By Key Aggregation
  • Sample Solution for the Average House Problem
  • Group By Key Transformation
  • Sort By Key Transformation
  • Sample Solution for the Sorted Word Count Problem
  • Data Partitioning
  • Join Operations
  • Extra Learning Material: How are Big Companies using Apache Spark
Advanced Spark Topics
  • Accumulators
  • Text Lecture: Accumulators
  • Solution to StackOverflow Survey Follow-up Problem
  • Broadcast Variables
Spark SQL
  • Introduction to Spark SQL
  • Spark SQL in Action
  • Spark SQL Practice: House Price Problem
  • Spark SQL Joins
  • Strongly Typed Dataset
  • Use Dataset or RDD
  • Dataset and RDD Conversion
  • Performance Tuning of Spark SQL
  • Extra Learning Material: Avoid These Mistakes While Writing Apache Spark Programs
Running Spark in a Cluster
  • Introduction to Running Spark in a Cluster
  • Package Spark Application and Use spark-submit
  • Run Spark Application on Amazon EMR (Elastic MapReduce) Cluster
Additional Learning Materials
  • Future Learning
  • Text Lecture: Future Learning
  • Coupons to Our Other Courses