Introduction
  • Why Spark
  • Spark High Level Components
  • Creating a Spark Maven Project
  • Dedicated TA Support
  • Import Source Code into Eclipse
  • First Spark Application
  • Spark Standalone Cluster Architecture
Spark Java Dataset API Basics
  • Ingesting CSV and JSON Files
  • How to Reduce Logging in the Console
  • Real-World Dataframes Example
  • Union Dataframes and Other Set Transformations
  • Converting Between Datasets and Dataframes
Diving Deeper with Datasets, Dataframes, Transformations and the DAG
  • Map and Reduce Transformation Functions
  • Using Datasets with User Defined POJOs
  • Using Datasets with Unstructured Textual Data
  • Joining Dataframes and Using Various Filter Transformations
  • Aggregation Transformations + Join Assignment
  • More on Transformations, Actions and the DAG
Running Spark Jobs on the Cloud
  • Using Spark to Analyze Reddit Comments
  • Running the Reddit Spark Application on an EMR Cluster
  • Instructions for Configuring a Spark Standalone Cluster
Spark Streaming Applications
  • Streaming Network Socket Example
  • Stock Market Files Streaming Example
  • Using Kafka with Spark Streaming
Machine Learning with Spark MLlib
  • Machine Learning Resources
  • Overview of Linear Regression
  • Spark Java Linear Regression Example
  • Overview of Logistic Regression
  • Spark Java Logistic Regression (Classification Algorithm)
  • Overview of K-Means Clustering
  • Spark Java K-Means Clustering Example