You, This Course and Us
  • You, This Course and Us
  • Course Materials
  • Installing Scala and Hello World
Introduction to Spark
  • What does Donald Rumsfeld have to do with data analysis?
  • Why is Spark so cool?
  • An introduction to RDDs - Resilient Distributed Datasets
  • Built-in libraries for Spark
  • Installing Spark
  • The Spark Shell
  • See it in Action: Munging Airlines Data with Spark
  • Transformations and Actions
Resilient Distributed Datasets
  • RDD Characteristics: Partitions and Immutability
  • RDD Characteristics: Lineage, RDDs know where they came from
  • What can you do with RDDs?
  • Create your first RDD from a file
  • Average distance travelled by a flight using map() and reduce() operations
  • Get delayed flights using filter(), cache data using persist()
  • Average flight delay in one step using aggregate()
  • Frequency histogram of delays using countByValue()
Advanced RDDs: Pair Resilient Distributed Datasets
  • Special Transformations and Actions
  • Average delay per airport using reduceByKey(), mapValues() and join()
  • Average delay per airport in one step using combineByKey()
  • Get the top airports by delay using sortBy()
  • Lookup airport descriptions using lookup(), collectAsMap(), broadcast()
Advanced Spark: Accumulators, Spark Submit, MapReduce, Behind the Scenes
  • Get information from individual processing nodes using accumulators
  • Long running programs using spark-submit
  • Spark-Submit with Scala - A demo
  • Behind the scenes: What happens when a Spark script runs?
  • Running MapReduce operations
PageRank: Ranking Search Results
  • What is PageRank?
  • The PageRank algorithm
  • Implement PageRank in Spark
  • Join optimization in PageRank using Custom Partitioning
Spark SQL
  • DataFrames: RDDs + Tables
MLlib in Spark: Build a recommendations engine
  • Collaborative filtering algorithms
  • Latent Factor Analysis with the Alternating Least Squares method
  • Music recommendations using the Audioscrobbler dataset
  • Implement code in Spark using MLlib
Spark Streaming
  • Introduction to streaming
  • Implement stream processing in Spark using DStreams
  • Stateful transformations using sliding windows
Graph Libraries
  • The Marvel social network using Graphs
Scala Language Primer
  • Scala - A "better Java"?
  • How do Classes work in Scala?
  • Classes in Scala - continued
  • Functions are different from Methods
  • Collections in Scala
  • Map, FlatMap - The Functional way of looping
  • First Class Functions revisited
  • Partially Applied Functions
  • Closures
  • Currying
Supplementary Installs
  • Installing IntelliJ
  • Installing Anaconda
  • [For Linux/Mac OS Shell Newbies] Path and other Environment Variables