Module 1: Introduction to Big Data, Hadoop and Spark
  • 1.1 Overview of Big Data
  • 1.2 Introduction to Apache Hadoop
  • 1.3 Hadoop Distributed File System
  • 1.4 Hadoop MapReduce
  • 1.5 Introduction to Apache Spark
  • 1.6 Characteristics of Apache Spark
  • 1.7 Users and Use Cases of Apache Spark
  • 1.8 Job Execution Flow and Spark Execution
  • 1.9 Spark Unified Stack
  • 1.10 Complete Picture of Apache Spark
  • 1.11 Why Spark with Scala
  • 1.12 Apache Spark Architecture
Module 2: Introduction to Scala Programming Language
  • 2.1 Introduction to Scala
  • 2.2 Scala Basic Syntax
  • 2.3 Scala Class and Objects
  • 2.4 If-Else Statements in Scala
  • 2.5 Loops in Scala
Module 3: Advanced Scala Programming
  • 3.1 Functions and Procedures in Scala
  • 3.2 Access Modifiers
  • 3.3 Strings and Arrays
  • 3.4 Scala Collections
  • 3.5 Scala Traits
  • 3.6 Pattern Matching
  • 3.7 Scala Extractors
  • 3.8 Scala Exception Handling
  • 3.9 Scala File I/O
Module 4: Apache Spark RDDs
  • 4.1 Programming with RDDs
  • 4.2 Starting with Spark
  • 4.3 Creating RDDs
  • 4.4 RDD Operations
  • 4.5 Lifecycle of Spark
Module 5: Apache Spark RDDs II
  • 5.1 Spark Caching
  • 5.2 Common Transformations and Actions
  • 5.3 Spark Functions
  • 5.4 More Spark Functions
Module 6: Working with Key-Value Pairs
  • 6.1 Key-Value Pairs
  • 6.2 Aggregate Functions
  • 6.3 Working with Aggregate Functions
  • 6.4 Joins in Spark
  • 6.5 Practical: Word Count Example
Module 7: Advanced Spark Programming
  • 7.1 Spark Shared Variables
  • 7.2 Spark and Fault Tolerance
  • 7.3 Broadcast Variables
  • 7.4 Numeric RDD Operations
  • 7.5 Per-Partition Operations
Module 8: Running Spark Jobs on a Cluster
  • 8.1 Spark Runtime Architecture
  • 8.2 Spark Driver
  • 8.3 Executors
  • 8.4 Cluster Managers
  • 8.5 Cluster Managers II
Module 9: Spark SQL
  • 9.1 Introduction to Spark SQL
  • 9.2 Starting Point: SQLContext
  • 9.3 Hive with Spark SQL
  • 9.4 Spark SQL Caching
Module 10: Spark Streaming
  • People.json, Employee.json
Module 11: Machine Learning in Spark
  • 11.1 Machine Learning with MLlib
  • 11.2 MLlib Data Types
  • 11.3 Labeled Point Data Types
  • 11.4 Local Matrices in MLlib
  • 11.5 MLlib Algorithms
  • 11.6 Classification and Regression
  • 11.7 Clustering
Module 12: GraphX in Spark
  • 12.1 GraphX Introduction
  • 12.2 Creating Graphs
  • 12.3 Graph Operators
  • 12.4 Subgraph Transformation
  • 12.5 Computation with mapReduceTriplets