Introduction
  • Introduction
  • Using itversity platforms - Big Data Developer labs and forum
Scala Fundamentals
  • Setting up Scala
  • Basic Programming Constructs
  • Set up Scala on Windows
  • Functions
  • Object Oriented Concepts - Class
  • Object Oriented Concepts - Object
  • Object Oriented Concepts - Case Classes
  • Collections - Seq, Set and Map
  • Basic Map Reduce Operations
  • Setting up Data Sets for Basic I/O Operations
  • Basic I/O Operations and using Scala Collections APIs
  • Tuples
  • Development Cycle - Developing Source code
  • Development Cycle - Compile source code to jar using SBT
  • Development Cycle - Set up SBT on Windows
  • Development Cycle - Compile changes and run jar with arguments
  • Development Cycle - Set up IntelliJ with Scala
  • Development Cycle - Develop Scala application using SBT in IntelliJ
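A minimal sketch tying together several of the constructs above (a case class, Seq and Map collections, map/reduce-style operations, and tuples). The Order class and its fields are illustrative placeholders, not taken from the course materials.

```scala
// Illustrative only: case classes, Scala collections, and
// map/reduce-style operations from the standard library.
case class Order(id: Int, status: String, amount: Double)

object Basics {
  def main(args: Array[String]): Unit = {
    val orders = Seq(
      Order(1, "COMPLETE", 100.0),
      Order(2, "PENDING", 250.0),
      Order(3, "COMPLETE", 75.0)
    )

    // filter and map, then reduce down to a single value
    val completedRevenue = orders
      .filter(_.status == "COMPLETE")
      .map(_.amount)
      .reduce(_ + _)

    // A Map keyed by status, built with groupBy; entries are tuples
    val revenueByStatus: Map[String, Double] =
      orders.groupBy(_.status).map { case (status, os) =>
        (status, os.map(_.amount).sum)
      }

    println(s"Completed revenue: $completedRevenue")
    revenueByStatus.foreach { case (status, total) =>
      println(s"$status -> $total")
    }
  }
}
```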
Spark Getting Started
  • Introduction
  • Setup Options
  • Setup using a tarball
  • Setup using Hortonworks Sandbox
  • Using labs.itversity.com
  • Using Windows - PuTTY and WinSCP
  • Using Windows - Cygwin
  • HDFS - Quick Preview
  • YARN - Quick Preview
  • Set up Data Sets
  • Curriculum
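Once one of the setup options above is in place, data can be previewed straight from spark-shell. The HDFS path below is a placeholder; substitute whatever location your environment uses.

```scala
// In spark-shell, sc (the SparkContext) is already defined.
val orders = sc.textFile("/public/retail_db/orders") // placeholder path

// Preview a handful of records without scanning the whole data set
orders.take(10).foreach(println)

// Trigger a job across the cluster to count the records
println(orders.count())
```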
Core Spark - Transformations and Actions with Advanced Features
  • Introduction
  • Set up Spark on Windows
  • Problem Statement and Environment
  • Initialize the Job
  • Resilient Distributed Datasets (RDDs)
  • Previewing the Data
  • Filtering the Data
  • Accumulators
  • Converting to Key Value Pairs - using map
  • Joining Data Sets
  • Get Daily Revenue by Product - reduceByKey
  • Get Daily Revenue and Count by Product - aggregateByKey
  • Execution Life Cycle
  • Broadcast Variables
  • Sorting the Data - by Date in Ascending Order and Revenue in Descending Order
  • Saving data back to HDFS
  • Add Spark dependencies to SBT
  • Develop as a Scala-based application
  • Run locally using spark-submit
  • Ship and run it on the big data cluster
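The flow above (filter, convert to key-value pairs with map, join, reduceByKey or aggregateByKey, sort, save) can be sketched end to end. This assumes a comma-delimited retail_db layout with the field positions noted in the comments; both the paths and the positions are assumptions to verify against your copy of the data.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumed layout (verify against the actual files):
//   orders:      order_id, order_date, customer_id, status
//   order_items: item_id, order_id, product_id, quantity, subtotal, price
object DailyRevenue {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Daily Revenue"))

    // Keep only completed orders, keyed by order_id
    val ordersByKey = sc.textFile("/public/retail_db/orders")
      .filter(_.split(",")(3) == "COMPLETE")
      .map { line =>
        val f = line.split(",")
        (f(0).toInt, f(1)) // (order_id, order_date)
      }

    // Order items keyed by order_id, carrying the subtotal
    val itemsByKey = sc.textFile("/public/retail_db/order_items")
      .map { line =>
        val f = line.split(",")
        (f(1).toInt, f(4).toFloat) // (order_id, subtotal)
      }

    // Join on order_id, re-key by date, and total revenue per day
    val dailyRevenue = ordersByKey.join(itemsByKey)
      .map { case (_, (date, subtotal)) => (date, subtotal) }
      .reduceByKey(_ + _)

    // aggregateByKey variant: revenue and item count per day in one pass
    val dailyRevenueAndCount = ordersByKey.join(itemsByKey)
      .map { case (_, (date, subtotal)) => (date, subtotal) }
      .aggregateByKey((0.0f, 0))(
        (acc, subtotal) => (acc._1 + subtotal, acc._2 + 1),
        (a, b) => (a._1 + b._1, a._2 + b._2)
      )

    dailyRevenueAndCount.take(10).foreach(println)
    dailyRevenue.sortByKey().saveAsTextFile("/user/training/daily_revenue") // placeholder output path
    sc.stop()
  }
}
```

After `sbt package`, the same jar can be run locally with `spark-submit --class DailyRevenue --master local <jar>` and then shipped unchanged to the cluster.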
Spark SQL using Scala
  • Introduction to Spark SQL and Objectives
  • Different interfaces to run SQL - Hive, Spark SQL
  • Create database and tables of text file format - orders and order_items
  • Create database and tables of ORC file format - orders and order_items
  • Running queries using Scala - spark-shell
  • Functions - Getting Started
  • Functions - String Manipulation
  • Functions - Date Manipulation
  • Functions - Aggregations in brief
  • Functions - case and nvl
  • Row level transformations
  • Joining data from multiple tables
  • Group by and aggregations
  • Sorting the data
  • Set operations - union and union all
  • Analytics functions - aggregations
  • Analytics functions - ranking
  • Windowing functions
  • Creating Data Frames and registering them as temp tables
  • Write Spark Application - Processing Data using Spark SQL
  • Write Spark Application - Saving Data Frame to Hive tables
  • Data Frame Operations
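For example, the group-by, aggregation, and date-manipulation pieces above can be tried interactively. The sketch below assumes a Spark 1.6-style spark-shell (where sqlContext is pre-defined and registerTempTable is the API the titles above refer to) and reuses the retail_db field-position assumptions from the earlier sketch.

```scala
import sqlContext.implicits._

case class Order(order_id: Int, order_date: String, order_status: String)

// Build a Data Frame from the raw text file (placeholder path)
val ordersDF = sc.textFile("/public/retail_db/orders")
  .map { line =>
    val f = line.split(",")
    Order(f(0).toInt, f(1), f(3))
  }.toDF()

// Register as a temp table so it can be queried with SQL
ordersDF.registerTempTable("orders")

// Group by and aggregation with a little date manipulation
sqlContext.sql("""
  SELECT substr(order_date, 1, 7) AS order_month,
         order_status,
         count(1) AS order_count
  FROM orders
  GROUP BY substr(order_date, 1, 7), order_status
  ORDER BY order_month, order_count DESC
""").show()
```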
Exercises or Problem Statements with Solutions
  • Introduction to exercises
  • General Guidelines about Exercises or Problem Statements
  • General Guidelines - Initializing the Job
  • Getting crime count per type per month - Understanding Data
  • Getting crime count per type per month - Implementing the logic - Core API
  • Getting crime count per type per month - Implementing the logic - Data Frames
  • Getting crime count per type per month - Validating Output
  • Get inactive customers - using Core Spark API (leftOuterJoin)
  • Get inactive customers - using Data Frames and SQL
  • Get top 3 crimes in RESIDENCE - using Core Spark API
  • Get top 3 crimes in RESIDENCE - using Data Frame and SQL
  • Convert NYSE data from text file format to Parquet file format
  • Get word count - with custom control arguments, number of keys, and file format
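As a taste of the first exercise, here is one way to get the crime count per type per month with the Core API. The field positions (date in column 2, crime type in column 5), the date format, and the paths are all assumptions; real data with quoted, comma-containing fields would need a proper CSV parser rather than a naive split.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object CrimeCountPerTypePerMonth {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("Crime Count"))

    val crimes = sc.textFile("/public/crime/csv") // placeholder path
    val header = crimes.first()

    val counts = crimes
      .filter(_ != header) // drop the header record
      .map { line =>
        val f = line.split(",") // naive split; assumes no quoted commas
        // assumed date format "MM/dd/yyyy hh:mm:ss a" in field 2
        val month = f(2).substring(6, 10) + "-" + f(2).substring(0, 2) // yyyy-MM
        ((month, f(5)), 1) // assumed crime type in field 5
      }
      .reduceByKey(_ + _) // count per (month, type)
      .map { case ((month, crimeType), count) => (month, crimeType, count) }

    counts.take(20).foreach(println)
    sc.stop()
  }
}
```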