- Introduction
- Using itversity platforms - Big Data Developer labs and forum
- Setting up Scala
- Basic Programming Constructs
- Setup Scala on Windows
- Functions
- Object Oriented Concepts - Class
- Object Oriented Concepts - Object
- Object Oriented Concepts - Case Classes
- Collections - Seq, Set and Map
- Basic Map Reduce Operations
- Setting up Data Sets for Basic I/O Operations
- Basic I/O Operations and using Scala Collections APIs
- Tuples
- Development Cycle - Developing Source code
- Development Cycle - Compile source code to jar using SBT
- Development Cycle - Setup SBT on Windows
- Development Cycle - Compile changes and run jar with arguments
- Development Cycle - Setup IntelliJ with Scala
- Development Cycle - Develop Scala application using SBT in IntelliJ
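The Scala fundamentals above (case classes, collections such as Seq, and map reduce style operations) can be sketched in a few lines. This is illustrative only; the class and field names are made up, not taken from the course:

```scala
// Illustrative only: a case class, a Seq collection, and basic
// map/reduce-style operations from the Scala Collections API.
case class OrderItem(orderId: Int, subtotal: Double)

object CollectionsSketch {
  def main(args: Array[String]): Unit = {
    val items = Seq(OrderItem(1, 200.0), OrderItem(1, 50.0), OrderItem(2, 130.0))

    // map then reduce: total revenue across all items
    val totalRevenue = items.map(_.subtotal).reduce(_ + _)  // 380.0

    // groupBy plus aggregation: revenue per order
    val revenuePerOrder = items
      .groupBy(_.orderId)
      .map { case (orderId, group) => (orderId, group.map(_.subtotal).sum) }

    println(totalRevenue)
    println(revenuePerOrder)
  }
}
```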
- Introduction
- Setup Options
- Setup using tar ball
- Setup using Hortonworks Sandbox
- Using labs.itversity.com
- Using Windows - Putty and WinSCP
- Using Windows - Cygwin
- HDFS - Quick Preview
- YARN - Quick Preview
- Setup Data Sets
- Curriculum
- Introduction
- Setup Spark on Windows
- Problem Statement and Environment
- Initialize the Job
- Resilient Distributed Datasets (RDD)
- Previewing the Data
- Filtering the Data
- Accumulators
- Converting to Key Value Pairs - using map
- Joining Data Sets
- Get Daily Revenue by Product - reduceByKey
- Get Daily Revenue and Count by Product - aggregateByKey
- Execution Life Cycle
- Broadcast Variables
- Sorting the Data - by Date in Ascending Order and Revenue in Descending Order
- Saving data back to HDFS
- Add Spark dependencies to SBT
- Develop as a Scala-based application
- Run locally using spark-submit
- Ship and run it on big data cluster
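The Spark Core flow above (key-value pairs via map, reduceByKey for daily revenue, then sorting by date ascending and revenue descending) can be sketched as follows. This is a minimal sketch, assuming records shaped as (date, productId, revenue) tuples, as produced by joining orders with order_items in the retail_db data set; the names and sample values are assumptions:

```scala
// Hedged sketch of the daily-revenue pipeline: map to key-value
// pairs, reduceByKey to sum revenue, sortBy on a composite key.
// Record layout (date, productId, revenue) is an assumption.
import org.apache.spark.{SparkConf, SparkContext}

object DailyProductRevenue {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("Daily Product Revenue").setMaster("local[*]"))

    val records = sc.parallelize(Seq(
      ("2014-07-24", 1, 299.98), ("2014-07-24", 1, 149.99), ("2014-07-25", 2, 59.99)
    ))

    val dailyRevenue = records
      .map { case (date, productId, revenue) => ((date, productId), revenue) }
      .reduceByKey(_ + _)
      // date ascending; negate revenue to get descending order
      .sortBy { case ((date, _), revenue) => (date, -revenue) }

    dailyRevenue.collect().foreach(println)
    sc.stop()
  }
}
```

Negating the revenue inside `sortBy` is a common trick when one sort field is ascending and the other descending.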
- Introduction to Spark SQL and Objectives
- Different interfaces to run SQL - Hive, Spark SQL
- Create database and tables in text file format - orders and order_items
- Create database and tables in ORC file format - orders and order_items
- Running queries using Scala - spark-shell
- Functions - Getting Started
- Functions - String Manipulation
- Functions - Date Manipulation
- Functions - Aggregations in brief
- Functions - case and nvl
- Row level transformations
- Joining data from multiple tables
- Group by and aggregations
- Sorting the data
- Set operations - union and union all
- Analytics functions - aggregations
- Analytics functions - ranking
- Windowing functions
- Creating Data Frames and registering them as temp tables
- Write Spark Application - Processing Data using Spark SQL
- Write Spark Application - Saving Data Frame to Hive tables
- Data Frame Operations
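The Spark SQL topics above can be sketched end to end: create a Data Frame, register it as a temp table, and run a query with aggregation. This is a hedged sketch written against the Spark 1.6-style API (`SQLContext`, `registerTempTable`) that this material predates Spark 2.x sessions; the table and column names are assumptions modeled on retail_db:

```scala
// Hedged sketch: Data Frame from a local Seq, temp table
// registration, and a SQL query with group by and aggregation.
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

object SparkSQLSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setAppName("Spark SQL Sketch").setMaster("local[*]"))
    val sqlContext = new SQLContext(sc)
    import sqlContext.implicits._

    val orderItems = Seq((1, "2014-07-24", 299.98), (2, "2014-07-24", 149.99))
      .toDF("order_id", "order_date", "order_item_subtotal")
    orderItems.registerTempTable("order_items")

    sqlContext.sql(
      """SELECT order_date, round(sum(order_item_subtotal), 2) AS revenue
        |FROM order_items
        |GROUP BY order_date
        |ORDER BY order_date""".stripMargin
    ).show()

    sc.stop()
  }
}
```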
- Introduction to exercises
- General Guidelines about Exercises or Problem Statements
- General Guidelines - Initializing the Job
- Getting crime count per type per month - Understanding Data
- Getting crime count per type per month - Implementing the logic - Core API
- Getting crime count per type per month - Implementing the logic - Data Frames
- Getting crime count per type per month - Validating Output
- Get inactive customers - using Core Spark API (leftOuterJoin)
- Get inactive customers - using Data Frames and SQL
- Get top 3 crimes in RESIDENCE - using Core Spark API
- Get top 3 crimes in RESIDENCE - using Data Frame and SQL
- Convert NYSE data from text file format to Parquet file format
- Get word count - with custom control arguments, num keys and file format
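The final exercise can be sketched as a Core Spark word count. The argument handling below is a hypothetical placeholder, not the course's exact solution, and the file-format switch mentioned in the title is omitted for brevity:

```scala
// Hypothetical sketch: word count with control arguments for input
// path, output path, and number of reduce tasks.
import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    // expected args: <input path> <output path> <num tasks> -- placeholders
    val Array(inputPath, outputPath, numTasks) = args

    val sc = new SparkContext(new SparkConf().setAppName("Word Count"))

    sc.textFile(inputPath)
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _, numTasks.toInt)  // numTasks controls parallelism
      .saveAsTextFile(outputPath)

    sc.stop()
  }
}
```

Passing the task count as the second argument of `reduceByKey` controls the number of output partitions, and hence the number of part files written to HDFS.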