Introduction
  • CCA 175 Spark and Hadoop Developer - Curriculum
Scala Fundamentals
  • Introduction and Setting up of Scala
  • Setup Scala on Windows
  • Basic Programming Constructs
  • Functions
  • Object Oriented Concepts - Classes
  • Object Oriented Concepts - Objects
  • Object Oriented Concepts - Case Classes
  • Collections - Seq, Set and Map
  • Basic Map Reduce Operations
  • Setting up Data Sets for Basic I/O Operations
  • Basic I/O Operations and using Scala Collections APIs
  • Tuples
  • Development Cycle - Developing Source code
  • Development Cycle - Compile source code to jar using SBT
  • Development Cycle - Setup SBT on Windows
  • Development Cycle - Compile changes and run jar with arguments
  • Development Cycle - Setup IntelliJ with Scala
  • Development Cycle - Develop Scala application using SBT in IntelliJ
Getting Started
  • Introduction and Curriculum
  • Setup Environment - Options
  • Setup Environment - Locally
  • Setup Environment - using Cloudera Quickstart VM
  • Using Windows - Putty and WinSCP
  • Using Windows - Cygwin
  • HDFS Quick Preview
  • YARN Quick Preview
  • Setup Data Sets
Apache Spark 1.6 - Transform, Stage and Store - Spark
  • Introduction
  • Introduction to Spark
  • Setup Spark on Windows
  • Quick overview of Spark documentation
  • Initializing Spark job using spark-shell
  • Create Resilient Distributed Data Sets (RDD)
  • Previewing data from RDD
  • Reading different file formats - Brief overview using JSON
  • Transformations Overview
  • Manipulating Strings as part of transformations using Scala
  • Row level transformations using map
  • Row level transformations using flatMap
  • Filtering the data
  • Joining data sets - inner join
  • Joining data sets - outer join
  • Aggregations - Getting Started
  • Aggregations - using actions (reduce and countByKey)
  • Aggregations - understanding combiner
  • Aggregations using groupByKey - least preferred API for aggregations
  • Aggregations using reduceByKey
  • Aggregations using aggregateByKey
  • Sorting data using sortByKey
  • Global Ranking - using sortByKey with take and takeOrdered
  • By Key Ranking - Converting (K, V) pairs into (K, Iterable[V]) using groupByKey
  • Get topNPrices using Scala Collections API
  • Get topNPricedProducts using Scala Collections API
  • Get top n products by category using groupByKey, flatMap and Scala function
  • Set Operations - union, intersect, distinct as well as minus
  • Save data in Text Input Format
  • Save data in Text Input Format using Compression
  • Saving data in standard file formats - Overview
  • Revision of Problem Statement and Design the solution
  • Solution - Get Daily Revenue per Product - Launching Spark Shell
  • Solution - Get Daily Revenue per Product - Read and join orders and order_items
  • Solution - Get Daily Revenue per Product - Compute daily revenue per product id
  • Solution - Get Daily Revenue per Product - Read products data and create RDD
  • Solution - Get Daily Revenue per Product - Sort and save to HDFS
  • Solution - Add spark dependencies to sbt
  • Solution - Develop as Scala based application
  • Solution - Run locally using spark-submit
  • Solution - Ship and run it on big data cluster
Setup Hadoop and Spark Environment for Practice
  • Introduction to Setting up Environment for Practice
  • Overview of ITVersity Boxes GitHub Repository
  • Creating Virtual Machine
  • Starting HDFS and YARN
  • Gracefully Stopping Virtual Machine
  • Understanding Datasets provided in Virtual Machine
  • Using GitHub Content for the practice
  • Using Resources for Practice
Spark 2 - Data Processing - Overview
  • Introduction for the module
  • Starting Spark Context
  • Overview of Spark read APIs
  • Previewing Schema and Data
  • Overview of Data Frame APIs
  • Overview of Functions
  • Overview of Spark Write APIs
Spark 2 - Processing Column Data using Pre-defined Functions
  • Introduction to Pre-defined Functions
  • Creating Spark Session Object in Notebook
  • Create Dummy Data Frames for Practice
  • Categories of Functions
  • Using Special Functions - col
  • Using Special Functions - lit
  • String Manipulation Functions - Case Conversion and Length
  • String Manipulation - Extracting data from fixed length fields using substring
  • String Manipulation - Extracting data from delimited fields using split