Introduction to the Course: The Key Concepts and Software Tools
  • Introduction
  • Data and Scripts For the Course
  • Introduction to R and RStudio
  • Start with Rattle
  • Troubleshooting For Rattle
  • Conclusion to Section 1
Reading in Data from Different Sources in R
  • Read in Data from CSV and Excel Files
  • Read Data from a Database
  • Read Data from JSON
  • Read in Data from Online CSVs
  • Read in Data from Online HTML Tables - Part 1
  • Read in Data from Online HTML Tables - Part 2
  • Read Data from Other Sources
  • Conclusions to Section 2
Exploratory Data Analysis and Data Visualization in R
  • Remove NAs
  • More Data Cleaning
  • Exploratory Data Analysis (EDA): Basic Visualizations with R
  • More Exploratory Data Analysis with xda
  • Introduction to dplyr for Data Summarizing - Part 1
  • Introduction to dplyr for Data Summarizing - Part 2
  • Data Exploration & Visualization With dplyr & ggplot2
  • Pre-Processing Dates - Part 1
  • Pre-Processing Dates - Part 2
  • Plotting Temporal Data in R
  • Twist in the (Temporal) Data
  • Associations Between Quantitative Variables: Theory
  • Testing for Correlation
  • Evaluate the Relation Between Nominal Variables
  • Cramer's V for Examining the Strength of Association Between Nominal Variables
  • Section 3 Quiz
Data Mining for Patterns and Relationships
  • What is Data Mining?
  • Association Mining with Apriori
  • Apriori with Real Data
  • Visualize the Rules
  • Association Mining with Eclat
  • Eclat with Real Data
Machine Learning for Data Science
  • How is Machine Learning Different from Statistical Data Analysis?
  • What is Machine Learning (ML) About? Some Theoretical Pointers
Unsupervised Classification in R
  • K-means Clustering
  • Fuzzy K-Means Clustering
  • Weighted K-Means Clustering
  • Hierarchical Clustering in R
  • Expectation-Maximization (EM) in R
  • Use Rattle for Unsupervised Clustering
  • Conclusions to Section 6
  • Section 6 Quiz
Dimension Reduction
  • Dimensionality Reduction: Theory
  • PCA
  • Removing Highly Correlated Predictor Variables
  • Variable Selection Using LASSO Regression
  • Variable Selection With FSelector
  • Boruta Analysis for Feature Selection
  • Conclusions to Section 7
  • Section 7 Quiz
Supervised Learning Theory
  • Some Basic Supervised Learning Concepts
  • Pre-processing for Supervised Learning
Supervised Learning: Classification
  • Binary Classification
  • What are GLMs?
  • Logistic Regression Models as Binary Classifiers
  • Linear Discriminant Analysis (LDA)
  • Binary Classifier with PCA
  • Obtain Binary Classification Accuracy Metrics
  • Multi-class Classification Models
  • Our Multi-class Classification Problem
  • Classification Trees
  • More on Classification Tree Visualization
  • Decision Trees
  • Random Forest (RF) Classification
  • Examine Individual Variable Importance for Random Forests
  • GBM Classification
  • Support Vector Machines (SVM) for Classification
  • More SVM for Classification
  • Conclusions to Section 9
  • Section 9 Quiz
Supervised Learning: Regression
  • Ridge Regression in R
  • LASSO Regression in R
  • Generalized Additive Models (GAMs) in R
  • Boosted GAMs
  • MARS Regression
  • CART-Regression Trees in R
  • Random Forest (RF) Regression
  • GBM Regression
  • Compare Models
  • Conclusions to Section 10
Introduction to Artificial Neural Networks (ANN)
  • What are Artificial Neural Networks?
  • Neural Network for Binary Classifications
  • Neural Network with PCA for Binary Classifications
  • Neural Network for Regression
  • More on Neural Networks with neuralnet