Introduction
  • What is Covered in this Class
  • Big Picture - Data Scrubbing
  • Optional - Getting RStudio
Reading Data
  • Common Data Readers
Data Transformation - Data Scrubbing
  • Dates - Reading and Casting Dates
  • Text Data - Ways to Quantify Free-Form Text
  • Text Data - Categories
  • Text Data - Categories 2 & Pipeline Check
  • Imputing Data - Dealing with Missing Data
  • Pipeline Check
  • Caret Library - nearZeroVar
Feature Engineering
  • Engineering Dates - Getting Additional Features out of Dates
  • Numerical Engineering - Integers and Real Numbers
  • Pipeline Check
Basic Data Exploration
  • Correlations
  • Caret Library - findCorrelation
  • Hunting Outliers
Modeling
  • Random Forest - Titanic Data Set
  • GBM (Generalized Boosted Regression Models)/Caret - Diabetes Data Set - 1
  • GBM - 2
  • K-means - Unsupervised Modeling