Introduction
  • Introduction
  • Course curriculum overview
  • Course requirements
  • How to approach this course
  • Setting up your computer
  • Course Material
  • Download Jupyter notebooks
  • Download datasets
  • Download course presentations
  • Moving Forward
  • FAQ: Data Science, Python programming, datasets, presentations and more...
Variable Types
  • Variables | Intro
  • Numerical variables
  • Categorical variables
  • Date and time variables
  • Mixed variables
  • Quiz about variable types
Variable Characteristics
  • Variable characteristics
  • Missing data
  • Cardinality - categorical variables
  • Rare Labels - categorical variables
  • Linear model assumptions
  • Linear model assumptions - additional reading resources (optional)
  • Variable distribution
  • Outliers
  • Variable magnitude
  • Bonus: Machine learning algorithms overview
  • Bonus: Additional reading resources
Missing Data Imputation
  • Introduction to missing data imputation
  • Complete Case Analysis
  • Mean or median imputation
  • Arbitrary value imputation
  • End of distribution imputation
  • Frequent category imputation
  • Missing category imputation
  • Random sample imputation
  • Adding a missing indicator
  • Mean or median imputation with Scikit-learn
  • Arbitrary value imputation with Scikit-learn
  • Frequent category imputation with Scikit-learn
  • Missing category imputation with Scikit-learn
  • Adding a missing indicator with Scikit-learn
  • Automatic determination of imputation method with Scikit-learn
  • Introduction to Feature-engine
  • Mean or median imputation with Feature-engine
  • Arbitrary value imputation with Feature-engine
  • End of distribution imputation with Feature-engine
  • Frequent category imputation with Feature-engine
  • Missing category imputation with Feature-engine
  • Random sample imputation with Feature-engine
  • Adding a missing indicator with Feature-engine
  • Overview of missing value imputation methods
  • Conclusion: when to use each missing data imputation method
Multivariate Missing Data Imputation
  • Multivariate Imputation
  • KNN Impute
  • KNN Impute - Demo
  • MICE
  • missForest
  • MICE and missForest - Demo
  • Additional reading resources (optional)
Categorical Variable Encoding
  • Categorical encoding | Introduction
  • One hot encoding
  • Important: Feature-engine version 1.0.0
  • One hot encoding | Demo
  • One hot encoding of top categories
  • One hot encoding of top categories | Demo
  • Ordinal encoding | Label encoding
  • Ordinal encoding | Demo
  • Count or frequency encoding
  • Count encoding | Demo
  • Target guided ordinal encoding
  • Target guided ordinal encoding | Demo
  • Mean encoding
  • Mean encoding | Demo
  • Probability ratio encoding
  • Weight of evidence (WoE)
  • Weight of Evidence | Demo
  • Comparison of categorical variable encoding
  • Rare label encoding
  • Rare label encoding | Demo
  • Binary encoding and feature hashing
  • Summary table of encoding techniques
  • Bonus: Additional reading resources
Variable Transformation
  • Variable transformation | Introduction
  • Variable transformation with NumPy and SciPy
  • Variable transformation with Scikit-learn
  • Variable transformation with Feature-engine
Discretisation
  • Discretisation | Introduction
  • Equal-width discretisation
  • Important: Feature-engine version 1.0.0
  • Equal-width discretisation | Demo
  • Equal-frequency discretisation