Data Science in Python: Classification Modeling

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 170 lectures (9h 51m) | 3.49 GB

Learn Python for Data Science & Supervised Machine Learning, and build classification models with fun, hands-on projects

This is a hands-on, project-based course designed to help you master the foundations for classification modeling in Python.

We’ll start by reviewing the data science workflow, discussing the primary goals & types of classification algorithms, and doing a deep dive into the classification modeling steps we’ll use throughout the course.

You’ll learn to perform exploratory data analysis, leverage feature engineering techniques like scaling, dummy variables, and binning, and prepare data for modeling by splitting it into train, test, and validation datasets.
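
As a sketch of what those prep steps look like in pandas and scikit-learn (the customer data below is a made-up stand-in, not the course dataset):

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Hypothetical customer data illustrating the prep steps above
df = pd.DataFrame({
    "income": [25_000, 48_000, 90_000, 31_000, 62_000, 75_000],
    "region": ["north", "south", "south", "north", "east", "east"],
    "default": [1, 0, 0, 1, 0, 0],  # binary target
})

# Dummy variables for the categorical feature
X = pd.get_dummies(df[["income", "region"]], columns=["region"], drop_first=True)
y = df["default"]

# Hold out a test set before fitting anything
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.33, random_state=42
)

# Scale numeric features using statistics learned on the training set only,
# so no information leaks from the test set into the model
scaler = StandardScaler()
X_train[["income"]] = scaler.fit_transform(X_train[["income"]])
X_test[["income"]] = scaler.transform(X_test[["income"]])
```

Note that the scaler is fit on the training split only and then applied to the test split, which is the leakage-safe ordering the course emphasizes.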

From there, we’ll fit K-Nearest Neighbors & Logistic Regression models, and build an intuition for interpreting their coefficients and evaluating their performance using tools like confusion matrices and metrics like accuracy, precision, and recall. We’ll also cover techniques for modeling imbalanced data, including threshold tuning, sampling methods like oversampling & SMOTE, and adjusting class weights in the model cost function.

Throughout the course, you’ll play the role of Data Scientist for the risk management department at Maven National Bank. You’ll use Python to explore their data and build classification models that accurately determine which customers have high, medium, or low credit risk based on their profiles.

Last but not least, you’ll learn to build and evaluate decision tree models for classification. You’ll fit, visualize, and fine-tune these models using Python, then apply your knowledge to more advanced ensemble models like random forests and gradient boosted machines.

COURSE OUTLINE:

Intro to Data Science
Introduce the fields of data science and machine learning, review essential skills, and introduce each phase of the data science workflow

Classification 101
Review the basics of classification, including key terms, the types and goals of classification modeling, and the modeling workflow

Pre-Modeling Data Prep & EDA
Recap the data prep & EDA steps required to perform modeling, including key techniques to explore the target, features, and their relationships

K-Nearest Neighbors
Learn how the k-nearest neighbors (KNN) algorithm classifies data points and practice building KNN models in Python
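
A minimal sketch of the idea in scikit-learn, using two toy clusters (the points and k value are illustrative only):

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Toy 2-D points: class 0 clusters near the origin, class 1 near (5, 5)
X = np.array([[0, 0], [1, 0], [0, 1], [5, 5], [6, 5], [5, 6]])
y = np.array([0, 0, 0, 1, 1, 1])

# Classify a new point by majority vote among its 3 nearest neighbors
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X, y)

print(knn.predict([[0.5, 0.5]]))    # near the first cluster → class 0
print(knn.predict([[5.5, 5.5]]))    # near the second cluster → class 1
print(knn.predict_proba([[4, 4]]))  # soft classification: class probabilities
```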

Logistic Regression
Introduce logistic regression, learn the math behind the model, and practice fitting these models and tuning their regularization strength
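
A small sketch of fitting and interpreting such a model in scikit-learn (the one-feature dataset is invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: one feature, the event becomes likely as x grows
X = np.array([[0.5], [1.0], [1.5], [2.0], [2.5], [3.0], [3.5], [4.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

# C is the inverse regularization strength (smaller C = stronger penalty)
model = LogisticRegression(C=1.0)
model.fit(X, y)

# A positive coefficient means higher x raises the predicted probability;
# exp(coef) gives the odds-ratio interpretation of a one-unit increase in x
coef = model.coef_[0][0]
print(np.exp(coef))                        # odds multiplier per unit of x
print(model.predict_proba([[4.0]])[0, 1])  # probability of the event at x = 4
```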

Classification Metrics
Learn how and when to use several important metrics for evaluating classification models, such as precision, recall, F1 score, and ROC-AUC
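
These metrics all derive from the confusion matrix, which a short example makes concrete (the labels and predictions are hypothetical):

```python
from sklearn.metrics import (accuracy_score, precision_score,
                             recall_score, f1_score, confusion_matrix)

# Hypothetical labels and predictions to show how each metric is computed
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print(tn, fp, fn, tp)  # 5 1 1 3

print(accuracy_score(y_true, y_pred))   # (tp + tn) / total = 0.8
print(precision_score(y_true, y_pred))  # tp / (tp + fp) = 0.75
print(recall_score(y_true, y_pred))     # tp / (tp + fn) = 0.75
print(f1_score(y_true, y_pred))         # harmonic mean of the two = 0.75
```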

Imbalanced Data
Understand the challenges of modeling imbalanced data and learn strategies for improving model performance in these scenarios
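
One of those strategies, adjusting class weights in the cost function, can be sketched on synthetic imbalanced data (the 95/5 split and feature distributions are invented for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)

# Imbalanced toy data: 95% negatives, 5% positives, one informative feature
X_neg = rng.normal(0.0, 1.0, size=(190, 1))
X_pos = rng.normal(2.0, 1.0, size=(10, 1))
X = np.vstack([X_neg, X_pos])
y = np.array([0] * 190 + [1] * 10)

# class_weight="balanced" scales the cost function so minority-class errors
# count more, instead of letting the model favor the majority class
plain = LogisticRegression().fit(X, y)
weighted = LogisticRegression(class_weight="balanced").fit(X, y)

# The weighted model recalls more of the rare positive class
print(plain.predict(X)[190:].sum(), "positives caught without weights")
print(weighted.predict(X)[190:].sum(), "positives caught with weights")
```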

Decision Trees
Build and evaluate decision tree models, algorithms that look for the splits in your data that best separate your classes
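
A tiny sketch of that idea: on toy data where one split cleanly separates the classes, the fitted tree finds exactly that split (data invented for illustration):

```python
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy data where a single split between x = 2.5 and x = 3 separates the classes
X = [[1], [2], [2.5], [3], [4], [5]]
y = [0, 0, 0, 1, 1, 1]

tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# export_text prints the split rules the tree chose, mirroring the idea above
print(export_text(tree, feature_names=["x"]))
print(tree.predict([[1.5], [4.5]]))  # → [0 1]
```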

Ensemble Models
Get familiar with the basics of ensemble models, then dive into specific models like random forests and gradient boosted machines
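
Both model families can be fit and compared with a few lines of scikit-learn; the synthetic dataset below stands in for the bank's customer features, and the hyperparameters are illustrative defaults, not tuned values:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic classification data standing in for real customer features
X, y = make_classification(n_samples=400, n_features=8, n_informative=4,
                           random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

# Bagging: many decorrelated trees vote on each prediction
rf = RandomForestClassifier(n_estimators=200, random_state=42)
# Boosting: shallow trees are fit sequentially to the previous trees' errors
gbm = GradientBoostingClassifier(n_estimators=200, random_state=42)

for name, model in [("random forest", rf), ("gradient boosting", gbm)]:
    model.fit(X_train, y_train)
    print(name, model.score(X_test, y_test))
```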

What you’ll learn

  • Master the foundations of supervised Machine Learning & classification modeling in Python
  • Perform exploratory data analysis on model features and targets
  • Apply feature engineering techniques and split the data into training, test, and validation sets
  • Build and interpret k-nearest neighbors and logistic regression models using scikit-learn
  • Evaluate model performance using tools like confusion matrices and metrics like accuracy, precision, recall, and F1
  • Learn techniques for modeling imbalanced data, including threshold tuning, sampling methods, and adjusting class weights
  • Build, tune, and evaluate decision tree models for classification, including advanced ensemble models like random forests and gradient boosted machines

Table of Contents

Introduction
1 Course Introduction
2 About This Series
3 Course Structure & Outline
4 Course Structure & Outline
5 READ ME Important Notes for New Students
6 DOWNLOAD Course Resources
7 Introducing the Course Project
8 Setting Expectations
9 Jupyter Installation & Launch

Intro to Data Science
10 What is Data Science
11 The Data Science Skillset
12 What is Machine Learning
13 Common Machine Learning Algorithms
14 Data Science Workflow
15 Data Prep & EDA Steps
16 Modeling Steps
17 Classification Modeling
18 Key Takeaways
19 Intro to Data Science

Classification 101
20 Classification 101
21 Goals of Classification
22 Types of Classification
23 Classification Modeling Workflow
24 Key Takeaways
25 Classification 101

Data Prep & EDA
26 EDA For Classification
27 Defining a Target
28 DEMO Defining a Target
29 Exploring the Target
30 Exploring the Features
31 DEMO Exploring the Features
32 ASSIGNMENT Exploring the Target & Features
33 SOLUTION Exploring the Target & Features
34 Correlation
35 PRO TIP Correlation Matrix
36 DEMO Correlation Matrix
37 Feature-Target Relationships
38 Feature-Feature Relationships
39 PRO TIP Pair Plots
40 ASSIGNMENT Exploring Relationships
41 SOLUTION Exploring Relationships
42 Feature Engineering Overview
43 Numeric Feature Engineering
44 Dummy Variables
45 Binning Categories
46 DEMO Feature Engineering
47 Data Splitting
48 Preparing Data for Modeling
49 ASSIGNMENT Preparing the Data for Modeling
50 SOLUTION Prepare the Data for Modeling
51 Key Takeaways
52 Data Prep & EDA

K-Nearest Neighbors
53 K-Nearest Neighbors
54 The KNN Workflow
55 KNN in Python
56 Model Accuracy
57 Confusion Matrix
58 DEMO Confusion Matrix
59 ASSIGNMENT Fitting a Simple KNN Model
60 SOLUTION Fitting a Simple KNN Model
61 Hyperparameter Tuning
62 Overfitting & Validation
63 DEMO Hyperparameter Tuning
64 Hard vs. Soft Classification
65 DEMO Probability vs. Event Rate
66 ASSIGNMENT Tuning a KNN Model
67 SOLUTION Tuning a KNN Model
68 Pros & Cons of KNN
69 Key Takeaways
70 K-Nearest Neighbors

Logistic Regression
71 Logistic Regression
72 Logistic vs. Linear Regression
73 The Logistic Function
74 Likelihood
75 Multiple Logistic Regression
76 The Logistic Regression Workflow
77 Logistic Regression in Python
78 Interpreting Coefficients
79 ASSIGNMENT Logistic Regression
80 SOLUTION Logistic Regression
81 Feature Engineering & Selection
82 Regularization
83 Tuning a Regularized Model
84 DEMO Regularized Logistic Regression
85 ASSIGNMENT Regularized Logistic Regression
86 SOLUTION Regularized Logistic Regression
87 Multi-class Logistic Regression
88 ASSIGNMENT Multi-class Logistic Regression
89 SOLUTION Multi-class Logistic Regression
90 Pros & Cons of Logistic Regression
91 Key Takeaways
92 Logistic Regression

Classification Metrics
93 Classification Metrics
94 Accuracy, Precision & Recall
95 DEMO Accuracy, Precision & Recall
96 PRO TIP F1 Score
97 ASSIGNMENT Model Metrics
98 SOLUTION Model Metrics
99 Soft Classification
100 DEMO Leveraging Soft Classification
101 PRO TIP Precision-Recall & F1 Curves
102 DEMO Plotting Precision-Recall & F1 Curves
103 The ROC Curve & AUC
104 DEMO The ROC Curve & AUC
105 Classification Metrics Recap
106 ASSIGNMENT Threshold Shifting
107 SOLUTION Threshold Shifting
108 Multi-class Metrics
109 Multi-class Metrics in Python
110 ASSIGNMENT Multi-class Metrics
111 SOLUTION Multi-class Metrics
112 Key Takeaways
113 Classification Metrics

Imbalanced Data
114 Imbalanced Data
115 Managing Imbalanced Data
116 Threshold Shifting
117 Sampling Strategies
118 Oversampling
119 Oversampling in Python
120 DEMO Oversampling
121 SMOTE
122 SMOTE in Python
123 Undersampling
124 Undersampling in Python
125 ASSIGNMENT Sampling Methods
126 SOLUTION Sampling Methods
127 Changing Class Weights
128 DEMO Changing Class Weights
129 ASSIGNMENT Changing Class Weights
130 SOLUTION Changing Class Weights
131 Imbalanced Data Recap
132 Key Takeaways
133 Imbalanced Data

Mid-Course Project
134 Project Brief
135 Solution Walkthrough

Decision Trees
136 Decision Trees
137 Entropy
138 Decision Tree Predictions
139 Decision Trees in Python
140 DEMO Decision Trees
141 Feature Importance
142 ASSIGNMENT Decision Trees
143 SOLUTION Decision Trees
144 Hyperparameter Tuning for Decision Trees
145 DEMO Hyperparameter Tuning
146 ASSIGNMENT Tuned Decision Tree
147 SOLUTION Tuned Decision Tree
148 Pros & Cons of Decision Trees
149 Key Takeaways
150 Decision Trees

Ensemble Models
151 Ensemble Models
152 Simple Ensemble Models
153 DEMO Simple Ensemble Models
154 ASSIGNMENT Simple Ensemble Models
155 SOLUTION Simple Ensemble Models
156 Random Forests
157 Fitting Random Forests in Python
158 Hyperparameter Tuning for Random Forests
159 PRO TIP Random Search
160 Pros & Cons of Random Forests
161 ASSIGNMENT Random Forests
162 SOLUTION Random Forests
163 Gradient Boosting
164 Gradient Boosting in Python
165 Hyperparameter Tuning for Gradient Boosting
166 DEMO Hyperparameter Tuning for Gradient Boosting
167 Pros & Cons of Gradient Boosting
168 ASSIGNMENT Gradient Boosting
169 SOLUTION Gradient Boosting
170 PRO TIP SHAP Values
171 DEMO SHAP Values
172 Key Takeaways
173 DEMO Ensemble Models
174 Ensemble Models

Classification Summary
175 Recap Classification Models & Workflow
176 Pros & Cons of Classification Models
177 DEMO Production Pipeline & Deployment
178 Looking Ahead Unsupervised Learning

Final Project
179 Project Brief
180 Solution Walkthrough

Next Steps
181 BONUS LESSON
