Data Pre-Processing for Data Analytics and Data Science

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 48 lectures (8h 51m) | 5.12 GB

The Data Pre-processing for Data Analytics and Data Science course gives students a comprehensive understanding of the steps involved in preparing raw data for analysis. Data pre-processing is a fundamental stage in the data science workflow: it involves cleaning, transforming, and integrating data to ensure its quality and usability for subsequent analysis.

Throughout this course, students will learn various techniques and strategies for handling real-world data, which is often messy, inconsistent, and incomplete. They will gain hands-on experience with popular tools and libraries used for data pre-processing, such as Python and its data manipulation libraries (e.g., Pandas), and explore practical examples to reinforce their learning.

Key topics covered in this course include:

Introduction to Data Pre-processing:

– Understanding the importance of data pre-processing in data analytics and data science

– Overview of the data pre-processing pipeline

Data Cleaning Techniques:

– Identifying and handling missing values

– Dealing with outliers and noisy data

– Resolving inconsistencies and errors in the data

Data Transformation:

– Feature scaling and normalization

– Handling categorical variables through encoding techniques

– Dimensionality reduction methods (e.g., Principal Component Analysis)

Data Integration and Aggregation:

– Merging and joining datasets

– Handling data from multiple sources

– Aggregating data for analysis and visualization

Handling Text and Time-Series Data:

– Text preprocessing techniques (e.g., tokenization, stemming, stop-word removal)

– Time-series data cleaning and feature extraction

Data Quality Assessment:

– Data profiling and exploratory data analysis

– Data quality metrics and assessment techniques

Best Practices and Tools:

– Effective data cleaning and pre-processing strategies

– Introduction to popular data pre-processing libraries and tools (e.g., Pandas, NumPy)
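As a rough illustration of the Data Transformation topics above (feature scaling, normalization, and categorical encoding), here is a minimal sketch using only the Python standard library. It is not taken from the course material: the function names and the sample `ages` and `cities` values are hypothetical, and in practice libraries such as Pandas and NumPy would usually handle these steps.

```python
from statistics import mean, pstdev


def min_max_scale(values):
    """Rescale numbers linearly onto the [0, 1] range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map it to zeros.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def z_score(values):
    """Standardize numbers to zero mean and unit (population) std. deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def one_hot(labels):
    """Encode categorical labels as 0/1 indicator rows, one column per category."""
    categories = sorted(set(labels))
    rows = [[1 if label == c else 0 for c in categories] for label in labels]
    return categories, rows


# Hypothetical sample data, for illustration only.
ages = [20, 30, 40]
cities = ["Pune", "Delhi", "Pune"]

scaled = min_max_scale(ages)
standardized = z_score(ages)
categories, encoded = one_hot(cities)

print(scaled)
print(categories, encoded)
```

Min-max scaling keeps the original shape of the distribution but is sensitive to extreme values, while z-score standardization is often preferred when features have very different spreads.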

What you’ll learn

  • Gain in-depth knowledge of Exploratory Data Analysis & Data Pre-Processing.
  • Learn about Data Cleaning and how to handle raw data.
  • Learn how to handle Duplicate & Missing Data.
  • Learn a variety of Outlier Analysis and Treatment techniques.
  • Learn about Feature Scaling and Transformation Techniques.
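The cleaning steps listed above (duplicates, missing data, outlier treatment) can be sketched in plain Python as follows. This is an illustrative sketch under stated assumptions, not the course's own code: the helper names, the `rows` sample data, median imputation, and the 1.5 × IQR clipping rule are all choices made here for the example. Note that `statistics.quantiles` uses its default "exclusive" quartile convention, so other tools may compute slightly different fences.

```python
from statistics import median, quantiles


def drop_duplicates(rows):
    """Keep the first occurrence of each identical record (list of dicts)."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(sorted(row.items()))  # hashable fingerprint of the record
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    fill = median(v for v in values if v is not None)
    return [fill if v is None else v for v in values]


def iqr_bounds(values, k=1.5):
    """Return the lower and upper fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def clip_outliers(values, k=1.5):
    """Clip values lying outside the IQR fences back onto the fences."""
    low, high = iqr_bounds(values, k)
    return [min(max(v, low), high) for v in values]


# Hypothetical records: one exact duplicate, one missing value, one extreme value.
rows = [
    {"id": 1, "income": 40}, {"id": 2, "income": 42},
    {"id": 3, "income": None}, {"id": 3, "income": None},
    {"id": 4, "income": 45}, {"id": 5, "income": 47},
    {"id": 6, "income": 50}, {"id": 7, "income": 52},
    {"id": 8, "income": 500},
]

unique = drop_duplicates(rows)
imputed = impute_median([r["income"] for r in unique])
cleaned = clip_outliers(imputed)

print(len(rows), "->", len(unique), "rows after de-duplication")
print(cleaned)
```

Clipping is only one possible treatment; depending on the analysis, outliers might instead be removed, investigated, or left in place.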

Table of Contents

Introduction
1 Introduction to the Tutor
2 Agenda and Stages of Analytics
3 What is Diagnostic Analytics
4 What is Predictive Analytics
5 What is Prescriptive Analytics
6 What is CRISP-ML(Q)

Business Understanding Phase
7 Business Understanding – Define Scope Of Application
8 Business Understanding – Define Success Criteria
9 Business Understanding – Use Cases

Data Understanding Phase – Data Types
10 Agenda Data Understanding
11 Introduction to Data Understanding
12 Data Types – Continuous vs Discrete
13 Categorical Data vs Count Data
14 Practical Data Understanding Using Real-Time Examples
15 Scale of Measurement
16 Quantitative Vs Qualitative
17 Structured vs Unstructured Data

Data Understanding Phase – Data Collection
18 What is Data Collection
19 Understanding Primary Data Sources
20 Understanding Secondary Data Sources
21 Understanding Data Collection using Survey
22 Understanding Data Collection using DoE
23 Understanding Possible errors in Data Collection stage
24 Understanding Bias and Fairness

Understanding Basic Statistics
25 Introduction to CRISP-ML(Q) Data Preparation & Agenda
26 What is Probability
27 What is Random Variable
28 Understanding Probability and its Application, Probability Distribution

Data Preparation Phase – Exploratory Data Analysis (EDA)
29 Understanding Normal Distribution
30 What is Inferential Statistics
31 Understanding Standard Normal Distribution & What are Z-Scores
32 Understanding Measures of Central Tendency (First Moment Business Decision)
33 Understanding Measures of Dispersion (Second Moment Business Decision)
34 Understanding Box Plot (Difference Between Percentile, Quantile and Quartile)
35 Understanding Graphical Techniques – Q-Q Plot
36 Understanding the Bivariate Scatter Plot

Python Installation and Setup
37 Python Installation
38 Anaconda Installation
39 Understanding Anaconda Navigator, Spyder & Python Libraries
40 Understanding Jupyter and Google Colab

Data Preparation Phase – Data Cleansing – Type Casting
41 Recap of Concepts
42 Understanding Data Cleansing Typecasting
43 Understanding Data Cleansing Typecasting Using Python

Data Preparation Phase – Data Cleansing – Handling Duplicates
44 Recap of Concepts
45 Understanding Handling Duplicates
46 Understanding Handling Duplicates using Python

Data Preparation Phase – Data Cleansing – Outlier Analysis Treatment
47 Understanding Outlier Analysis Treatment
48 Understanding Outlier Analysis Treatment using Python
