Outlier Detection in Python, Video Edition

English | MP4 | AVC 1280×720 | AAC 44KHz 2ch | 19h 35m | 3.37 GB

Learn how to identify the unusual, interesting, extreme, or inaccurate parts of your data.

Data scientists have two main tasks: finding patterns in data and finding the exceptions. These outliers are often the most informative parts of data, revealing hidden insights, novel patterns, and potential problems. Outlier Detection in Python is a practical guide to spotting the parts of a dataset that deviate from the norm, even when they’re hidden or intertwined among the expected data points.

In Outlier Detection in Python you’ll learn how to:

  • Use standard Python libraries to identify outliers
  • Select the most appropriate detection methods
  • Combine multiple outlier detection methods for improved results
  • Interpret your results effectively
  • Work with numeric, categorical, time series, and text data

Outlier detection is a vital tool for modern business, whether it’s discovering new products, expanding markets, or flagging fraud and other suspicious activities. This guide presents the core tools for outlier detection, along with techniques that use the Python data stack familiar to data scientists. To get started, you’ll need only a basic understanding of statistics and the Python data ecosystem.

Outliers—values that appear inconsistent with the rest of your data—can be the key to identifying fraud, performing a security audit, spotting bot activity, or just assessing the quality of a dataset. This unique guide introduces the outlier detection tools, techniques, and algorithms you’ll need to find, understand, and respond to the anomalies in your data.

Outlier Detection in Python illustrates the principles and practices of outlier detection with diverse real-world examples, including social media, finance, network logs, and other important domains. You’ll explore a comprehensive set of statistical methods and machine learning approaches to identify and interpret the unexpected values in tabular, text, time series, and image data. Along the way, you’ll explore scikit-learn and PyOD, apply key outlier detection (OD) algorithms, and add some high-value techniques for real-world OD scenarios to your toolkit.

What’s Inside

  • Python libraries to identify outliers
  • Combine outlier detection methods
  • Interpret your results

Table of Contents

1 Part 1
2 Chapter 1. Introducing outlier detection
3 Chapter 1. Outlier detection’s place in machine learning
4 Chapter 1. Outlier detection in tabular data
5 Chapter 1. Definitions of outliers
6 Chapter 1. Trends in outlier detection
7 Chapter 1. How does this book teach outlier detection?
8 Chapter 1. Summary
9 Chapter 2. Simple outlier detection
10 Chapter 2. One-dimensional categorical outliers: Rare values
11 Chapter 2. Multidimensional outliers
12 Chapter 2. Rare combinations of categorical values
13 Chapter 2. Rare combinations of numeric values
14 Chapter 2. Noise vs. inliers and outliers
15 Chapter 2. Local and global outliers
16 Chapter 2. Combining the scores of univariate tests
17 Chapter 2. Summary
18 Chapter 3. Machine learning-based outlier detection
19 Chapter 3. Types of algorithms
20 Chapter 3. Types of detectors
21 Chapter 3. Summary
22 Chapter 4. The outlier detection process
23 Chapter 4. Determining the types of outliers we are interested in
24 Chapter 4. Choosing the type of model to be used
25 Chapter 4. Collecting the data
26 Chapter 4. Examining the data
27 Chapter 4. Cleaning the data
28 Chapter 4. Feature selection
29 Chapter 4. Feature engineering
30 Chapter 4. Encoding categorical values
31 Chapter 4. Scaling numeric values
32 Chapter 4. Fitting a set of models and generating predictions
33 Chapter 4. Evaluating the models
34 Chapter 4. Setting up ongoing outlier detection systems
35 Chapter 4. Refitting the models as necessary
36 Chapter 4. Summary
37 Part 2
38 Chapter 5. Outlier detection using scikit-learn
39 Chapter 5. Isolation Forest
40 Chapter 5. LocalOutlierFactor (LOF)
41 Chapter 5. One-class SVM (OCSVM)
42 Chapter 5. Elliptic Envelope
43 Chapter 5. Gaussian mixture models
44 Chapter 5. BallTree and KDTree
45 Chapter 5. Summary
46 Chapter 6. The PyOD library
47 Chapter 6. Histogram-based Outlier Score (HBOS)
48 Chapter 6. Empirical Cumulative Distribution Function (ECOD)
49 Chapter 6. Copula-based outlier detection (COPOD)
50 Chapter 6. Angle-based outlier detection (ABOD)
51 Chapter 6. Clustering-based local outlier factor (CBLOF)
52 Chapter 6. Local correlation integral (LOCI)
53 Chapter 6. Connectivity-based outlier factor (COF)
54 Chapter 6. Principal component analysis (PCA)
55 Chapter 6. Subspace outlier detection
56 Chapter 6. FeatureBagging
57 Chapter 6. Cook’s Distance
58 Chapter 6. Using SUOD for faster model training
59 Chapter 6. The PyOD thresholds module
60 Chapter 6. Summary
61 Chapter 7. Additional libraries and algorithms for outlier detection
62 Chapter 7. The alibi-detect library
63 Chapter 7. The PyCaret library
64 Chapter 7. Local outlier probability (LoOP)
65 Chapter 7. Local distance-based outlier factor (LDOF)
66 Chapter 7. Extended Isolation Forest (EIF)
67 Chapter 7. Outlier Detection Using In-degree Number (ODIN)
68 Chapter 7. Clustering
69 Chapter 7. Entropy
70 Chapter 7. Association Rules
71 Chapter 7. Convex Hull
72 Chapter 7. Distance metric learning (DML)
73 Chapter 7. NearestSample
74 Chapter 7. Summary
75 Part 3
76 Chapter 8. Evaluating detectors and parameters
77 Chapter 8. Contour plots
78 Chapter 8. Visualizing subspaces in real-world data
79 Chapter 8. Correlation between detectors with full real-world datasets
80 Chapter 8. Modifying real-world data
81 Chapter 8. Testing with classification datasets
82 Chapter 8. Timing experiments
83 Chapter 8. Summary
84 Chapter 9. Working with specific data types
85 Chapter 9. Special data types
86 Chapter 9. Text features
87 Chapter 9. Encoding categorical data
88 Chapter 9. Scaling numeric values
89 Chapter 9. Binning numeric data
90 Chapter 9. Distance metrics
91 Chapter 9. Summary
92 Chapter 10. Handling very large and very small datasets
93 Chapter 10. Data with many rows
94 Chapter 10. Working with very small datasets
95 Chapter 10. Summary
96 Chapter 11. Synthetic data for outlier detection
97 Chapter 11. Generating new synthetic data
98 Chapter 11. Doping
99 Chapter 11. Simulations
100 Chapter 11. Training classifiers to distinguish real from fake data
101 Chapter 11. Summary
102 Chapter 12. Collective outliers
103 Chapter 12. Preparing the data
104 Chapter 12. Testing for duplicates
105 Chapter 12. Testing for gaps
106 Chapter 12. Testing for missing combinations
107 Chapter 12. Creating new tables to capture collective outliers
108 Chapter 12. Identifying trends
109 Chapter 12. Unusual distributions
110 Chapter 12. Rolling windows features
111 Chapter 12. Tests for unusual numbers of point anomalies
112 Chapter 12. Summary
113 Chapter 13. Explainable outlier detection
114 Chapter 13. Post hoc explanations
115 Chapter 13. Interpretable outlier detectors
116 Chapter 13. Summary
117 Chapter 14. Ensembles of outlier detectors
118 Chapter 14. Accuracy metrics with ensembles
119 Chapter 14. Methods to create ensembles
120 Chapter 14. Selecting detectors for an ensemble
121 Chapter 14. Scaling scores
122 Chapter 14. Combining scores
123 Chapter 14. Summary
124 Chapter 15. Working with outlier detection predictions
125 Chapter 15. Examining the flagged outliers
126 Chapter 15. Automating the process of sorting outlier detection results
127 Chapter 15. Semisupervised learning
128 Chapter 15. Regression testing
129 Chapter 15. Summary
130 Part 4
131 Chapter 16. Deep learning-based outlier detection
132 Chapter 16. PyOD
133 Chapter 16. Image data
134 Chapter 16. alibi-detect
135 Chapter 16. Self-supervised learning for outlier detection with tabular data
136 Chapter 16. Summary
137 Chapter 17. Time-series data
138 Chapter 17. Types of time-series outliers
139 Chapter 17. Tools for time-series data
140 Chapter 17. Summary
