English | MP4 | AVC 1280×720 | AAC 44 kHz 2ch | 128 Lessons (18h 11m) | 1.34 GB
Learn data science with Python by building five real-world projects! Experiment with card game predictions, tracking disease outbreaks, and more, as you build a flexible and intuitive understanding of data science.
In Data Science Bookcamp you will find:
- Techniques for computing and plotting probabilities
- Statistical analysis using SciPy
- How to organize datasets with clustering algorithms
- How to visualize complex multi-variable datasets
- How to train a decision tree machine learning algorithm
In Data Science Bookcamp you’ll test and build your knowledge of Python with the kind of open-ended problems that professional data scientists work on every day. Downloadable datasets and thoroughly explained solutions help you lock in what you’ve learned, building your confidence and preparing you for an exciting new data science career.
A data science project has a lot of moving parts, and it takes practice and skill to get all the code, algorithms, datasets, formats, and visualizations working together harmoniously. This unique book guides you through five realistic projects, including tracking disease outbreaks from news headlines, analyzing social networks, and finding relevant patterns in ad click data.
Data Science Bookcamp doesn’t stop with surface-level theory and toy examples. As you work through each project, you’ll learn how to troubleshoot common problems like missing data, messy data, and algorithms that don’t quite fit the model you’re building. You’ll appreciate the detailed setup instructions and the fully explained solutions that highlight common failure points. In the end, you’ll be confident in your skills because you can see the results.
“Valuable and accessible… a solid foundation for anyone aspiring to be a data scientist.”
Amaresh Rajasekharan, IBM Corporation
Table of Contents
1 Case study 1 – Finding the winning strategy in a card game
2 Computing probabilities using Python
3 Problem 2 – Analyzing multiple die rolls
4 Plotting probabilities using Matplotlib
5 Comparing multiple coin-flip probability distributions
6 Running random simulations in NumPy
7 Computing confidence intervals using histograms and NumPy arrays
8 Deriving probabilities from histograms
9 Computing histograms in NumPy
10 Using permutations to shuffle cards
11 Case study 1 solution
12 Optimizing strategies using the sample space for a 10-card deck
13 Case study 2 – Assessing online ad clicks for significance
14 Basic probability and statistical analysis using SciPy
15 Mean as a measure of centrality
16 Variance as a measure of dispersion
17 Making predictions using the central limit theorem and SciPy
18 Comparing two sampled normal curves
19 Determining the mean and variance of a population through random sampling
20 Computing the area beneath a normal curve
21 Statistical hypothesis testing
22 Assessing the divergence between sample mean and population mean
23 Data dredging – Coming to false conclusions through oversampling
24 Bootstrapping with replacement – Testing a hypothesis when the population variance is unknown 1
25 Bootstrapping with replacement – Testing a hypothesis when the population variance is unknown 2
26 Permutation testing – Comparing means of samples when the population parameters are unknown
27 Analyzing tables using Pandas
28 Retrieving table rows
29 Saving and loading table data
30 Case study 2 solution
31 Determining statistical significance
32 Case study 3 – Tracking disease outbreaks using news headlines
33 Clustering data into groups
34 K-means – A clustering algorithm for grouping data into K central groups
35 Using density to discover clusters
36 Clustering based on non-Euclidean distance
37 Analyzing clusters using Pandas
38 Geographic location visualization and analysis
39 Plotting maps using Cartopy
40 Visualizing maps
41 Location tracking using GeoNamesCache
42 Limitations of the GeoNamesCache library
43 Case study 3 solution
44 Visualizing and clustering the extracted location data
45 Case study 4 – Using online job postings to improve your data science resume
46 Measuring text similarities
47 Simple text comparison
48 Replacing words with numeric values
49 Vectorizing texts using word counts
50 Using normalization to improve TF vector similarity
51 Using unit vector dot products to convert between relevance metrics
52 Basic matrix operations, Part 1
53 Basic matrix operations, Part 2
54 Computational limits of matrix multiplication
55 Dimension reduction of matrix data
56 Reducing dimensions using rotation, Part 1
57 Reducing dimensions using rotation, Part 2
58 Dimension reduction using PCA and scikit-learn
59 Clustering 4D data in two dimensions
60 Limitations of PCA
61 Computing principal components without rotation
62 Extracting eigenvectors using power iteration, Part 1
63 Extracting eigenvectors using power iteration, Part 2
64 Efficient dimension reduction using SVD and scikit-learn
65 NLP analysis of large text datasets
66 Vectorizing documents using scikit-learn
67 Ranking words by both post frequency and count, Part 1
68 Ranking words by both post frequency and count, Part 2
69 Computing similarities across large document datasets
70 Clustering texts by topic, Part 1
71 Clustering texts by topic, Part 2
72 Visualizing text clusters
73 Using subplots to display multiple word clouds, Part 1
74 Using subplots to display multiple word clouds, Part 2
75 Extracting text from web pages
76 The structure of HTML documents
77 Parsing HTML using Beautiful Soup, Part 1
78 Parsing HTML using Beautiful Soup, Part 2
79 Case study 4 solution
80 Exploring the HTML for skill descriptions
81 Filtering jobs by relevance
82 Clustering skills in relevant job postings
83 Investigating the technical skill clusters
84 Exploring clusters at alternative values of K
85 Analyzing the 700 most relevant postings
86 Case study 5 – Predicting future friendships from social network data
87 An introduction to graph theory and network analysis
88 Analyzing web networks using NetworkX, Part 1
89 Analyzing web networks using NetworkX, Part 2
90 Utilizing undirected graphs to optimize the travel time between towns
91 Computing the fastest travel time between nodes, Part 1
92 Computing the fastest travel time between nodes, Part 2
93 Dynamic graph theory techniques for node ranking and social network analysis
94 Computing travel probabilities using matrix multiplication
95 Deriving PageRank centrality from probability theory
96 Computing PageRank centrality using NetworkX
97 Community detection using Markov clustering, Part 1
98 Community detection using Markov clustering, Part 2
99 Uncovering friend groups in social networks
100 Network-driven supervised machine learning
101 The basics of supervised machine learning
102 Measuring predicted label accuracy, Part 1
103 Measuring predicted label accuracy, Part 2
104 Optimizing KNN performance
105 Running a grid search using scikit-learn
106 Limitations of the KNN algorithm
107 Training linear classifiers with logistic regression
108 Training a linear classifier, Part 1
109 Training a linear classifier, Part 2
110 Improving linear classification with logistic regression, Part 1
111 Improving linear classification with logistic regression, Part 2
112 Training linear classifiers using scikit-learn
113 Measuring feature importance with coefficients
114 Training nonlinear classifiers with decision tree techniques
115 Training a nested if else model using two features
116 Deciding which feature to split on
117 Training if else models with more than two features
118 Training decision tree classifiers using scikit-learn
119 Studying cancerous cells using feature importance
120 Improving performance using random forest classification
121 Training random forest classifiers using scikit-learn
122 Case study 5 solution
123 Exploring the experimental observations
124 Training a predictive model using network features, Part 1
125 Training a predictive model using network features, Part 2
126 Adding profile features to the model
127 Optimizing performance across a steady set of features
128 Interpreting the trained model