Apache Spark Certification Training

English | MP4 | AVC 1920×1080 | AAC 44 kHz 2ch | 99 Lessons (15h 13m) | 3.96 GB

Apache Spark is a core data skill – here's how to show you've got it!
Learn Apache Spark from the ground up and demonstrate your knowledge with the Databricks Certified Associate Developer for Apache Spark certification. This course will transform you into a PySpark professional and prepare you to pass this popular Databricks Spark certification exam.

Join me for an easy-to-understand and engaging look into Spark, and take your big data career to the next level!

What will you learn?

The goal of this course is to teach you fundamental PySpark skills and prepare you to get certified with the Databricks Certified Associate Developer for Apache Spark certification.

The course includes 18 modules to help you understand how Apache Spark works internally and how to use it in practice. You can find all topics covered below, but here is an overview:

  • Become a seasoned expert at coding with Spark DataFrames
  • Get confident with the Databricks certification exam content
  • Discover Spark’s distributed, fault-tolerant data processing
  • Master how to work with Spark in Databricks
  • Understand the Spark cluster architecture
  • Learn when and how Spark evaluates code
  • Grasp Spark’s efficient memory management mechanisms
  • Analyze typical Spark problems like out-of-memory errors
  • See how Spark executes complex operations like joins
  • Become proficient in navigating through the Spark UI
  • …and many more topics – check out the full list below!

Who is this for?

Anyone with basic Python skills who wants to develop their big data processing skills, and anyone who would like to pass the popular Databricks Certified Associate Developer for Apache Spark certification using PySpark!

If you want to learn how to use Apache Spark with the Scala programming language, this course isn’t a fit. We focus on Python and PySpark exclusively, but the fundamental Spark concepts taught are applicable to both languages.

  • Data analysts and developers who want to add verified big data skills and Databricks experience to their portfolio
  • Data engineers who want or need proof of their Apache Spark skills via a certification to boost their career
  • Data scientists wanting to work efficiently and frustration-free with large data sets in Apache Spark
  • Companies that want to enable their data staff to use Apache Spark in a professional, time- and cost-efficient way
  • Anyone wanting to brush up on their Apache Spark skills with a solid understanding of how it works under the hood

Table of Contents

1 Introduction
2 Certification Exam Overview
3 Signing up for Databricks Community Edition
4 Loading Data Into Databricks
5 Overview of the Spark Cluster Architecture and its Components
6 Getting to Know the Spark Driver
7 Getting to Know Executors
8 Discovering Execution Modes
9 Overview
10 Internal Types, DataFrames, Datasets, RDDs, and the Spark SQL API
11 Hands-on Session: Exploring Data APIs on Databricks Community Edition
12 Intro to Labs
13 Intro & Creating DataFrames
14 Exercise: Creating a DataFrame
15 Exercise: Creating a DataFrame – Solution
16 Working with Schemas
17 Exercise: Building a Simple Schema
18 Exercise: Building a Simple Schema – Solution
19 Exercise: Building a Complex Schema
20 Exercise: Building a Complex Schema – Solution
21 Type Conversion of DataFrame Columns
22 Exercise: Changing the Type of a Column
23 Exercise: Changing the Type of a Column – Solution
24 Overview
25 Shuffles
26 Data Skew
27 Spark Configurations for Partitions
28 Hands-on Session: The Power of Partitions
29 Storage Layout
30 Caching and Storage Levels
31 Memory in Action
32 Hands-on Session: Executor Memory Management – Part 1
33 Hands-on Session: Executor Memory Management – Part 2
34 Intro & How to Get Help in PySpark
35 Partitioning Recap
36 Exercise: Repartitioning
37 Exercise: Repartitioning – Solution
38 Caching Recap
39 Exercise: Caching
40 Exercise: Caching – Solution
41 Overview
42 Hands-on Session: Actions vs. Transformations
43 Intro & Reading Data
44 Exercise: Reading Parquet Files
45 Exercise: Reading Parquet Files – Solution
46 Reading from CSV Files
47 Exercise: Reading CSV Files
48 Exercise: Reading CSV Files – Solution
49 Reading from JSON Files
50 Writing Data
51 Exercise: Writing to Parquet Files
52 Exercise: Writing to Parquet Files – Solution
53 Writing to CSV Files
54 Exercise: Writing to CSV Files
55 Exercise: Writing to CSV Files – Solution
56 Writing to JSON Files
57 Using PySpark with SQL
58 Exercise: SQL in PySpark
59 Exercise: SQL in PySpark – Solution
60 Overview
61 Hands-on Session: Discovering the Spark UI
62 Intro & Removing Data
63 Exercise: Removing Data
64 Exercise: Removing Data – Solution
65 Modifying Data
66 Exercise: Modifying Data
67 Exercise: Modifying Data – Solution
68 Analyzing Data
69 Exercise: Analyzing Data
70 Exercise: Analyzing Data – Solution
71 The Catalyst Optimizer
72 Adaptive Query Execution
73 Dynamic Partition Pruning
74 The DAG: Achieving Fault Tolerance
75 Intro & Working With Dates and Times
76 Exercise: Working With Dates and Times
77 Exercise: Working With Dates and Times – Solution
78 Working With Strings
79 Exercise: Working With Strings
80 Exercise: Working With Strings – Solution
81 Working with Arrays
82 Exercise: Working With Arrays
83 Exercise: Working With Arrays – Solution
84 Accumulator and Broadcast Variables
85 Joins
86 Hands-on Session: Cross-Cluster Communication
87 Intro & Grouping and Aggregating
88 Exercise: Grouping and Aggregating
89 Exercise: Grouping and Aggregating – Solution
90 Joining
91 Exercise: Joining
92 Exercise: Joining – Solution
93 User-Defined Functions (UDFs)
94 Exercise: UDFs
95 Exercise: UDFs – Solution
96 Signing up for the Exam
97 Last Minute Preparations
98 Introduction
99 Congratulations!
