Introduction
  • Introduction
  • How this course is structured
A Scenario To Get Us Started
  • Introduction to our development environment
  • Introduction to our dataset & dataframes
  • Latest Config Code
  • Environment configuration code (latest code in downloadable file)
  • Ingesting & Cleaning Data
  • Answering our scenario questions
Core Concepts
  • Bringing data into dataframes
  • Inspecting A Dataframe
  • Handling Null & Duplicate Values
  • Selecting & Filtering Data
  • Applying Multiple Filters
  • Running SQL on Dataframes
  • Adding Calculated Columns
  • Group By And Aggregation
  • Writing Dataframe To Files
Challenge
  • Challenge Overview
  • Challenge Solution
Conclusion
  • Thanks for joining me to learn PySpark!