The capstone and the full pipeline

Advanced Python for Data Science
Created by Best · 24.06.2026 at 14:03 UTC

The capstone is not a new topic — it's every topic, used at once. You take a dataset from a raw file all the way to a defended conclusion, and you make the work reproducible by someone who has never met you.

It's the artefact you can show an employer or collaborator, and the proof that you can choose the right tool for each step rather than just recognising tools in isolation. What distinguishes it from a notebook that merely "gets a number" is not the sophistication of the model, but that the result is reproducible and honestly evaluated.

The project runs the full pipeline you've assembled piece by piece:

  1. Load and validate the data — parse each column into the right type, handle missing values deliberately.
  2. Explore it — plot distributions and relationships before modelling.
  3. Engineer features — small, named, typed transforms.
  4. Model it — a clean train/test split and scikit-learn's estimator API.
  5. Evaluate honestly — the right metric, scored on held-out data, reported with its uncertainty, never just bare accuracy.
  6. Write it up — question, method, results and their limitations, and what you'd do next.
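The steps above can be sketched end to end in a few dozen lines. This is a minimal sketch under stated assumptions, not the course's reference solution. The churn-style dataset is synthetic and generated in the code as a stand-in for a raw file. The column names, the `add_log_income` feature, the logistic-regression model and ROC AUC as the metric are all hypothetical choices made for illustration. The pandas and scikit-learn calls themselves (`read_csv`, `train_test_split`, `make_pipeline`, `SimpleImputer`, `roc_auc_score`) are the libraries' real API. Exploration (step 2) and the write-up (step 6) are left out because they are plots and prose rather than pipeline code.

```python
import io

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 0  # one seed, passed to every random step, so reruns match exactly


def raw_csv(n: int = 600) -> str:
    """Hypothetical stand-in for a raw file: synthetic data, not a real dataset."""
    rng = np.random.default_rng(SEED)
    income = rng.lognormal(mean=10.8, sigma=0.4, size=n)
    age = rng.integers(18, 80, size=n).astype(float)
    logit = 2.0 * (np.log(income) - 10.8) - 0.03 * (age - 45)
    churned = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)
    age[rng.random(n) < 0.05] = np.nan  # roughly 5% of ages go missing
    df = pd.DataFrame({"income": income, "age": age, "churned": churned})
    return df.to_csv(index=False)


def load_and_validate(text: str) -> pd.DataFrame:
    """Step 1: parse each column into an explicit type, then check invariants."""
    df = pd.read_csv(
        io.StringIO(text),
        dtype={"income": "float64", "age": "float64", "churned": "int64"},
    )
    if list(df.columns) != ["income", "age", "churned"]:
        raise ValueError(f"unexpected columns: {list(df.columns)}")
    if not (df["income"] > 0).all():
        raise ValueError("income must be positive")
    if not df["churned"].isin([0, 1]).all():
        raise ValueError("target 'churned' must be 0 or 1")
    # Missing ages stay NaN on purpose: they are imputed later, inside the
    # model pipeline, so the fill value is learned from training rows only.
    return df


def add_log_income(df: pd.DataFrame) -> pd.DataFrame:
    """Step 3: a small, named, typed feature transform (hypothetical)."""
    return df.assign(log_income=np.log(df["income"]))


def bootstrap_auc_ci(y_true: np.ndarray, scores: np.ndarray,
                     n_boot: int = 1000, alpha: float = 0.05) -> tuple[float, float]:
    """Step 5: percentile bootstrap interval for ROC AUC on held-out data."""
    rng = np.random.default_rng(SEED)
    stats = []
    for _ in range(n_boot):
        idx = rng.integers(0, len(y_true), size=len(y_true))
        if len(np.unique(y_true[idx])) < 2:  # AUC is undefined with one class
            continue
        stats.append(roc_auc_score(y_true[idx], scores[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return float(lo), float(hi)


def run() -> tuple[float, float, float]:
    df = add_log_income(load_and_validate(raw_csv()))
    X = df[["log_income", "age"]]
    y = df["churned"].to_numpy()
    # Step 4: one stratified, seeded split; the test rows are never used to fit.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=SEED
    )
    model = make_pipeline(
        SimpleImputer(strategy="median", add_indicator=True),
        StandardScaler(),
        LogisticRegression(),
    )
    model.fit(X_train, y_train)
    scores = model.predict_proba(X_test)[:, 1]  # probability of class 1
    auc = float(roc_auc_score(y_test, scores))
    lo, hi = bootstrap_auc_ci(y_test, scores)
    return auc, lo, hi


if __name__ == "__main__":
    auc, lo, hi = run()
    print(f"held-out ROC AUC = {auc:.3f} (95% bootstrap CI {lo:.3f} to {hi:.3f})")
```

Two choices carry the "honest" part. Missing ages are imputed inside the pipeline, so the median is learned from the training rows only and nothing leaks from the test set. The metric is reported with a bootstrap interval rather than as a single bare number. Passing the same seed to data generation, the split and the bootstrap means a rerun reproduces every number exactly.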
This card builds on “Reproducibility: environments, containers, seeds” and leads into “What you hand in and what good looks like”.
Related cards
Builds on: Reproducibility: environments, containers, seeds · Python for Data Science
Next: What you hand in and what good looks like · Python for Data Science
Tasks
Question 1

Which single property most distinguishes a professional data-science capstone from a notebook that merely "gets a number"?

Question 2

In the capstone, where do you measure your model's performance?

Card Info
  • Topic: Python for Data Science
  • Difficulty: Advanced