The capstone and the full pipeline
Advanced
Python for Data Science
Created by Best
· 24.06.2026 at 14:03 UTC
The capstone is not a new topic — it's every topic, used at once. You take a dataset from a raw file all the way to a defended conclusion, and you make the work reproducible by someone who has never met you.
It's the artefact you can show an employer or collaborator, and the proof that you can choose the right tool for each step rather than just recognising tools in isolation. What distinguishes it from a notebook that merely "gets a number" is not the sophistication of the model, but that the result is reproducible and honestly evaluated.
The project runs the full pipeline you've assembled piece by piece:
- Load and validate the data — parse each column into the right type, handle missing values deliberately.
- Explore it — plot distributions and relationships before modelling.
- Engineer features — small, named, typed transforms.
- Model it — a clean train/test split and scikit-learn's estimator API.
- Evaluate honestly — the right metric, scored on held-out data, reported with its uncertainty, never just bare accuracy.
- Write it up — question, method, results and their limitations, and what you'd do next.
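The steps above can be sketched end to end in a few dozen lines. This is a minimal illustration under stated assumptions, not a template for your project: the data is synthetic and generated in the script, the column names (`hours`, `visits`, `passed`) and the helper `hours_per_visit` are hypothetical, and logistic regression with an F1 score is just one reasonable choice of model and metric. Exploration is reduced to a numeric summary here; in a real project you would plot distributions as well.

```python
import io

import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

SEED = 0  # one fixed seed so someone else can rerun and get the same numbers
rng = np.random.default_rng(SEED)

# Synthetic stand-in for a raw file (hypothetical columns, invented values).
n = 400
hours = rng.uniform(0, 10, n)
visits = rng.integers(1, 20, n)  # always >= 1, so the ratio below is safe
passed = (hours + 0.2 * visits + rng.normal(0, 2, n) > 7).astype(int)
raw = pd.DataFrame({"hours": hours.round(2), "visits": visits, "passed": passed})
raw.loc[rng.choice(n, 20, replace=False), "hours"] = np.nan  # simulate gaps
csv_text = raw.to_csv(index=False)

# 1. Load and validate: explicit dtypes, explicit checks on the target.
df = pd.read_csv(io.StringIO(csv_text), dtype={"visits": "int64", "passed": "int64"})
assert set(df["passed"].unique()) <= {0, 1}
n_missing = int(df["hours"].isna().sum())  # known and counted, not ignored

# 2. Explore: a numeric summary stands in for the plots you would draw.
summary = df.describe()


# 3. Engineer features: a small, named, typed transform.
def hours_per_visit(frame: pd.DataFrame) -> pd.Series:
    """Average hours per visit; NaN wherever hours is missing."""
    return (frame["hours"] / frame["visits"]).rename("hours_per_visit")


X = df[["hours", "visits"]].assign(hours_per_visit=hours_per_visit(df))
y = df["passed"]

# 4. Model: split first, then fit. The imputer and scaler live inside the
#    pipeline, so they only ever learn from the training rows.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=SEED, stratify=y
)
model = make_pipeline(
    SimpleImputer(strategy="median"),
    StandardScaler(),
    LogisticRegression(max_iter=1000),
)
model.fit(X_train, y_train)

# 5. Evaluate on held-out data, with a bootstrap interval for uncertainty.
y_true = y_test.to_numpy()
y_pred = model.predict(X_test)
point = f1_score(y_true, y_pred)

boot = []
for _ in range(1000):
    idx = rng.integers(0, len(y_true), len(y_true))  # resample test rows
    boot.append(f1_score(y_true[idx], y_pred[idx], zero_division=0))
low, high = np.percentile(boot, [2.5, 97.5])

print(f"rows={len(df)}, missing hours={n_missing}")
print(f"F1 = {point:.2f} (95% bootstrap CI {low:.2f} to {high:.2f})")
```

Keeping the imputer inside the pipeline is a deliberate design choice: the median used to fill gaps is computed from the training rows only, which is part of what makes the held-out score honest. The printed interval is the kind of figure the write-up would report alongside its limitations.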