What you hand in and what good looks like

Advanced Python for Data Science

Created by Best · 24.06.2026 at 14:03 UTC

You hand in four things:

the data (checked in or a pinned snapshot, with its source and licence documented — no live-only API without an offline mirror);
the environment (a pinned requirements.txt, ideally a Dockerfile, and a fixed seed);
the code (a readable pipeline of typed helpers, no copy-paste, sensible structure);
the report (prose that interprets the numbers instead of merely printing them).
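As a rough illustration of the "fixed seed" item, here is a minimal sketch using only the standard library. The helper names make_rng and split_held_out, the seed value 42, and the 20% test fraction are assumptions chosen for the example, not course requirements.

```python
import random


def make_rng(seed: int = 42) -> random.Random:
    """Return a private, seeded generator so global random state is untouched."""
    return random.Random(seed)


def split_held_out(
    rows: list[int], test_fraction: float, rng: random.Random
) -> tuple[list[int], list[int]]:
    """Shuffle a copy of rows and cut off a held-out test set."""
    shuffled = rows[:]  # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data: 100 row ids standing in for a real dataset.
rows = list(range(100))
train_a, test_a = split_held_out(rows, 0.2, make_rng(42))
train_b, test_b = split_held_out(rows, 0.2, make_rng(42))
```

Because both calls start from the same seed, the two splits are identical, which is what lets someone else re-run the pipeline and get the same held-out set.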

The report is where honesty shows: if 96% of your cases are the negative class, reporting "97% accuracy" without precision, recall, and a baseline is misleading, because always predicting the majority class already reaches 96%. The model's real gain over doing nothing is one percentage point.
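The arithmetic behind that warning can be sketched in a few lines. The data below is invented to match the percentages in the text (1000 cases, 960 negative, 40 positive), and the model's confusion counts are an assumption chosen so that its accuracy comes out at 97%. The helper evaluate is hypothetical, and it returns 0.0 for precision when nothing is predicted positive, which is one common convention.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    accuracy: float
    precision: float
    recall: float


def evaluate(y_true: list[int], y_pred: list[int]) -> Metrics:
    """Accuracy, plus precision and recall for the positive class (label 1)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    # Convention (an assumption): undefined ratios are reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return Metrics(accuracy, precision, recall)


# Invented labels: 960 negatives followed by 40 positives.
y_true = [0] * 960 + [1] * 40

# Baseline: always predict the majority (negative) class.
baseline_pred = [0] * 1000

# Hypothetical model: 10 false alarms, and only 20 of the 40 positives found.
model_pred = [1] * 10 + [0] * 950 + [1] * 20 + [0] * 20

baseline = evaluate(y_true, baseline_pred)
model = evaluate(y_true, model_pred)
```

Here the baseline scores 0.96 accuracy with zero recall, while the model scores 0.97 accuracy with recall 0.5 and precision of about 0.67. Reported side by side, those numbers tell the reader that the model misses half of the positive cases, which the accuracy figure alone hides.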

The grading reflects what actually makes data science trustworthy. Does it run from a clean checkout? Are the data's types and missing values handled on purpose? Is the evaluation honest — held-out data, the right metric, uncertainty acknowledged? Is the code readable? Does the report interpret rather than just report?
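One way to make "uncertainty acknowledged" concrete is a percentile bootstrap interval around the held-out metric. The sketch below is only one of several reasonable approaches. The function name bootstrap_interval, the 2000 resamples, the 95% level, and the invented outcomes list (970 correct predictions out of 1000) are all assumptions for illustration.

```python
import random


def bootstrap_interval(
    correct: list[int],
    n_resamples: int = 2000,
    level: float = 0.95,
    seed: int = 0,
) -> tuple[float, float]:
    """Percentile bootstrap interval for accuracy.

    `correct` holds 1 for each held-out case predicted correctly, else 0.
    """
    rng = random.Random(seed)  # fixed seed keeps the interval reproducible
    n = len(correct)
    # Resample the cases with replacement and recompute accuracy each time.
    stats = sorted(
        sum(rng.choices(correct, k=n)) / n for _ in range(n_resamples)
    )
    lo_idx = int((1 - level) / 2 * n_resamples)
    hi_idx = max(lo_idx, int((1 + level) / 2 * n_resamples) - 1)
    return stats[lo_idx], stats[hi_idx]


# Invented outcomes: 970 correct and 30 wrong predictions on held-out data.
outcomes = [1] * 970 + [0] * 30
observed = sum(outcomes) / len(outcomes)
low, high = bootstrap_interval(outcomes)
```

A report could then state the accuracy together with the interval (low, high) rather than as a single bare number, and say how the interval was computed.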

Notice what's not on that list: the sophistication of the model. A capstone that uses a simple model but is reproducible and honestly evaluated beats a fancy one nobody can re-run or believe. That's the real lesson of the whole course: fluency is choosing the right phrasing for the question in front of you, and showing your work so others can trust it.

Related cards

Builds on The capstone and the full pipeline · Python for Data Science

Tasks

Question 1

Your capstone reports 'accuracy 97%' on a dataset where 96% of cases are the negative class. What should the report add to be honest?

Nothing — 97% is excellent

Precision/recall (or per-class metrics) and a baseline, since 96% accuracy is achievable by always predicting the majority class

A larger font

The training accuracy instead

Question 2

What does the capstone grade depend on most?

Using the most advanced model available

Reproducibility and honest evaluation: it runs from a clean checkout and is judged with the right metric and stated uncertainty

The number of lines of code

Avoiding all libraries

Card Info

Topic: Python for Data Science
Difficulty: Advanced