Reading the report; leakage and cross-validation

Intermediate Python for Data Science
Created by Best · 24.06.2026 at 14:03 UTC

classification_report prints precision, recall and F1 per class — exactly the metrics you now understand from the inside:

from sklearn.metrics import classification_report
print(classification_report(y_test, pred))

The lesson from the evaluation topic carries straight over: when the positive class is rare, do not be reassured by a high accuracy. Read the precision and recall, and judge them against what the problem actually needs.
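To make the point concrete, here is a minimal sketch using made-up labels (the 97/3 split and the always-negative "model" are assumptions chosen for illustration, not data from this card). It shows how accuracy can look strong while recall on the rare class is zero.

```python
import numpy as np
from sklearn.metrics import accuracy_score, classification_report, recall_score

# Hypothetical test labels: 97 negatives and 3 positives (a rare positive class).
y_test = np.array([0] * 97 + [1] * 3)

# A stand-in "model" that always predicts the negative class.
pred = np.zeros_like(y_test)

accuracy = accuracy_score(y_test, pred)
positive_recall = recall_score(y_test, pred, zero_division=0)

print(f"accuracy: {accuracy:.2f}")                              # high, because negatives dominate
print(f"recall on the positive class: {positive_recall:.2f}")   # no positive is ever found

# zero_division=0 silences the warning for the class that is never predicted.
print(classification_report(y_test, pred, zero_division=0))
```

The per-class rows of the report are where the problem becomes visible: the positive class gets a precision and recall of zero even though the headline accuracy is high.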

The beauty of the shared interface is that swapping the algorithm changes one line. But the experiment is only trustworthy if the evaluation is, and two things undermine it.
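As a rough illustration of the shared interface, the sketch below runs the same fit, predict and report steps for two estimators. The synthetic dataset from make_classification and the choice of LogisticRegression and DecisionTreeClassifier are assumptions for the example; any classifier with fit and predict would slot in the same way.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced data (an assumption for illustration only).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Split once, before any fitting, and keep the class ratio in both parts.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)

# Only the estimator changes; the surrounding code stays the same.
models = [
    LogisticRegression(max_iter=1000),
    DecisionTreeClassifier(random_state=0),
]

reports = {}
for model in models:
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    name = type(model).__name__
    reports[name] = classification_report(y_test, pred, zero_division=0)
    print(name)
    print(reports[name])
```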

The first is noise. A single split is only one draw, so for a steadier estimate you use cross-validation: the model is scored on several different splits and the results are averaged. The second is data leakage: letting any information from the test set sneak into training, for example by fitting a scaler on the full data before splitting. Leakage silently inflates your scores and is one of the most common reasons a model that looked great in a notebook disappoints in production. The humbling lesson: honest metrics and a clean split matter most; the choice of algorithm is often the least important decision.
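One common way to get both properties at once is to put the preprocessing inside a pipeline and cross-validate the whole pipeline. The sketch below assumes a synthetic dataset, a StandardScaler, a LogisticRegression and five stratified folds; these are illustrative choices, not the card's prescribed setup.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, imbalanced data (an assumption for illustration only).
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

# Because the scaler is a pipeline step, it is refitted on the training
# part of each fold only, so the held-out part never influences scaling.
pipeline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Five stratified splits; each fold keeps roughly the same class ratio.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)

# F1 on the positive class, since accuracy is misleading on rare positives.
scores = cross_val_score(pipeline, X, y, cv=cv, scoring="f1")

print(scores)
print(f"mean F1: {scores.mean():.3f} (std {scores.std():.3f})")
```

The leaky alternative would be calling StandardScaler().fit_transform(X) on all the data first and cross-validating afterwards; the spread of the per-fold scores also gives a sense of how noisy any single split would have been.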
This card builds on “The estimator API and the train/test split” and leads into “Measure first, then vectorise”.

Related cards
Builds on The estimator API and the train/test split · Python for Data Science
Next Measure first, then vectorise · Python for Data Science
Tasks
Question 1

A model reports 98% accuracy on a dataset where 97% of cases are negative. What should you look at before trusting it?

Question 2

What is data leakage in machine learning?

Question 3

Select every action that risks data leakage.

Select all that apply.