Reading the report; leakage and cross-validation
classification_report prints precision, recall and F1 per class — exactly the metrics you now understand from the inside:
from sklearn.metrics import classification_report
print(classification_report(y_test, pred))
The lesson from the evaluation topic carries straight over: when the positive class is rare, do not be reassured by a high accuracy. Read the precision and recall, and judge them against what the problem actually needs.
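To make the warning concrete, here is a minimal sketch with made-up labels (95 negatives, 5 positives; these numbers are an illustration, not data from this card). It scores a "model" that always predicts the majority class:

```python
import numpy as np
from sklearn.metrics import accuracy_score, precision_score, recall_score

# Hypothetical labels: 95 negatives and 5 positives (a rare positive class).
y_true = np.array([0] * 95 + [1] * 5)

# A useless "model" that always predicts the majority (negative) class.
y_always_negative = np.zeros_like(y_true)

accuracy = accuracy_score(y_true, y_always_negative)
# zero_division=0 reports 0 instead of warning when there are no predicted positives.
precision = precision_score(y_true, y_always_negative, zero_division=0)
recall = recall_score(y_true, y_always_negative, zero_division=0)

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
# prints: accuracy=0.95 precision=0.00 recall=0.00
```

The accuracy looks healthy, yet the model finds none of the positives, which is exactly what the recall column of the report would reveal.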
The beauty of the shared interface is that swapping the algorithm changes one line. But the experiment is only trustworthy if the evaluation is, and two things undermine it.
First, a single train/test split is noisy: the score depends on which rows happened to land in the test set. For a steadier estimate, use cross-validation, which repeats the evaluation over several different splits and averages the results. Second, data leakage (letting any information from the test set reach training, for example by fitting a scaler on the full data before splitting) silently inflates your scores, and it is the most common reason a model that looked great in a notebook disappoints in production. The humbling lesson: honest metrics and a clean split matter most; the choice of algorithm is often the least important decision.
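One common way to get both safeguards at once is to put the preprocessing inside a pipeline and cross-validate the whole pipeline. The sketch below assumes synthetic stand-in data from make_classification and a logistic regression as the model; neither comes from this card, so substitute your own X, y and estimator:

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (an assumption for illustration): about 10% positives.
X, y = make_classification(
    n_samples=500, n_features=10, weights=[0.9, 0.1], random_state=0
)

# The scaler sits inside the pipeline, so in every fold it is fitted on the
# training portion only and never sees the held-out rows.
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

# Five different splits, scored with F1 rather than accuracy.
scores = cross_val_score(model, X, y, cv=5, scoring="f1")

print("F1 per fold:", scores.round(2))
print(f"mean={scores.mean():.2f} std={scores.std():.2f}")
```

The spread across folds shows how noisy a single split can be. Trying a different algorithm means changing only the estimator passed to make_pipeline; the evaluation code stays the same.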
Card Info
- Topic: Python for Data Science
- Difficulty: Intermediate