Counting matches with masks; &, |, ~ and np.where
Combine masks and you can count the building blocks of every classification metric. With labels holding the truth (1 = the class we care about):
import numpy as np
scores = np.array([0.9, 0.6, 0.2, 0.1])
labels = np.array([1, 0, 1, 0])
pred = scores >= 0.5
tp = np.sum(pred & (labels == 1)) # predicted yes AND truly yes
fp = np.sum(pred & (labels == 0)) # predicted yes BUT truly no
fn = np.sum(~pred & (labels == 1)) # predicted no BUT truly yes
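np.sum over a boolean array counts the True values. Those three counts (true positives, false positives, false negatives) fill three of the four cells of the confusion matrix, all built from masks. The fourth cell, true negatives, follows the same pattern: np.sum(~pred & (labels == 0)).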
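As a minimal sketch of how these counts feed into metrics, the block below recomputes them for the same scores and labels and derives precision, recall, and accuracy. The tn variable and the three metric names are added here for illustration; they are not part of the snippet above.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
labels = np.array([1, 0, 1, 0])
pred = scores >= 0.5                  # [True, True, False, False]

tp = np.sum(pred & (labels == 1))     # 1: predicted yes, truly yes
fp = np.sum(pred & (labels == 0))     # 1: predicted yes, truly no
fn = np.sum(~pred & (labels == 1))    # 1: predicted no, truly yes
tn = np.sum(~pred & (labels == 0))    # 1: predicted no, truly no

precision = tp / (tp + fp)            # 0.5: share of "yes" predictions that were right
recall = tp / (tp + fn)               # 0.5: share of true "yes" cases that were found
accuracy = (tp + tn) / labels.size    # 0.5: share of all predictions that were right
```

Note that precision divides by zero when nothing is predicted positive, and recall does the same when there are no positive labels, so real code usually guards those cases.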
There is one rule you must not break: combine boolean arrays with & (and), | (or), ~ (not) — not Python's and / or / not. The keywords try to squeeze a whole array into a single true-or-false and raise "the truth value of an array is ambiguous"; the symbol operators work element by element.
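A small sketch of the difference, using two made-up boolean arrays a and b (names chosen only for this example). The symbol operators return a new array; the keyword version raises a ValueError, which is caught here so the script keeps running.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([True, True, False])

both = a & b         # [True, False, False]: element-wise and
either = a | b       # [True, True, True]:   element-wise or
neither = ~(a | b)   # [False, False, False]: element-wise not

try:
    a and b          # Python first asks for bool(a), which is ambiguous
except ValueError as err:
    message = str(err)   # mentions that the truth value is ambiguous
```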
Always wrap each comparison in parentheses, because & binds more tightly than >= and ==:
mask = (scores >= 0.5) & (labels == 1) # correct
# scores >= 0.5 & labels == 1 # WRONG: parsed as scores >= (0.5 & labels) == 1
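To see what the missing parentheses actually do, here is a hedged sketch with the same arrays. With float thresholds like 0.5, the unparenthesized form fails early because NumPy has no bitwise AND between a float and an integer array. With integer operands it could instead run and give a wrong answer without any error, which is the more dangerous case.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
labels = np.array([1, 0, 1, 0])

# Parenthesized: each comparison is evaluated first, then combined.
mask = (scores >= 0.5) & (labels == 1)   # [True, False, False, False]

# Unparenthesized: Python evaluates 0.5 & labels first.
try:
    bad = scores >= 0.5 & labels == 1
except TypeError as err:
    error_name = type(err).__name__      # "TypeError"
```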
One more caution for later: a single NaN (a missing float) poisons an ordinary np.mean, so reach for np.nanmean when missing values are in play.
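A quick illustration of that caution, on a made-up values array with one missing entry. The np.isnan mask also shows that counting missing values is just another mask sum.

```python
import numpy as np

values = np.array([0.9, np.nan, 0.2, 0.1])

plain = np.mean(values)                 # nan: the single NaN propagates
safe = np.nanmean(values)               # about 0.4: mean of the three real values
n_missing = np.sum(np.isnan(values))    # 1: count of missing entries via a mask
```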
Because True counts as 1 and False as 0, np.sum over a mask tells you how many elements satisfy a condition, and np.mean tells you what fraction of them do:
np.sum(scores >= 0.5) # 2 -> two elements are at least 0.5
np.mean(scores >= 0.5) # 0.5 -> half of them (a rate!)
And np.where(cond, a, b) chooses element by element — a where the condition is true, b where it's false:
np.where(scores >= 0.5, 1, 0) # array([1, 1, 0, 0])
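np.where is not limited to 1 and 0: either branch can be a constant or an array of matching shape, and the condition can be any combined mask. The sketch below uses the same scores; the 0.8 and 0.15 cut-offs and the variable names are arbitrary choices for illustration.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])

# Constants of another type work as well.
decisions = np.where(scores >= 0.5, "yes", "no")   # ['yes', 'yes', 'no', 'no']

# Combine two conditions with | (or); note the parentheses.
extreme = (scores >= 0.8) | (scores <= 0.15)       # [True, False, False, True]
n_extreme = np.sum(extreme)                        # 2

# Keep the score where the mask is True, otherwise fall back to 0.5.
kept = np.where(extreme, scores, 0.5)              # [0.9, 0.5, 0.5, 0.1]
```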
These are the everyday tools for turning conditions into counts, rates, and decisions, all without an explicit loop.
Card Info
- Topic: Python for Data Science
- Difficulty: Intermediate