Counting matches with masks; &, |, ~ and np.where

Intermediate Python for Data Science

Created by Best · 24.06.2026 at 14:03 UTC

Combine masks and you can count the building blocks of common classification metrics such as precision and recall. With labels holding the truth (1 = the class we care about):

import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
labels = np.array([1, 0, 1, 0])
pred = scores >= 0.5

tp = np.sum(pred & (labels == 1))   # predicted yes AND truly yes
fp = np.sum(pred & (labels == 0))   # predicted yes BUT truly no
fn = np.sum(~pred & (labels == 1))  # predicted no  BUT truly yes

np.sum over a boolean array counts the True values. Those three counts (true positives, false positives, false negatives) are three of the four cells of the confusion matrix; the fourth, true negatives, is np.sum(~pred & (labels == 0)). All of them are built from masks.
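True negatives follow the same mask pattern, and precision and recall fall straight out of the counts. The sketch below reuses the example arrays from this card; safe_ratio is a hypothetical helper name introduced here, and returning 0.0 for an empty denominator is an assumed convention, not the only reasonable one.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
labels = np.array([1, 0, 1, 0])
pred = scores >= 0.5

tp = np.sum(pred & (labels == 1))    # predicted yes AND truly yes
fp = np.sum(pred & (labels == 0))    # predicted yes BUT truly no
fn = np.sum(~pred & (labels == 1))   # predicted no  BUT truly yes
tn = np.sum(~pred & (labels == 0))   # predicted no  AND truly no


def safe_ratio(num, den):
    """Hypothetical helper: num / den, or 0.0 when den is 0 (assumed convention)."""
    return float(num) / float(den) if den > 0 else 0.0


precision = safe_ratio(tp, tp + fp)  # of the predicted positives, the share that were right
recall = safe_ratio(tp, tp + fn)     # of the actual positives, the share that were found

print(tp, fp, fn, tn)
print(precision, recall)
```

The four counts always add up to the number of elements, which makes a quick sanity check.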

There is one rule you must not break: combine boolean arrays with & (and), | (or), and ~ (not), never with Python's and / or / not. The keywords try to squeeze a whole array into a single true-or-false and raise a ValueError ("the truth value of an array with more than one element is ambiguous"); the symbol operators work element by element.
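To see the difference in practice, here is a small sketch with two made-up boolean arrays (a and b are illustrative names, not from the card). The symbol operators return a new array; the keyword raises as soon as Python asks the array for a single truth value.

```python
import numpy as np

a = np.array([True, False, True])
b = np.array([True, True, False])

print(a & b)   # elementwise and
print(a | b)   # elementwise or
print(~a)      # elementwise not

try:
    a and b    # needs one truth value for the whole of a
except ValueError as err:
    message = str(err)
    print("ValueError:", message)
```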

Always wrap each comparison in parentheses, because & binds more tightly than >= and ==:

mask = (scores >= 0.5) & (labels == 1)   # correct
# scores >= 0.5 & labels == 1            # WRONG: parsed as scores >= (0.5 & labels) == 1
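A quick way to convince yourself is to run both forms. In this sketch, with the float threshold used above, the unparenthesized version fails with a TypeError because Python evaluates 0.5 & labels first; with other operand types the failure may look different, so treat the exact exception as specific to this example.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
labels = np.array([1, 0, 1, 0])

# Parenthesized: each comparison yields a mask, then & combines them.
mask = (scores >= 0.5) & (labels == 1)
print(mask)

try:
    scores >= 0.5 & labels == 1   # & runs first: 0.5 & labels
    outcome = "no error"
except TypeError:
    outcome = "TypeError"
print(outcome)
```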

One more caution for later: a single NaN (a missing float value) turns the result of an ordinary np.mean into NaN, so reach for np.nanmean, which skips NaN entries, when missing values are in play.
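The NaN caution can be checked on a tiny made-up array (the values are illustrative). The sketch also shows the mask-based equivalent: build a mask of the valid entries with np.isnan and average only those.

```python
import numpy as np

values = np.array([1.0, np.nan, 3.0])

plain = np.mean(values)         # NaN propagates through the sum
skipping = np.nanmean(values)   # NaN entries are ignored

valid = ~np.isnan(values)       # mask of the entries that are not NaN
masked = values[valid].mean()   # same result, built from a mask

print(plain, skipping, masked)
```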

Because True counts as 1 and False as 0, np.sum over a mask tells you how many elements satisfy a condition:

np.sum(scores >= 0.5)        # 2  -> two elements are at least 0.5
np.mean(scores >= 0.5)       # 0.5 -> half of them (a rate!)
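Several spellings give the same count, and the rate is just the count divided by the number of elements. A short sketch on the same scores array:

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
mask = scores >= 0.5

count_a = np.sum(mask)             # function form
count_b = mask.sum()               # method form
count_c = np.count_nonzero(mask)   # counts the True (nonzero) entries
rate = mask.mean()                 # fraction of True values

print(count_a, count_b, count_c, rate)
```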

And np.where(cond, a, b) chooses element by element — a where the condition is true, b where it's false:

np.where(scores >= 0.5, 1, 0)   # array([1, 1, 0, 0])
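The two choices passed to np.where do not have to be 0 and 1; they can be other scalars or arrays of matching shape. In the sketch below, the "accept" / "reject" labels are illustrative values, not part of the card.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])

# Scalar choices: turn the condition into a text decision.
decisions = np.where(scores >= 0.5, "accept", "reject")

# Array choice: keep the score where it passes, use 0.0 elsewhere.
floored = np.where(scores >= 0.5, scores, 0.0)

print(decisions)
print(floored)
```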

These are the everyday tools for turning conditions into counts, rates, and decisions, all without an explicit loop.

Related cards

Builds on: Broadcasting and boolean masks · Python for Data Science

Next: DataFrames: reading, filtering, adding columns · Python for Data Science

Tasks

Question 1

stdin: line 1 = space-separated scores (floats); line 2 = space-separated labels (0/1); line 3 = threshold t (float). Using NumPy boolean masks (no Python loops over elements), print precision and recall rounded to 2 decimals, space-separated. Predict positive when score >= t. If a denominator is 0, print 0.0 for that metric.

Example input:

0.9 0.6 0.2 0.1
1 0 1 0
0.5

Expected output:

0.5 0.5

3 test cases will be used for grading

Question 2

Why must you use & / | / ~ (not and / or / not) when combining NumPy boolean arrays?

Python keywords are slower

and/or operate on the whole array's truth value and raise an ambiguity error; &/| work elementwise

There is no difference

& only works on integers

Question 3

What does np.sum(scores >= 0.5) compute?

The sum of all scores

The number of elements that are at least 0.5

The largest score

0.5 times the length

Question 4

For boolean arrays pred and actual, which expressions correctly count true positives (predicted True and actually True)? Select all that apply.

np.sum(pred & actual)

(pred & actual).sum()

np.sum(pred and actual)

np.count_nonzero(pred & actual)

Card Info

Topic: Python for Data Science
Difficulty: Intermediate