Broadcasting and boolean masks
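Broadcasting is NumPy's rule for combining arrays of different shapes. Two shapes are compatible when, compared from the last dimension backwards, each pair of sizes is either equal or contains a 1 (a missing leading dimension counts as 1). NumPy then "stretches" the smaller array to match the larger one, without actually copying its data. The everyday use is centring a table by subtracting a per-column mean: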
import numpy as np
X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
col_mean = X.mean(axis=0) # shape (2,) -> one mean per column
centered = X - col_mean # (3,2) minus (2,) broadcasts down every row
The (2,) vector of column means is applied to each of the three rows automatically. Once you see it, you stop writing the double loop you'd have written a couple of topics ago.
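To make the "no double loop" point concrete, here is a minimal sketch that computes the same centring both ways and checks that they agree. The names looped, row_mean and row_centered are illustrative only. The last two lines show the other common case, row means, where the shape needs a trailing 1 before it lines up with (3, 2).

```python
import numpy as np

X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
col_mean = X.mean(axis=0)            # shape (2,)

# The explicit double loop that broadcasting replaces
looped = np.empty_like(X)
for i in range(X.shape[0]):
    for j in range(X.shape[1]):
        looped[i, j] = X[i, j] - col_mean[j]

# The broadcast version: (3, 2) minus (2,)
centered = X - col_mean

# Row means: keepdims=True keeps shape (3, 1), which is compatible with (3, 2).
# A plain (3,) vector would not be, because the last dimensions (2 and 3) differ.
row_mean = X.mean(axis=1, keepdims=True)
row_centered = X - row_mean
```

Both looped and centered hold the same values; the broadcast line simply says it in one expression.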
A boolean mask is how you express a filter. Compare an array to something and you get back an array of True/False of the same shape, which you can then count, select with, or assign through:
scores = np.array([0.9, 0.6, 0.2, 0.1])
pred = scores >= 0.5 # array([ True, True, False, False])
scores[pred] # array([0.9, 0.6]) -- the selected values
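The snippet above shows selecting with a mask. As a small sketch of the other two uses, counting and assigning, the example below reuses the same scores. The names n_above, share_above and clipped are made up for illustration.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
pred = scores >= 0.5

# Count: True counts as 1 and False as 0
n_above = pred.sum()          # how many scores pass the threshold
share_above = pred.mean()     # what fraction of scores pass

# Assign through a mask: work on a copy so scores itself is untouched
clipped = scores.copy()
clipped[scores < 0.5] = 0.0   # zero out everything below the threshold
```

Summing a mask gives a count and averaging it gives a proportion, which is why masks turn up so often in quick analyses.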
This single idea — turn a question into a True/False array — is the core analyst pattern, and it's also exactly how prediction metrics are computed later in this course (precision, recall, and the rest).
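As a rough preview of that claim, here is one way precision and recall could be computed from masks. This is a sketch, not the course's own implementation: y_true is a hypothetical set of true labels invented for this example, and the 0.5 threshold is carried over from the snippet above.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
y_true = np.array([True, False, True, False])  # hypothetical labels (assumption)
pred = scores >= 0.5

# Combine masks element-wise, then sum to count the matches
tp = np.sum(pred & y_true)     # predicted positive and actually positive
fp = np.sum(pred & ~y_true)    # predicted positive but actually negative
fn = np.sum(~pred & y_true)    # predicted negative but actually positive

precision = tp / (tp + fp)
recall = tp / (tp + fn)
```

Each count is just a question ("predicted positive and actually positive?") turned into a True/False array and summed.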
This card leads into “Counting matches with masks; &, |, ~ and np.where”.
Card Info
- Topic: Python for Data Science
- Difficulty: Intermediate