Broadcasting and boolean masks

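Broadcasting is NumPy's rule for combining arrays of different shapes. Shapes are compared from the last axis backwards, and each pair of sizes must either be equal or have one of them equal to 1 (a missing axis counts as 1). When the shapes are compatible, NumPy "stretches" the smaller array along those axes to match the larger one, without copying the data. The everyday use is centring a table by subtracting a per-column mean: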

import numpy as np
X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
col_mean = X.mean(axis=0)     # shape (2,) -> one mean per column
centered = X - col_mean       # (3,2) minus (2,) broadcasts down every row

The (2,) vector of column means is applied to each of the three rows automatically. Once you see it, you stop writing the double loop you'd have written a couple of topics ago.
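To make the rule concrete, here is a small sketch that checks the broadcast result against the double loop it replaces, asks NumPy which shapes are compatible, and shows why per-row centring needs the row means kept as a (3, 1) column. It assumes NumPy 1.20 or later, where `np.broadcast_shapes` is available; `loop_centered`, `row_mean` and `view` are illustrative names, not part of the card.

```python
import numpy as np

X = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])
col_mean = X.mean(axis=0)                  # array([3., 4.])

# The double loop that broadcasting replaces.
loop_centered = np.empty_like(X)
for i in range(X.shape[0]):                # every row
    for j in range(X.shape[1]):            # every column
        loop_centered[i, j] = X[i, j] - col_mean[j]

centered = X - col_mean                    # broadcast version, same result
assert np.allclose(centered, loop_centered)

# "Without copying": the stretched view reuses the same 2 numbers for
# every row, which shows up as a stride of 0 along the row axis.
view = np.broadcast_to(col_mean, X.shape)
print(view.strides)                        # (0, 8) for float64 data

# The shape rule, checked by NumPy itself.
print(np.broadcast_shapes((3, 2), (2,)))   # (3, 2)
print(np.broadcast_shapes((3, 2), (3, 1))) # (3, 2)

# Per-row centring: keepdims=True keeps the means as a (3, 1) column.
row_mean = X.mean(axis=1, keepdims=True)   # shape (3, 1)
row_centered = X - row_mean

# Without keepdims the means have shape (3,), and the trailing sizes
# (2 vs 3) clash, so NumPy raises an error instead of guessing.
try:
    X - X.mean(axis=1)
except ValueError as err:
    print("incompatible:", err)
```

The failing case is the useful one to remember: a (3,) vector lines up with the last axis, not the first, so "one value per row" has to be written as a (3, 1) column.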

A boolean mask is how you express a filter. Compare an array to something and you get back an array of True/False of the same shape, which you can then count, select with, or assign through:

scores = np.array([0.9, 0.6, 0.2, 0.1])
pred = scores >= 0.5          # array([ True,  True, False, False])
scores[pred]                  # array([0.9, 0.6]) -- the selected values
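The snippet above only selects. A minimal sketch of the other two uses named earlier, counting and assigning through a mask, using the same `scores` array (`clipped` is an illustrative name):

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
pred = scores >= 0.5                 # array([ True,  True, False, False])

# Count: True behaves as 1 and False as 0.
n_selected = pred.sum()              # 2 scores pass the threshold
share = pred.mean()                  # 0.5 of the scores pass

# Select: boolean indexing returns a new array (a copy).
picked = scores[pred]                # array([0.9, 0.6])

# Assign through: writing via a mask edits the array in place,
# so work on a copy if the original scores must be kept.
clipped = scores.copy()
clipped[scores < 0.5] = 0.0          # zero out everything below the threshold
```

Because selection copies but assignment writes in place, `picked[0] = 1.0` would leave `scores` unchanged, while `clipped[mask] = ...` changes `clipped` itself.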

This single idea, turning a question into a True/False array, is the core analyst pattern. It is also exactly how prediction metrics such as precision and recall are computed later in this course, and it leads into “Counting matches with masks; &, |, ~ and np.where”.
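As a preview, here is a sketch of precision and recall written with nothing but masks used as selectors. The `y_true` labels are hypothetical, made up for illustration; the course's own metric code may be organised differently.

```python
import numpy as np

scores = np.array([0.9, 0.6, 0.2, 0.1])
y_true = np.array([True, True, True, False])   # hypothetical true labels
pred = scores >= 0.5                           # [ True,  True, False, False]

# Precision: of the items predicted positive, what share are truly positive?
precision = y_true[pred].mean()    # y_true[pred] -> [True, True] -> 1.0

# Recall: of the truly positive items, what share were predicted positive?
recall = pred[y_true].mean()       # pred[y_true] -> [True, True, False] -> 0.666...
```

Each metric is "select with one mask, then average the other". If a mask selects nothing (no positive predictions, say), the mean of the empty selection is `nan` and NumPy emits a warning, so real metric code usually guards that case.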
