Reductions, statistics, and views vs copies

Intermediate Python for Data Science
Created by Best · 24.06.2026 at 14:03 UTC

The real reason to switch is vectorisation: you write whole-array expressions and NumPy runs the loop for you in compiled code. To scale every element, write a * 2. For a per-column or per-row total, reduce along an axis:

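import numpy as np
a = np.array([[1., 2., 3.], [4., 5., 6.]])   # assumed example array, consistent with the sums below
a.sum(axis=0)    # column sums: collapse the rows -> array([5., 7., 9.])
a.sum(axis=1)    # row sums:    collapse the cols -> array([ 6., 15.])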

The rule worth memorising is "axis=0 reduces down the rows, leaving one value per column." Mixing up the axes is the most common NumPy slip.
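
A minimal sketch of the axis rule, using a small 2-by-3 array chosen here for illustration (the name grid and its values are assumptions, not part of the card). Checking the shape of the result is a quick way to confirm which axis was collapsed:

```python
import numpy as np

# Illustrative 2x3 array (2 rows, 3 columns); the values are assumed for the demo.
grid = np.array([[1.0, 2.0, 3.0],
                 [4.0, 5.0, 6.0]])

col_totals = grid.sum(axis=0)   # axis 0 (the rows) is collapsed: one value per column
row_totals = grid.sum(axis=1)   # axis 1 (the columns) is collapsed: one value per row

print(grid.shape)        # (2, 3)
print(col_totals.shape)  # (3,)
print(row_totals.shape)  # (2,)

# keepdims=True keeps the collapsed axis with length 1 instead of dropping it.
row_totals_2d = grid.sum(axis=1, keepdims=True)
print(row_totals_2d.shape)  # (2, 1)
```

Put differently, the axis you pass is the one that disappears from the shape.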

Arrays carry their statistics as methods that run in compiled code:

a = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)
a.mean()    # 5.0
a.std()     # 2.0  (population standard deviation, the NumPy default)

One detail to remember: .std() defaults to the population standard deviation (dividing by n, i.e. ddof=0); pass ddof=1 if you want the sample version. These one-line reductions replace the explicit loops you wrote earlier and are far faster on large arrays.
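
To make the ddof difference concrete, here is a short sketch that reuses the eight values from the example above and recomputes both versions from the definition. The variable names are illustrative only:

```python
import numpy as np

data = np.array([2, 4, 4, 4, 5, 5, 7, 9], dtype=float)

pop_std = data.std()            # ddof=0: divide the squared deviations by n
sample_std = data.std(ddof=1)   # ddof=1: divide by n - 1 instead

# The same two values worked out by hand from the definition.
n = data.size
sq_dev = ((data - data.mean()) ** 2).sum()   # sum of squared deviations
manual_pop = np.sqrt(sq_dev / n)
manual_sample = np.sqrt(sq_dev / (n - 1))

print(pop_std)                # 2.0
print(round(sample_std, 4))   # 2.1381
```

The sample version is always the larger of the two, and the gap shrinks as n grows.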

Two subtleties to file away. First, basic slicing (for example a[1:3] or a[:, 1]) returns a view, a window onto the same memory rather than a copy, so modifying a slice modifies the original array. That makes slicing fast but can surprise you; use .copy() when you need an independent array. Indexing with a list or a boolean mask, by contrast, returns a copy.
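
The view behaviour can be checked directly. The sketch below uses a throwaway array (the names base, window, and independent are illustrative) and np.shares_memory to show when two arrays overlap in memory:

```python
import numpy as np

base = np.arange(6)              # array([0, 1, 2, 3, 4, 5])

window = base[1:4]               # basic slice: a view onto base's memory
window[0] = 99                   # writes through to base
print(base)                      # [ 0 99  2  3  4  5]
print(np.shares_memory(base, window))        # True

independent = base[1:4].copy()   # explicit copy: separate memory
independent[0] = -1              # base is unaffected
print(base[1])                   # 99
print(np.shares_memory(base, independent))   # False
```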

Second, an array is homogeneous: every element shares one dtype. If you accidentally mix numbers and strings, NumPy converts everything to strings (a dtype such as <U21), and mixing numbers with arbitrary Python objects such as None falls back to dtype=object. Either way you lose fast numeric arithmetic, so keep a numeric array numeric. A broader limitation is that arrays have no column names and can't join tables, which is exactly pandas' job.
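
A quick way to catch an accidental mix is to inspect dtype after building the array. This is a minimal sketch with made-up values; dtype.kind is used because the exact string width (such as <U21) can vary by platform:

```python
import numpy as np

numeric = np.array([1, 2, 3])
mixed_str = np.array([1, 2, "three"])   # numbers plus a string
mixed_obj = np.array([1, 2, None])      # numbers plus an arbitrary Python object

print(numeric.dtype.kind)     # i  (integer)
print(mixed_str.dtype.kind)   # U  (unicode string: the numbers became text)
print(mixed_obj.dtype)        # object

print(numeric.sum())          # 6
print(mixed_str[0] == "1")    # True: the integer 1 is now the string "1"
```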

Related cards
Builds on Creating, slicing, and selecting arrays · Python for Data Science
Next Broadcasting and boolean masks · Python for Data Science
Tasks
Question 1

For a 2-D NumPy array a, what does a.sum(axis=0) compute?

Question 2

Read a line of space-separated numbers from stdin. Using NumPy, print the array's mean and standard deviation (population, the NumPy default) separated by a space, each rounded to 2 decimals.

Example input:

2 4 4 4 5 5 7 9

Expected output:

5.0 2.0
Question 3

Slicing a NumPy array (for example a[0] or a[:, 1]) returns...

Question 4

Select the true statements about NumPy reductions and views.

Select all that apply.