Creating, slicing, and selecting arrays
Everything you did with lists and nested loops now collapses into one fast, typed object: the NumPy array, or ndarray. Much of the Python data-science stack, including pandas, SciPy, and scikit-learn, is built on it. You create an array and immediately ask it about itself:
import numpy as np
a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.float64)
print(a.shape)  # (2, 3) -> 2 rows, 3 columns
print(a.dtype)  # float64
Two properties define an array. Its shape is a tuple giving the array's length along each axis, so (2, 3) means 2 rows and 3 columns. Its dtype is the single type shared by every element. Unlike a Python list, an array is homogeneous, which is what lets NumPy store the numbers in one tight block of memory and operate on them in compiled C.
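To make the "one type, one block of memory" point concrete, here is a small sketch. The variable names (mixed, ints, grid) are illustrative only; the attributes used (dtype, itemsize, nbytes, ndim, size) are standard ndarray attributes.

```python
import numpy as np

# Mixing an int and a float in the input still yields ONE dtype:
# NumPy picks a type that can hold every element.
mixed = np.array([1, 2.5, 3])
print(mixed.dtype)     # float64

# Requesting a dtype fixes how many bytes each element occupies.
ints = np.array([1, 2, 3], dtype=np.int32)
print(ints.itemsize)   # 4 (bytes per element)
print(ints.nbytes)     # 12 (3 elements * 4 bytes)

# shape, ndim, and size all describe the same layout.
grid = np.zeros((2, 3))
print(grid.shape)      # (2, 3)
print(grid.ndim)       # 2 (number of axes)
print(grid.size)       # 6 (total number of elements)
```

Because every element has the same size, NumPy can find any element by arithmetic on its position instead of following a pointer per item, as a Python list must.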
Selecting data is done by indexing and slicing, and the row/column thinking from before pays off directly:
a[0]      # first row -> array([1., 2., 3.])
a[:, 1]   # second column -> array([2., 5.])
a[1, 2]   # second row, third column (indices count from 0) -> 6.0
a[a > 3]  # every element greater than 3 -> array([4., 5., 6.])
The colon : means "everything along this axis", so a[:, 1] is "every row, column 1". The last line of the example is a first taste of a boolean mask, which selects by a condition rather than by a position, and which the next topic builds into a power tool.
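It can help to see the mask as an array in its own right. The sketch below rebuilds the same array a and splits the last selection into two steps; the names mask, selected, and rows are illustrative, and the row-filter and start:stop slice are small extensions of the examples above rather than anything new.

```python
import numpy as np

a = np.array([[1, 2, 3],
              [4, 5, 6]], dtype=np.float64)

# Step 1: the comparison produces a boolean array shaped like a.
mask = a > 3
print(mask)
# [[False False False]
#  [ True  True  True]]

# Step 2: indexing with the mask keeps the elements where it is True.
selected = a[mask]
print(selected)          # [4. 5. 6.]

# A condition on one column can select whole rows.
rows = a[a[:, 0] > 1]    # rows whose first column is greater than 1
print(rows)              # [[4. 5. 6.]]

# A slice can also be a start:stop range; stop is excluded.
print(a[:, 0:2])         # every row, columns 0 and 1
# [[1. 2.]
#  [4. 5.]]
```

Note that a[mask] returns a flat, one-dimensional result, because the selected elements need not form a rectangle.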
Card Info
- Topic: Python for Data Science
- Difficulty: Intermediate