Lists as vectors: scaling and the dot product
A column of numbers (prices, scores, temperatures) is a vector, and in plain Python a vector is just a list. The most basic operation is applying the same calculation to every element; multiplying every element by one constant is called scaling. To raise every price by 10%:
prices = [10.0, 20.0, 30.0]
scaled = []
for p in prices:
    scaled.append(p * 1.1)
print(scaled)  # [11.0, 22.0, 33.0]
The pattern is: empty result list, loop, compute one value, append. Python has a tidier shorthand for exactly this, the list comprehension, which reads almost like English ("p times 1.1, for each p in prices"):
scaled = [p * 1.1 for p in prices]
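As a minimal sketch, the comprehension can be wrapped in a small helper so the same scaling works for any list and any factor. The function name `scale` and its parameters are hypothetical, chosen only for illustration. Note that a comprehension builds a new list and leaves the original alone.

```python
def scale(vector, factor):
    """Return a new list with every element multiplied by factor."""
    # Hypothetical helper: same pattern as the comprehension above.
    return [x * factor for x in vector]

prices = [10.0, 20.0, 30.0]
doubled = scale(prices, 2.0)
print(doubled)  # [20.0, 40.0, 60.0]
print(prices)   # [10.0, 20.0, 30.0], the original list is unchanged
```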
The single most important vector operation in data science is the dot product: multiply two vectors element by element, then add up the results. It underlies weighted averages, similarity, and later matrix multiplication and linear models. To walk two lists in step, use zip, which hands you one pair at a time:
xs = [1.0, 2.0, 3.0]
ys = [4.0, 5.0, 6.0]
dot = 0.0
for x, y in zip(xs, ys):
    dot += x * y
print(dot)  # 1*4 + 2*5 + 3*6 = 32.0
Here dot is an accumulator: start it at 0.0 and add into it as you go. A special case you'll use constantly is a vector's dot product with itself, the sum of squares. It is the squared length of the vector, so it measures size, and it appears in variance, distances, and error terms.
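The two uses mentioned above, weighted averages and the sum of squares, can be sketched with one small function. This is an illustration under stated assumptions, not the only way to write it: the name `dot_product` and the sample numbers are made up for the example, and the built-in `sum` plays the role of the accumulator loop.

```python
import math

def dot_product(xs, ys):
    """Multiply matching elements and add up the products."""
    # Hypothetical helper: sum() does the accumulating for us.
    return sum(x * y for x, y in zip(xs, ys))

# Weighted average: when the weights add up to 1, it is a dot product.
scores = [80.0, 90.0, 70.0]
weights = [0.5, 0.25, 0.25]
weighted_avg = dot_product(scores, weights)
print(weighted_avg)  # 80.0

# Sum of squares: a vector dotted with itself.
v = [3.0, 4.0]
sum_of_squares = dot_product(v, v)
print(sum_of_squares)             # 25.0
print(math.sqrt(sum_of_squares))  # 5.0, the length of v
```

The square root of the sum of squares gives the vector's length, which is why this quantity shows up whenever a distance or an error size is measured.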
This leads into “Why loops first, and the length pitfall”.
Card Info
- Topic: Python for Data Science
- Difficulty: Beginner