Counting and accumulating per key

Intermediate Python for Data Science

Created by Best · 24.06.2026 at 14:03 UTC

Most analytics questions have the same shape: split the rows into groups by some key, then reduce each group to a number. "How many events per user?" "Average score per category?" Learn to say that in plain Python and every later groupby is just the same idea made fast.

Start with counting. Counter from the collections module tallies how often each key appears:

from collections import Counter

regions = ["west", "east", "west", "west", "east"]
counts = Counter(regions)     # Counter({'west': 3, 'east': 2})
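
For comparison, the same tally can be sketched as a plain loop. This is an illustrative equivalent only, not a claim about how Counter is implemented, and the name manual_counts is made up for this example:

```python
from collections import Counter

regions = ["west", "east", "west", "west", "east"]

# Hand-rolled tally: dict.get supplies 0 for a key that has not been seen yet.
manual_counts = {}
for region in regions:
    manual_counts[region] = manual_counts.get(region, 0) + 1

counts = Counter(regions)
print(manual_counts == dict(counts))  # True
print(counts.most_common(1))          # [('west', 3)]
print(counts["north"])                # 0: a Counter reports zero for a missing key
```

Counter saves you the loop and adds helpers such as most_common, which returns (key, count) pairs sorted from most to least frequent.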

When you need to accumulate per key rather than just count (collect each group's values, or keep a running total), the tool is defaultdict. A plain dict raises KeyError the first time you read a key that isn't there, and += reads the old value before it writes the new one; a defaultdict creates a starting value automatically instead:

from collections import defaultdict

records = [{"user": "u1", "score": 5},
           {"user": "u1", "score": 2},
           {"user": "u2", "score": 9}]

totals = defaultdict(int)          # a missing key starts at 0
for r in records:
    totals[r["user"]] += r["score"]
# totals == {'u1': 7, 'u2': 9}
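
To make the failure concrete, here is a small sketch of the same loop run against a plain dict. The names plain_totals and missing_key are hypothetical, chosen only for this illustration:

```python
records = [{"user": "u1", "score": 5},
           {"user": "u1", "score": 2},
           {"user": "u2", "score": 9}]

plain_totals = {}
missing_key = None
try:
    for r in records:
        # += must read the current value first, and "u1" is not in the dict yet.
        plain_totals[r["user"]] += r["score"]
except KeyError as exc:
    missing_key = exc.args[0]

print(missing_key)   # u1
print(plain_totals)  # {} because the very first update failed
```

The loop stops at the first record, which is why the defaultdict version needs no special handling for keys it has not seen.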

The argument is a factory: a callable that defaultdict calls with no arguments to make a fresh starting value for each missing key. defaultdict(int) starts each new key at 0; defaultdict(list) starts each at a new empty list.
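
As a sketch of the list case, the loop below collects every score per user and then reduces each group to an average. It reuses the records from the earlier example; the names scores_by_user and averages are assumptions made for illustration:

```python
from collections import defaultdict

records = [{"user": "u1", "score": 5},
           {"user": "u1", "score": 2},
           {"user": "u2", "score": 9}]

scores_by_user = defaultdict(list)   # a missing key starts as a new empty list
for r in records:
    scores_by_user[r["user"]].append(r["score"])

# Reduce each group to a single number: the mean score per user.
averages = {user: sum(scores) / len(scores)
            for user, scores in scores_by_user.items()}

print(dict(scores_by_user))  # {'u1': [5, 2], 'u2': [9]}
print(averages)              # {'u1': 3.5, 'u2': 9.0}
```

Collecting first and reducing afterwards is the split-then-reduce shape described at the top of this card: the defaultdict does the splitting, and the dict comprehension does the reducing.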

This card leads into “Ranking, ties, and toward groupby”.

Related cards

Builds on Project paths and feature pipelines · Python for Data Science

Next Ranking, ties, and toward groupby · Python for Data Science

Tasks

Question 1

What does collections.Counter do?

Sorts a list in ascending order

Counts how many times each distinct value appears

Removes duplicate values

Reverses a dictionary

Question 2

Why use defaultdict(list) instead of a plain dict when accumulating values per key?

It sorts the keys automatically

It creates a fresh empty list the first time each key is seen, avoiding KeyError

It is faster than a normal dict for lookups

It removes duplicate values

Card Info

Topic: Python for Data Science
Difficulty: Intermediate