Counting and accumulating per key

Intermediate Python for Data Science
Created by Best · 24.06.2026 at 14:03 UTC

Most analytics questions have the same shape: split the rows into groups by some key, then reduce each group to a number. "How many events per user?" "Average score per category?" Once you can express that in plain Python, every groupby you meet later is the same idea made fast.

Start with counting. Counter from the collections module tallies how often each key appears:

from collections import Counter

regions = ["west", "east", "west", "west", "east"]
counts = Counter(regions)     # Counter({'west': 3, 'east': 2})
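
A Counter behaves like a dict with a few extras for tallies. The sketch below reuses the regions list to show three that come up often: most_common for ranking, a reading of 0 for keys that were never seen, and update for adding more observations. The extra values passed to update are made up for illustration.

```python
from collections import Counter

regions = ["west", "east", "west", "west", "east"]
counts = Counter(regions)

# most_common(n) returns the n highest counts as (key, count) pairs
top = counts.most_common(1)        # [('west', 3)]

# A key that never appeared reads as 0 instead of raising KeyError
missing = counts["north"]          # 0

# update() adds new observations to the existing tallies
counts.update(["east", "south"])   # east is now 3, south is 1
```

Reading counts["north"] does not add "north" to the Counter, so probing for absent keys leaves the tallies unchanged.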

When you need to accumulate per key rather than just count (collect each group's values, or keep a running total), the tool is defaultdict. A plain dict raises KeyError the first time you read a key that isn't there, and an update such as += reads the old value before it writes the new one; a defaultdict creates a starting value automatically:

from collections import defaultdict

records = [{"user": "u1", "score": 5},
           {"user": "u1", "score": 2},
           {"user": "u2", "score": 9}]

totals = defaultdict(int)          # a missing key starts at 0
for r in records:
    totals[r["user"]] += r["score"]
# totals == {'u1': 7, 'u2': 9}
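
For comparison, here is a minimal sketch of the same running total with a plain dict. It first shows the KeyError that += triggers on a new key, then one common workaround, dict.get with a default of 0. The names plain and failed are introduced only for this illustration.

```python
records = [{"user": "u1", "score": 5},
           {"user": "u1", "score": 2},
           {"user": "u2", "score": 9}]

# With a plain dict, += reads the old value first, so a new key fails
plain = {}
failed = False
try:
    plain["u1"] += 5
except KeyError:
    failed = True                  # reached: 'u1' is not in the dict yet

# dict.get with a default of 0 supplies the starting value by hand
totals = {}
for r in records:
    totals[r["user"]] = totals.get(r["user"], 0) + r["score"]
# totals == {'u1': 7, 'u2': 9}
```

Both versions give the same totals; defaultdict simply moves the "start at 0" rule out of the loop body and into the container.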

The argument to defaultdict is a factory: a callable that defaultdict invokes with no arguments to make a fresh starting value for each missing key. defaultdict(int) starts each new key at 0, because int() returns 0; defaultdict(list) starts each at a new empty list.
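
The list factory covers the "average score per category" kind of question: collect each group's values first, then reduce each list to one number. The following is a sketch under assumed data; the rows and their category field are invented for the example.

```python
from collections import defaultdict

# Hypothetical rows, invented for this example
rows = [{"category": "a", "score": 4},
        {"category": "b", "score": 10},
        {"category": "a", "score": 6}]

# Split: collect each category's scores in its own list
groups = defaultdict(list)         # a missing key starts as []
for row in rows:
    groups[row["category"]].append(row["score"])
# groups == {'a': [4, 6], 'b': [10]}

# Reduce: turn each list into one number, here the mean
averages = {key: sum(values) / len(values) for key, values in groups.items()}
# averages == {'a': 5.0, 'b': 10.0}
```

One thing to watch: merely reading groups["c"] creates an empty list under "c", so test membership with in (or use .get) when you only want to look.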

This card leads into “Ranking, ties, and toward groupby”.

Related cards
Builds on Project paths and feature pipelines · Python for Data Science
Next Ranking, ties, and toward groupby · Python for Data Science
Tasks
Question 1

What does collections.Counter do?

Question 2

Why use defaultdict(list) instead of a plain dict when accumulating values per key?

Card Info
  • Topic: Python for Data Science
  • Difficulty: Intermediate