Ranking, ties, and toward groupby
You usually want the result ranked. Sorting by a tuple key gives multi-level ordering, and a leading minus flips a field to descending. "Most events first, ties broken by name alphabetically" is exactly:
ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))
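Here kv is a (key, value) pair: -kv[1] sorts by value descending (the minus trick needs a numeric value), and kv[0] breaks ties by key ascending. Putting the tie-break into the sort key is what makes the order deterministic. Without it, Python's sort is stable, so equal-valued groups keep whatever order they had in the dict. That is insertion order, which means the ranking of ties depends on the order the data happened to arrive in rather than on a rule you chose.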
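A small sketch of what the tie-break changes, assuming totals is a plain dict mapping a name to an event count (the data is made up for illustration). It also shows a common workaround when the descending field is not numeric and cannot be negated: sort in two stable passes, secondary key first, then primary key with reverse=True, which still preserves stability.

```python
# Hypothetical per-name event counts; "carol" and "alice" tie on 5.
totals = {"carol": 5, "bob": 7, "alice": 5, "dave": 2}

# Value descending, then name ascending as the tie-break.
ranked = sorted(totals.items(), key=lambda kv: (-kv[1], kv[0]))

# Without the tie-break, ties keep their dict insertion order (stable sort).
ranked_no_tiebreak = sorted(totals.items(), key=lambda kv: -kv[1])

# Two-pass alternative for fields that cannot be negated (e.g. strings):
# sort by the secondary key first, then stably by the primary key.
by_name = sorted(totals.items(), key=lambda kv: kv[0])
ranked_two_pass = sorted(by_name, key=lambda kv: kv[1], reverse=True)

print(ranked)              # [('bob', 7), ('alice', 5), ('carol', 5), ('dave', 2)]
print(ranked_no_tiebreak)  # [('bob', 7), ('carol', 5), ('alice', 5), ('dave', 2)]
print(ranked_two_pass == ranked)  # True
```

Note that carol lands ahead of alice in the second result only because she was inserted first; reorder the dict literal and that tie flips.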
Doing this by hand teaches you what aggregation actually is, so that when pandas hands you a one-line groupby you can still reason about what might be quietly going wrong: which denominator you're dividing by, how ties are resolved, where the missing values went.
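To make the denominator and missing-value questions concrete, here is a hand-rolled per-group mean. It is a minimal sketch under stated assumptions: rows are hypothetical (group, value) pairs, None marks a missing value, and missing values are skipped by an explicit decision rather than counted as zero. Keeping separate counters for all rows and for non-missing values shows exactly where the missing values went.

```python
from collections import defaultdict

# Hypothetical rows: (group, value); None marks a missing measurement.
rows = [("a", 10), ("a", None), ("a", 20), ("b", 3), ("b", None)]

sums = defaultdict(float)     # running total of non-missing values
present = defaultdict(int)    # denominator: non-missing values only
row_count = defaultdict(int)  # every row, missing or not

for group, value in rows:
    row_count[group] += 1
    if value is None:
        continue  # explicit decision: skip missing values
    sums[group] += value
    present[group] += 1

# Mean over non-missing values; a group with nothing present gets None.
means = {g: (sums[g] / present[g] if present[g] else None) for g in row_count}

# Dividing by the row count instead silently treats missing values as 0.
naive_means = {g: sums[g] / row_count[g] for g in row_count}

print(means)        # {'a': 15.0, 'b': 3.0}
print(naive_means)  # {'a': 10.0, 'b': 1.5}
```

Both versions are "a mean"; they differ only in the denominator, which is the kind of choice a one-line groupby makes for you.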
One classic mistake to avoid: the factory passed to defaultdict must be a callable, such as the type list itself, not an instance of it. defaultdict(list) is correct; defaultdict([]) and defaultdict({}) raise a TypeError because their first argument is not callable.
The cost of the manual approach is that it gets verbose the moment you need several keys and several summaries at once, which is exactly the itch that df.groupby(key).agg(...) scratches next.
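A sketch of both points, with hypothetical data. The first half shows the correct factory and the TypeError from passing an instance. The second half groups by two keys with three summaries (total, count, max), using one accumulator per summary keyed by a (region, kind) tuple, plus an assembly step at the end.

```python
from collections import defaultdict

# Correct: pass the callable; defaultdict calls list() for each new key.
groups = defaultdict(list)
groups["x"].append(1)

# Wrong: an empty list is not callable, so the constructor rejects it.
error_message = None
try:
    defaultdict([])
except TypeError as exc:
    error_message = str(exc)

# Several keys and several summaries by hand (hypothetical events).
events = [("north", "click", 3), ("north", "click", 5),
          ("north", "view", 2), ("south", "click", 4)]
total = defaultdict(int)
count = defaultdict(int)
largest = {}
for region, kind, n in events:
    key = (region, kind)  # composite group key
    total[key] += n
    count[key] += 1
    largest[key] = max(largest.get(key, n), n)

# Assemble one record per group from the separate accumulators.
summary = {key: {"total": total[key], "count": count[key], "max": largest[key]}
           for key in total}
print(summary)
```

Each extra summary means another accumulator and another line in the loop, which is the verbosity that a single groupby/agg call absorbs.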
Card Info
- Topic: Python for Data Science
- Difficulty: Intermediate