Reading JSON and the many faces of missing
Most web data arrives as JSON, a text format of nested objects and arrays that maps cleanly onto Python's dicts and lists:
import json
with open("data/weather.json", encoding="utf-8") as f:
    doc = json.load(f)  # a tree of dicts and lists
temp = doc.get("temp_c") # None if the key is absent
Notice doc.get("temp_c") rather than doc["temp_c"]: .get returns None for a missing key, whereas indexing raises a KeyError, so .get is the safer way to read data you didn't create. json.load maps JSON's null to Python's None, an array to a list, and an object to a dict, so reading nested JSON is really just walking dicts and lists.
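As a minimal sketch of that walk, the snippet below parses a small inline payload and reads a nested field defensively. The payload and its field names (station, readings) are made up for illustration; only temp_c comes from the example above.

```python
import json

# Hypothetical payload: the structure and field names are assumptions.
raw = """
{"station": {"name": "Oslo",
             "readings": [{"temp_c": 4.5}, {"temp_c": null}, {}]}}
"""

doc = json.loads(raw)  # object -> dict, array -> list, null -> None

# `or {}` / `or []` cover both an absent key and an explicit null,
# so the next step always has a dict or list to work with.
station = doc.get("station") or {}
readings = station.get("readings") or []

# Each reading is a dict; .get returns None when temp_c is absent.
temps = [reading.get("temp_c") for reading in readings]
print(temps)  # [4.5, None, None]
```

The second and third readings both come back as None here, even though one was an explicit null and the other had no temp_c key at all.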
The subtle part of real data is that the token for "missing" changes as a value moves through your stack, and each form has its own trap:
| Where you are | "Missing" looks like | The trap |
|---|---|---|
| plain Python | None | None + 1 raises a TypeError |
| a NumPy float array | np.nan | nan != nan, and it poisons mean; use np.nanmean |
| pandas | NaN / pd.NA | value_counts() drops it silently |
| JSON | null -> None | an absent key and an explicit null differ |
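The plain-Python and JSON rows can be reproduced with the standard library alone. The sketch below is one possible way to tell an absent key from an explicit null; the sentinel _MISSING and the helper describe are hypothetical names introduced for this example.

```python
import json
import math

_MISSING = object()  # hypothetical sentinel: "the key was not there at all"


def describe(doc, key):
    """Classify a key as 'absent', 'null', or 'present'."""
    value = doc.get(key, _MISSING)
    if value is _MISSING:
        return "absent"
    if value is None:
        return "null"
    return "present"


explicit_null = json.loads('{"temp_c": null}')
no_key = json.loads("{}")

print(describe(explicit_null, "temp_c"))  # null
print(describe(no_key, "temp_c"))         # absent

# Plain Python: arithmetic on None fails loudly.
try:
    None + 1
except TypeError as exc:
    print("TypeError:", exc)

# NaN is never equal to itself, so test for it with math.isnan.
nan = float("nan")
print(nan == nan)       # False
print(math.isnan(nan))  # True
```

A plain doc.get("temp_c") would return None for both documents, which is why the sentinel is passed as the default.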
The headline trap: a single NaN turns the result of an ordinary np.mean into nan. Use np.nanmean (or pandas, whose mean() skips NaN by default through skipna=True) when you mean "average the values that are present".
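A short sketch of the NumPy and pandas traps, assuming both libraries are installed; the temperature values are invented for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data: three readings, one of them missing.
temps = np.array([10.0, np.nan, 14.0])

print(np.mean(temps))     # nan: one NaN poisons the whole average
print(np.nanmean(temps))  # 12.0: NaN is ignored

s = pd.Series(temps)
print(s.mean())              # 12.0: pandas skips NaN by default
print(s.mean(skipna=False))  # nan: opt back in to NumPy-style behaviour

# value_counts() leaves NaN out unless asked to keep it.
counts_default = s.value_counts()
counts_all = s.value_counts(dropna=False)
print(len(counts_default), len(counts_all))  # 2 3
```

Passing dropna=False is a cheap way to see how much of a column is missing before trusting any summary of it.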
*This card leads into “Handling missing data and validating input”.*
Card Info
- Topic: Python for Data Science
- Difficulty: Intermediate