Parsing rows and the bool('False') trap
Now combine the two ideas: a function that turns one raw row of strings into properly typed values.
def parse_employee_row(row):
    return {
        "id": int(row["id"]),
        "dept": row["dept"],
        "salary": float(row["salary"]),
        "full_time": row["full_time"] == "1",
    }
Each entry gives a field its proper type: int(...) for the id, float(...) for the salary, and a comparison row["full_time"] == "1" that yields a real True or False. The dept field is already text, so it is passed through unchanged. You then call the function on every row, after which the rest of your program can trust that the data is clean:
clean = [parse_employee_row(r) for r in rows]
print(clean[0]["salary"] + 1000) # arithmetic works now
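To see the whole round trip in one place, here is a minimal, self-contained sketch. The sample CSV text and its values are made up for illustration, and it assumes the rows come from csv.DictReader, which hands back every field as a string.

```python
import csv
import io

# Hypothetical sample data; in practice this would come from a file.
raw_text = "id,dept,salary,full_time\n1,eng,52000.50,1\n2,ops,41000,0\n"

def parse_employee_row(row):
    return {
        "id": int(row["id"]),
        "dept": row["dept"],
        "salary": float(row["salary"]),
        "full_time": row["full_time"] == "1",
    }

# DictReader yields one dict per line, with every value still a string.
rows = list(csv.DictReader(io.StringIO(raw_text)))
clean = [parse_employee_row(r) for r in rows]

print(type(rows[0]["salary"]).__name__)   # str
print(type(clean[0]["salary"]).__name__)  # float
print(clean[0]["salary"] + 1000)          # 53000.5
print(clean[1]["full_time"])              # False
```

Trying rows[0]["salary"] + 1000 on the raw row would raise a TypeError, because a str and an int cannot be added.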
One booby-trap deserves a flashing light: never write bool("False") to parse a boolean. In Python any non-empty string is "truthy", so bool("False") is True, the exact opposite of what you meant. Always compare explicitly, as in row["flag"] == "1" or row["flag"] == "True".
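The trap is easy to check for yourself. The sketch below first shows what bool() does with strings, then one possible way to parse booleans safely. The helper parse_bool and the spellings it accepts are assumptions made for illustration, not a standard library function; adjust the sets to whatever your data actually contains.

```python
# The trap: every non-empty string is truthy, whatever it spells.
print(bool("False"))  # True
print(bool("0"))      # True
print(bool(""))       # False (only the empty string is falsy)

# Hypothetical helper: accept a fixed set of spellings, reject the rest.
TRUE_STRINGS = {"1", "true", "yes"}
FALSE_STRINGS = {"0", "false", "no"}

def parse_bool(text):
    value = text.strip().lower()
    if value in TRUE_STRINGS:
        return True
    if value in FALSE_STRINGS:
        return False
    # Failing loudly beats silently guessing True or False.
    raise ValueError(f"not a recognised boolean: {text!r}")

print(parse_bool("False"))  # False
print(parse_bool("1"))      # True
```

Raising an error on unknown spellings is a design choice: a typo such as "Flase" stops the program at the boundary instead of slipping through as a wrong value.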
Be deliberate about missing values too: an empty string is not the same as None, so decide which you mean and convert accordingly. These boundary habits matter because many data bugs are type bugs that slipped in early: a number left as text, or a 0 standing in for "missing" that quietly lowers a mean.
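A small sketch makes the cost of a stand-in 0 concrete. It assumes that an empty field means "missing" and should become None; the helper parse_optional_float and the salary values are invented for this example.

```python
from statistics import mean

def parse_optional_float(text):
    """Return a float, or None when the field is empty (assumed to mean 'missing')."""
    text = text.strip()
    return float(text) if text else None

raw_salaries = ["50000", "", "70000"]  # made-up values; the middle one is missing

salaries = [parse_optional_float(s) for s in raw_salaries]

# Leave the missing value out of the calculation.
known = [s for s in salaries if s is not None]
print(mean(known))      # 60000.0

# Treating the blank as 0 quietly drags the mean down.
with_zero = [s if s is not None else 0.0 for s in salaries]
print(mean(with_zero))  # 40000.0
```

Keeping None in the parsed data forces every later calculation to decide explicitly what to do with missing values, rather than inheriting a misleading 0.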
This card leads into “Lists as vectors: scaling and the dot product”.