Measurement scales and your first function
Everything read from a CSV is a string, so before you can compute you must parse each value into the right type. That choice is more than bookkeeping: it is a statement about what the data means. A column's type encodes its measurement scale:
- An identifier like `employee_id` is a nominal label. It may be written with digits, but averaging ids is meaningless. Keep it as `str` or `int`, and never do arithmetic on it.
- A category / code like `department_code` is also a label, not a quantity.
- A measurement like `salary` is a real number you can add and average: a `float`.
- A flag / indicator like `is_fulltime` is a yes/no: a `bool`.
- A genuinely missing value is `None`.

| Column role | Python type | Average it? |
|---|---|---|
| identifier / code | `str` or `int` | no (it's a label) |
| measurement | `float` | yes |
| flag / indicator | `bool` | as a 0/1 rate |
| label / category | `str` | no |
| missing | `None` | n/a |

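To make the "Average it?" column concrete, here is a minimal sketch. The rows and their values are invented for illustration, and they are assumed to be already parsed into the types from the table.

```python
from collections import Counter

# Hypothetical, already-parsed rows; the values are made up for illustration.
rows = [
    {"employee_id": "1001", "department_code": "ENG", "salary": 50000.0, "is_fulltime": True},
    {"employee_id": "1002", "department_code": "OPS", "salary": 60000.0, "is_fulltime": False},
    {"employee_id": "1003", "department_code": "ENG", "salary": 70000.0, "is_fulltime": True},
    {"employee_id": "1004", "department_code": "OPS", "salary": None, "is_fulltime": True},
]

# Measurement: averaging is meaningful. Skip missing values instead of counting them as 0.
salaries = [row["salary"] for row in rows if row["salary"] is not None]
average_salary = sum(salaries) / len(salaries)

# Flag: True counts as 1 and False as 0, so the mean is the full-time rate.
fulltime_rate = sum(row["is_fulltime"] for row in rows) / len(rows)

# Label: count how often each one occurs; never average it.
department_counts = Counter(row["department_code"] for row in rows)

print(average_salary)            # 60000.0
print(fulltime_rate)             # 0.75
print(department_counts["ENG"])  # 2
```

The ids are never touched by arithmetic here: they only identify rows.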
To parse a row once and reuse it, you need a function. `def` defines a named command: you teach Python a small job, give it a name, and call it whenever you need it. A function takes inputs (its parameters), does its work, and hands back a result with `return`:
```python
def double(x):
    return x * 2

print(double(21))  # 42
```
The value after `return` is handed back to whoever called the function, where it can be stored or printed. Naming a job this way is what lets you apply the same logic to thousands of rows without repeating yourself.
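As a sketch of that idea applied to rows, the hypothetical `parse_row` function below turns one row of strings into typed values. The column names come from the list above. The raw values are assumptions made for illustration, as are the rules that an empty string means "missing" and that flags are written as the text `True` or `False`.

```python
def parse_row(raw):
    """Turn one row of strings into typed values (hypothetical helper)."""
    salary_text = raw["salary"].strip()
    return {
        "employee_id": raw["employee_id"],          # label: keep as str
        "department_code": raw["department_code"],  # label: keep as str
        # Measurement: a float, or None when the field is empty (assumed convention).
        "salary": float(salary_text) if salary_text else None,
        # Flag: compare the text itself; bool() on any non-empty string is True.
        "is_fulltime": raw["is_fulltime"].strip().lower() == "true",
    }

# Made-up rows, shaped like what a CSV reader would hand back: all strings.
raw_rows = [
    {"employee_id": "1001", "department_code": "ENG", "salary": "52000.50", "is_fulltime": "True"},
    {"employee_id": "1002", "department_code": "OPS", "salary": "", "is_fulltime": "False"},
]

# The same function handles every row, however many there are.
parsed = []
for raw in raw_rows:
    parsed.append(parse_row(raw))

print(parsed[0]["salary"])       # 52000.5
print(parsed[1]["salary"])       # None
print(parsed[1]["is_fulltime"])  # False
```

The parsing rules live in one place, so changing how a column is read means editing one function rather than every line that touches a row.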
*Leads into “Parsing rows and the bool('False') trap”.*
Card Info
- Topic: Python for Data Science
- Difficulty: Beginner