Measurement scales and your first function

Everything read from a CSV file arrives as a string, so before you can compute anything you must parse each value into the right type. That choice is not mere bookkeeping: it is a statement about what the data means. The type you give a column should reflect its measurement scale:

An identifier like employee_id is a nominal label. It may be written with digits, but averaging ids is meaningless. Keep it as a str or an int, and never do arithmetic on it.
A category / code like department_code is also a label, not a quantity.
A measurement like salary is a real number you can add and average — a float.
A flag / indicator like is_fulltime is a yes/no — a bool.
A genuinely missing value is None.

Column role	Python type	Average it?
identifier / code	`str` or `int`	no — it's a label
measurement	`float`	yes
flag / indicator	`bool`	as a 0/1 rate
label / category	`str`	no
missing	`None`	n/a
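To make the table concrete, here is a minimal sketch of what each role allows once the values have been parsed. The three rows are made-up illustration data, not taken from a real file, and the field names simply reuse the examples above.

```python
from collections import Counter
from statistics import mean

# Hypothetical rows, already parsed into the types from the table above.
rows = [
    {"employee_id": "1001", "department_code": "ENG", "salary": 52000.0, "is_fulltime": True},
    {"employee_id": "1002", "department_code": "OPS", "salary": 48000.0, "is_fulltime": False},
    {"employee_id": "1003", "department_code": "ENG", "salary": None, "is_fulltime": True},
]

# Measurement: average it, but skip the missing value first.
salaries = [row["salary"] for row in rows if row["salary"] is not None]
average_salary = mean(salaries)

# Flag: True counts as 1 and False as 0, so the sum divided by the count is a rate.
fulltime_rate = sum(row["is_fulltime"] for row in rows) / len(rows)

# Label: counting how often each code appears makes sense; averaging does not.
department_counts = Counter(row["department_code"] for row in rows)

print(average_salary)      # 50000.0
print(department_counts)   # Counter({'ENG': 2, 'OPS': 1})
```

Note that employee_id is never touched by arithmetic here. It is only useful for telling rows apart.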

To write the parsing logic once and reuse it on every row, you need a function. The def keyword defines a named command: you teach Python a small job, give it a name, and call it whenever you need it. A function takes inputs (its parameters), does its work, and hands back a result with return:

def double(x):
    return x * 2

print(double(21))    # 42

The value after return is handed back to whoever called the function, where it can be stored or printed. Naming a job this way is what lets you apply the same logic to thousands of rows without repeating yourself.
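As a sketch of how the same idea applies to CSV values, the function below turns one raw salary string into a float. The name parse_salary is hypothetical, and treating an empty cell as missing (None) is an assumption made for this example; a real file may mark missing values differently.

```python
def parse_salary(text):
    """Turn a raw CSV string into a float, or None if the cell is empty."""
    text = text.strip()        # drop stray spaces around the value
    if text == "":
        return None            # assumption: an empty cell means "missing"
    return float(text)

# The same function handles every value, however many there are.
raw_values = ["52000", " 48000.50 ", ""]
parsed = [parse_salary(value) for value in raw_values]

print(parsed)    # [52000.0, 48000.5, None]
```

The job is written once inside the function, so fixing or changing the parsing rule later means editing one place only.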
This leads into “Parsing rows and the bool('False') trap”.
