Python dataclasses for trusted domain objects
After external data passes boundary validation, it enters your system as a trusted domain object — a Python object representing a business concept. The @dataclass decorator turns a class with annotated fields into one that has __init__, __repr__, and __eq__ generated for free, so you write the fields and the constructor appears.
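As a minimal sketch of what the decorator generates, consider the class below. The field names name and row_count are illustrative assumptions; only dataset_id and the class name DatasetMeta appear in this card.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    # Only the annotated fields are written by hand;
    # __init__, __repr__ and __eq__ are generated by the decorator.
    dataset_id: int
    name: str        # illustrative field
    row_count: int   # illustrative field


a = DatasetMeta(dataset_id=1, name="sales", row_count=100)
b = DatasetMeta(dataset_id=1, name="sales", row_count=100)

print(a)        # generated __repr__ lists every field and its value
print(a == b)   # True: generated __eq__ compares field by field
print(a is b)   # False: they are still two distinct objects
```

Without the decorator, a == b would fall back to identity comparison and return False for these two objects.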
A critical subtlety that trips students every semester: type hints in dataclasses are not enforced at runtime. Write dataset_id: int and then call DatasetMeta(dataset_id='oops'), and Python will happily construct the object with a string in an int field, because hints are documentation for type checkers (mypy, pyright), not runtime guards. If you need runtime enforcement, you add it yourself in __post_init__, a special method that the generated __init__ calls as its last step.
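The missing runtime check is easy to see for yourself. This short sketch assumes a two-field DatasetMeta; the name field is an illustrative addition.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    dataset_id: int
    name: str  # illustrative field


# No exception is raised here: the int annotation is not checked
# when the object is constructed.
meta = DatasetMeta(dataset_id="oops", name="sales")

print(type(meta.dataset_id).__name__)  # str
```

A static type checker such as mypy or pyright would flag the call, but only if you run it; the interpreter itself never does.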
__post_init__ is where you encode invariants: conditions that must always hold for a valid object, such as a positive dataset ID, a non-empty name, or a non-negative row count. Raising ValueError there stops an invalid object from being constructed in the first place, which is far safer than checking validity at every use site downstream.
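One way the three invariants named above could look in code is sketched here. The field names, the error messages, and the decision to treat a whitespace-only name as empty are assumptions made for illustration, as is the extra type check that raises TypeError.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    dataset_id: int
    name: str
    row_count: int

    def __post_init__(self) -> None:
        # Optional runtime type check; bool is a subclass of int,
        # so it is rejected explicitly.
        if isinstance(self.dataset_id, bool) or not isinstance(self.dataset_id, int):
            raise TypeError("dataset_id must be an int")
        if self.dataset_id <= 0:
            raise ValueError("dataset_id must be positive")
        # Assumption: a whitespace-only name counts as empty.
        if not self.name.strip():
            raise ValueError("name must not be empty")
        if self.row_count < 0:
            raise ValueError("row_count must be non-negative")


valid = DatasetMeta(dataset_id=1, name="sales", row_count=0)

try:
    DatasetMeta(dataset_id=-5, name="sales", row_count=10)
except ValueError as exc:
    error_message = str(exc)
    print(error_message)  # dataset_id must be positive
```

Note that these checks run only at construction. Assigning meta.row_count = -1 afterwards is not caught unless the class is also declared with @dataclass(frozen=True), which forbids assignment to fields altogether.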
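One more Python trap: never write tags: list = [] as a default. On a plain class attribute or a function argument, that empty list is created once at definition time and shared by every instance or call, so appending to one object's tags changes what every other object sees. The @dataclass decorator guards against this and raises ValueError at class definition time when it finds a list, dict, or set default. The fix is tags: list = field(default_factory=list), which calls list() fresh for each new object.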
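The sketch below tries both spellings. One detail worth knowing: when the class is decorated with @dataclass, a bare list default is rejected with a ValueError as soon as the class is defined, so the sharing bug cannot slip through silently. The class name BrokenMeta and the name field are hypothetical, chosen only for this illustration.

```python
from dataclasses import dataclass, field

# Attempt 1: a bare mutable default. The decorator refuses it
# while the class is being defined.
try:
    @dataclass
    class BrokenMeta:  # hypothetical class, never successfully created
        name: str
        tags: list = []
except ValueError as exc:
    definition_error = str(exc)
    print(definition_error)


# Attempt 2: default_factory builds a new list for every instance.
@dataclass
class DatasetMeta:
    name: str
    tags: list = field(default_factory=list)


first = DatasetMeta(name="sales")
second = DatasetMeta(name="returns")
first.tags.append("raw")

print(first.tags)   # ['raw']
print(second.tags)  # []
```

The same default_factory pattern applies to dict and set defaults.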
Use dataclasses when data has already been validated at the boundary and you need a clear, lightweight domain vocabulary. For untrusted input, reach for Pydantic instead.
Dataclasses docs: [1].
Card Info
- Topic: Data Science Praktikum
- Difficulty: Beginner