Python dataclasses for trusted domain objects

Beginner Data Science Praktikum
Created by Pavel · 03.04.2026 at 12:12 UTC

After external data passes boundary validation, it enters your system as a trusted domain object — a Python object representing a business concept. The @dataclass decorator turns a class with annotated fields into one that has __init__, __repr__, and __eq__ generated for free, so you write the fields and the constructor appears.
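A minimal sketch of what the decorator generates. The class name DatasetMeta and the dataset_id field come from the text; the name field and the sample values are assumptions added for illustration.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    dataset_id: int
    name: str  # assumed second field, for illustration only


# __init__ is generated from the annotated fields, in declaration order.
meta = DatasetMeta(dataset_id=7, name="iris")

# __repr__ is generated too, so printing shows the field values.
print(meta)  # DatasetMeta(dataset_id=7, name='iris')

# __eq__ compares field by field instead of by object identity.
other = DatasetMeta(7, "iris")
print(meta == other)  # True
print(meta is other)  # False
```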

A critical subtlety that trips up students every semester: type hints in dataclasses are not enforced at runtime. Write dataset_id: int and then call DatasetMeta(dataset_id='oops'), and Python will happily construct the object with a string in an int field, because hints are documentation for type checkers (mypy, pyright), not runtime guards. If you need runtime enforcement, you add it yourself in __post_init__, a special method that the generated __init__ calls as its last step.
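The difference can be sketched with two small classes. Both class names are hypothetical, and raising TypeError for a wrong type is one possible design choice here, not the only one.

```python
from dataclasses import dataclass


@dataclass
class UncheckedMeta:
    dataset_id: int  # hint only; nothing checks it at runtime


@dataclass
class TypeCheckedMeta:
    dataset_id: int

    def __post_init__(self) -> None:
        # Called at the end of the generated __init__, once fields are set.
        # bool is a subclass of int, so it is rejected explicitly.
        if isinstance(self.dataset_id, bool) or not isinstance(self.dataset_id, int):
            raise TypeError(
                f"dataset_id must be int, got {type(self.dataset_id).__name__}"
            )


unchecked = UncheckedMeta(dataset_id="oops")  # no error is raised
print(type(unchecked.dataset_id).__name__)  # str

try:
    TypeCheckedMeta(dataset_id="oops")
except TypeError as exc:
    print(exc)  # dataset_id must be int, got str
```

A static type checker would flag both calls before the program runs; only the second class also fails at runtime.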

__post_init__ is where you encode invariants: conditions that must always hold for a valid object, such as a positive dataset ID, a non-empty name, or a non-negative row count. Raising ValueError there stops an invalid object from being constructed at all, so downstream code does not have to re-check validity at every use site.
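As an illustration of invariants in __post_init__, here is a sketch with a hypothetical SplitConfig class. The class, its fields, and its two rules are assumptions chosen to keep the example separate from the exercises.

```python
from dataclasses import dataclass


@dataclass
class SplitConfig:
    train_fraction: float
    n_folds: int = 5

    def __post_init__(self) -> None:
        # Invariant 1: the training share is a proper fraction.
        if not 0.0 < self.train_fraction < 1.0:
            raise ValueError(
                f"train_fraction must be in (0, 1), got {self.train_fraction}"
            )
        # Invariant 2: cross-validation needs at least two folds.
        if self.n_folds < 2:
            raise ValueError(f"n_folds must be >= 2, got {self.n_folds}")


ok = SplitConfig(train_fraction=0.8)
print(ok)  # SplitConfig(train_fraction=0.8, n_folds=5)

try:
    SplitConfig(train_fraction=1.5)
except ValueError as exc:
    print(exc)  # train_fraction must be in (0, 1), got 1.5
```

Note that these checks run only at construction. Assigning to a field afterwards is not re-validated unless you add that yourself.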

One more Python trap: never write tags: list = [] as a default. In a plain class, that empty list is created once at class definition time and shared across all instances: append to one object's tags and every object sees it. @dataclass refuses to repeat that mistake and raises ValueError at class definition time for a list, dict, or set default. The fix is tags: list = field(default_factory=list), which calls list() fresh for each new object.
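A short sketch of the three cases. All class names are hypothetical. Note that in a plain class the bare list is shared silently, while @dataclass rejects it with ValueError when the class is defined.

```python
from dataclasses import dataclass, field


class PlainTagged:
    tags = []  # plain class attribute: one list shared by all instances


p1, p2 = PlainTagged(), PlainTagged()
p1.tags.append("raw")
print(p2.tags)  # ['raw']

# @dataclass rejects the same default as soon as the class is defined.
try:
    @dataclass
    class BadTagged:
        tags: list = []
except ValueError as exc:
    print(type(exc).__name__)  # ValueError


@dataclass
class Tagged:
    name: str
    tags: list = field(default_factory=list)  # list() runs once per instance


a = Tagged("train")
b = Tagged("test")
a.tags.append("raw")
print(a.tags, b.tags)  # ['raw'] []
```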

Use dataclasses when data has already been validated at the boundary and you need a clear, lightweight domain vocabulary. For untrusted input, reach for Pydantic instead.
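To make the split between boundary and domain concrete, here is a rough sketch. The function parse_dataset_meta is hypothetical and hand-written; it only stands in for the boundary layer, where a real project would more likely use a validation library such as Pydantic.

```python
from dataclasses import dataclass


@dataclass
class DatasetMeta:
    dataset_id: int
    name: str


def parse_dataset_meta(raw: dict) -> DatasetMeta:
    """Hypothetical boundary function: untrusted dict in, trusted object out."""
    try:
        dataset_id = int(raw["dataset_id"])
        name = str(raw["name"]).strip()
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"invalid dataset record: {exc!r}") from exc
    if dataset_id <= 0 or not name:
        raise ValueError("dataset_id must be positive and name non-empty")
    return DatasetMeta(dataset_id=dataset_id, name=name)


# Raw values often arrive as strings, e.g. from a CSV row or a JSON payload.
meta = parse_dataset_meta({"dataset_id": "42", "name": "  iris "})
print(meta)  # DatasetMeta(dataset_id=42, name='iris')
```

Code past this function can accept a DatasetMeta and rely on its fields without parsing or converting them again.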

Dataclasses docs: [1].


Sources

Tasks
Question 1

What does this code print?

from dataclasses import dataclass

@dataclass
class Meta:
    dataset_id: int
    name: str

obj = Meta(dataset_id='oops', name=123)
print(type(obj.dataset_id).__name__, type(obj.name).__name__)

Hint

Dataclass type hints are NOT enforced at runtime.

Question 2

What does __post_init__ do in a @dataclass?

Hint

Think about when invariant checks should fire.

Question 3

Implement a @dataclass called CheckedDataset with fields dataset_id: int, name: str, row_count: int = 0. In __post_init__, validate that: (1) dataset_id is a positive integer, (2) name is non-empty after stripping whitespace (also strip it in place), (3) row_count is non-negative. Raise ValueError for any violation.

Then implement safe_create(did, name, rc) -> tuple that returns (True, obj.name) on success or (False, '') on ValueError.

Submit both; tests call safe_create.

Hint

isinstance checks type; strip() cleans name; raise ValueError for violations.
