Pydantic for boundary validation
When a JSON payload, a CSV row, or an API response enters your system, you have no guarantee about its types, ranges, or format. That is where Pydantic belongs: define a BaseModel with typed fields and constraints, call Model.model_validate(payload_dict), and Pydantic either returns a validated, coerced object or raises a ValidationError that lists every problem at once, not just the first one.
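A minimal sketch of that flow, assuming Pydantic v2 is installed. The model name DatasetUpload and its fields are made up for illustration and do not come from the course material.

```python
from pydantic import BaseModel, Field, ValidationError


class DatasetUpload(BaseModel):
    """Hypothetical boundary model for an incoming payload."""

    name: str = Field(min_length=3)
    row_count: int = Field(gt=0)


# A well-formed payload: the numeric string is coerced to an int
# because Pydantic validates in lax mode by default.
payload = {"name": "sales_2024", "row_count": "1200"}
upload = DatasetUpload.model_validate(payload)

# A payload with two independent problems: both are reported together.
bad_payload = {"name": "ab", "row_count": 0}
try:
    DatasetUpload.model_validate(bad_payload)
except ValidationError as exc:
    error_count = exc.error_count()
```

Here upload.row_count ends up as an int, and the failing payload yields one error per broken field rather than stopping at the first.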
Field(gt=0) requires a value greater than zero; Field(min_length=3) enforces a minimum string length; EmailStr validates email format with a proper parser (it relies on the optional email-validator package). These declarative constraints cover syntactic validation: correct type and format. For semantic validation (business rules like 'storage tier must be bronze, silver, or gold'), you write a @field_validator: a classmethod that, in the default "after" mode, receives the already-coerced value, sanitizes it if needed, and either returns the clean value or raises ValueError.
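The storage-tier rule could be sketched roughly as follows, assuming Pydantic v2. StorageRequest, ALLOWED_TIERS, and normalize_tier are illustrative names, and the trimming and lowercasing step is one possible sanitization, not a prescribed one.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

# Hypothetical business rule: the only tiers the system accepts.
ALLOWED_TIERS = {"bronze", "silver", "gold"}


class StorageRequest(BaseModel):
    size_gb: int = Field(gt=0)  # declarative, syntactic constraint
    tier: str

    @field_validator("tier")
    @classmethod
    def normalize_tier(cls, value: str) -> str:
        # Runs after type validation, so value is already a str here.
        cleaned = value.strip().lower()
        if cleaned not in ALLOWED_TIERS:
            raise ValueError(f"tier must be one of {sorted(ALLOWED_TIERS)}")
        return cleaned


request = StorageRequest.model_validate({"size_gb": 50, "tier": "  Gold "})
```

The ValueError raised inside the validator is caught by Pydantic and reported as part of a ValidationError, alongside any failures from the declarative constraints.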
Contextual validation (checking that a dataset ID is unique in the database, or that a referenced model exists) requires external state and belongs in a different layer, not inside the Pydantic model.
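One way to keep that separation is a small service function that validates the shape first and checks external state second. This is only a sketch: register_dataset, DuplicateDatasetError, and the in-memory set standing in for a database lookup are all assumptions made for illustration.

```python
from pydantic import BaseModel, Field


class NewDataset(BaseModel):
    """Hypothetical boundary model: shape and format only."""

    dataset_id: str = Field(min_length=3)


class DuplicateDatasetError(Exception):
    """Raised by the service layer, not by Pydantic."""


def register_dataset(payload: dict, existing_ids: set[str]) -> NewDataset:
    # Step 1: syntactic and semantic validation at the boundary.
    validated = NewDataset.model_validate(payload)
    # Step 2: contextual check against external state. The set is a
    # stand-in for a real database query.
    if validated.dataset_id in existing_ids:
        raise DuplicateDatasetError(validated.dataset_id)
    existing_ids.add(validated.dataset_id)
    return validated


known_ids = {"ds-001"}
created = register_dataset({"dataset_id": "ds-002"}, known_ids)
```

The model stays free of database access, so it can be reused and tested without any external state.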
When validation fails, ValidationError collects structured error objects, available through its errors() method: each one gives the field location (loc), the rule that failed (type), a readable message (msg), and the offending input. This is far better than failing on the first error because the caller (API client, pipeline operator) can fix all problems in one round trip.
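To illustrate, the structured errors can be flattened into a report for the caller. This sketch assumes Pydantic v2; Measurement and collect_problems are hypothetical names, and the report layout is just one possibility.

```python
from pydantic import BaseModel, Field, ValidationError


class Measurement(BaseModel):
    sensor: str = Field(min_length=3)
    value: float = Field(gt=0)


def collect_problems(payload: dict) -> list[dict]:
    """Return one entry per failed rule, or an empty list if valid."""
    try:
        Measurement.model_validate(payload)
    except ValidationError as exc:
        return [
            {
                # loc is a tuple path to the field, e.g. ("sensor",)
                "field": ".".join(str(part) for part in err["loc"]),
                "rule": err["type"],
                "message": err["msg"],
            }
            for err in exc.errors()
        ]
    return []


problems = collect_problems({"sensor": "t1", "value": -4})
```

Both the too-short sensor name and the non-positive value show up in problems, so the caller can correct them in a single pass.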
After boundary validation, convert the Pydantic model to an internal dataclass (to_domain(validated)) and pass that downstream. This keeps Pydantic at the edge and your domain logic framework-agnostic.
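A minimal sketch of that hand-off, assuming Pydantic v2. Only the function name to_domain comes from the text above; DatasetIn, Dataset, and their fields are illustrative.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field


class DatasetIn(BaseModel):
    """Hypothetical boundary model, used only at the edge."""

    name: str = Field(min_length=3)
    row_count: int = Field(gt=0)


@dataclass(frozen=True)
class Dataset:
    """Plain domain object with no Pydantic dependency."""

    name: str
    row_count: int


def to_domain(validated: DatasetIn) -> Dataset:
    # Explicit field mapping keeps the two shapes free to diverge later.
    return Dataset(name=validated.name, row_count=validated.row_count)


dataset = to_domain(DatasetIn.model_validate({"name": "sales", "row_count": 10}))
```

Downstream code receives a plain dataclass, so it can be tested and reused without importing Pydantic.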
Pydantic docs: [1].
Sources
Card Info
- Topic: Data Science Praktikum
- Difficulty: Intermediate