Pydantic for boundary validation

Intermediate Data Science Praktikum

Created by Pavel · 03.04.2026 at 12:12 UTC

When a JSON payload, a CSV row, or an API response hits your system, you have no guarantee about types, ranges, or format. That is where Pydantic lives: define a BaseModel with typed fields and constraints, call Model.model_validate(payload_dict), and Pydantic either returns a validated, coerced object or raises a ValidationError listing every problem at once — not just the first one.
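A minimal sketch of this behavior, assuming Pydantic v2 in its default (lax) mode. The model ReadingIn and its fields are hypothetical and exist only for illustration.

```python
from pydantic import BaseModel, ValidationError


class ReadingIn(BaseModel):
    # Hypothetical boundary model, used only for illustration.
    sensor_id: int
    value: float


# Numeric strings are coerced to the declared types in lax mode.
ok = ReadingIn.model_validate({"sensor_id": "42", "value": "3.5"})
print(ok.sensor_id, ok.value)

# Both fields are invalid here; one exception reports both of them.
try:
    ReadingIn.model_validate({"sensor_id": "abc", "value": None})
    failed_fields = []
except ValidationError as exc:
    failed_fields = [err["loc"][0] for err in exc.errors()]

print(failed_fields)
```

The first call succeeds because the strings can be parsed as numbers; the second fails on both fields and still raises only once.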

Field(gt=0) means strictly greater than zero; Field(min_length=3) enforces a minimum string length; EmailStr validates email format with a proper parser (it requires the optional email-validator package). These declarative constraints cover syntactic validation: correct type, range, and format. For semantic validation (business rules like 'storage tier must be bronze, silver, or gold'), you write a @field_validator: a classmethod that, in its default 'after' mode, receives the already-coerced value, sanitizes it if needed, and either returns the clean value or raises ValueError.
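The split between declarative constraints and a semantic validator could look roughly like this. The model StorageIn, the size_gb field, and the ALLOWED_TIERS constant are assumptions made for the example; only the tier rule itself comes from the text.

```python
from pydantic import BaseModel, Field, ValidationError, field_validator

ALLOWED_TIERS = {"bronze", "silver", "gold"}  # hypothetical constant


class StorageIn(BaseModel):
    # Hypothetical model; only the tier rule is taken from the lesson text.
    size_gb: int = Field(gt=0)  # syntactic: type and range
    storage_tier: str

    @field_validator("storage_tier")
    @classmethod
    def normalize_tier(cls, value: str) -> str:
        # Semantic: sanitize first, then check the business rule.
        cleaned = value.strip().lower()
        if cleaned not in ALLOWED_TIERS:
            raise ValueError(f"storage_tier must be one of {sorted(ALLOWED_TIERS)}")
        return cleaned


good = StorageIn.model_validate({"size_gb": 10, "storage_tier": "  Gold "})
print(good.storage_tier)

try:
    StorageIn.model_validate({"size_gb": 10, "storage_tier": "platinum"})
    tier_error_type = None
except ValidationError as exc:
    # A ValueError raised inside a validator is reported as "value_error".
    tier_error_type = exc.errors()[0]["type"]

print(tier_error_type)
```

Because the validator returns the cleaned value, downstream code only ever sees the normalized tier.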

Contextual validation — checking that a dataset ID is unique in the database, or that a referenced model exists — requires external state and belongs in a different layer, not inside the Pydantic model.
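One possible way to keep the contextual check outside the model is a small service function. In this sketch, register_dataset, DuplicateDatasetError, and the in-memory set standing in for a database are all hypothetical.

```python
from pydantic import BaseModel, Field


class DatasetIn(BaseModel):
    dataset_id: int = Field(gt=0)
    name: str = Field(min_length=3)


class DuplicateDatasetError(Exception):
    """Raised by the service layer, not by the Pydantic model."""


def register_dataset(payload: dict, existing_ids: set[int]) -> DatasetIn:
    # Step 1: boundary validation, which needs no external state.
    validated = DatasetIn.model_validate(payload)
    # Step 2: contextual check against external state
    # (a plain set stands in for a database lookup here).
    if validated.dataset_id in existing_ids:
        raise DuplicateDatasetError(
            f"dataset_id {validated.dataset_id} already exists"
        )
    existing_ids.add(validated.dataset_id)
    return validated


known_ids = {1}
created = register_dataset({"dataset_id": 2, "name": "sales"}, known_ids)
print(created.dataset_id, sorted(known_ids))
```

The model stays a pure function of its input, so it can be tested without a database, while the uniqueness rule lives where the external state is available.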

When validation fails, ValidationError collects structured error objects, available through e.errors(): each entry tells you which field failed (loc), which rule was violated (type), and a human-readable message (msg), along with the offending input. This is far better than failing on the first error because the caller (API client, pipeline operator) can fix all problems in one round trip.
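As an illustration, the structured errors can be turned into a compact report for the caller. The model UploadIn and the report layout are assumptions; the keys loc, type, and msg are the ones Pydantic v2 returns from errors().

```python
from pydantic import BaseModel, Field, ValidationError


class UploadIn(BaseModel):
    # Hypothetical model for illustration.
    file_name: str = Field(min_length=1)
    size_bytes: int = Field(gt=0)


try:
    UploadIn.model_validate({"file_name": "", "size_bytes": "big"})
    report = []
except ValidationError as exc:
    report = [
        {
            "field": ".".join(str(part) for part in err["loc"]),
            "rule": err["type"],
            "message": err["msg"],
        }
        for err in exc.errors()
    ]

for entry in report:
    print(entry)
```

A report like this can be returned to an API client as the body of an error response, so that every failing field is visible at once.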

After boundary validation, convert the Pydantic model to an internal dataclass (to_domain(validated)) and pass that downstream. This keeps Pydantic at the edge and your domain logic framework-agnostic.
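A minimal sketch of that hand-off, assuming a flat model whose fields map one-to-one onto the dataclass. The Dataset dataclass and the body of to_domain are illustrative, not a prescribed implementation.

```python
from dataclasses import dataclass

from pydantic import BaseModel, Field


class DatasetIn(BaseModel):
    # Boundary model: lives only at the edge of the system.
    dataset_id: int = Field(gt=0)
    name: str = Field(min_length=3)


@dataclass(frozen=True)
class Dataset:
    # Internal domain object with no Pydantic dependency.
    dataset_id: int
    name: str


def to_domain(validated: DatasetIn) -> Dataset:
    # model_dump() returns a plain dict of the validated field values.
    return Dataset(**validated.model_dump())


dataset = to_domain(DatasetIn.model_validate({"dataset_id": 7, "name": "sales_2024"}))
print(dataset)
```

Downstream functions then accept Dataset rather than DatasetIn, so swapping the validation library later would only affect the boundary code.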

Pydantic docs: [1].

Sources

[1] https://docs.pydantic.dev/latest/


Tasks

Question 1

How many validation errors does this produce?

from pydantic import BaseModel, Field, ValidationError

class DatasetIn(BaseModel):
    dataset_id: int = Field(gt=0)
    name: str = Field(min_length=3)
    row_count: int = Field(ge=1)

try:
    DatasetIn.model_validate({'dataset_id': -5, 'name': 'x', 'row_count': 0})
except ValidationError as e:
    print(e.error_count())

Hint

dataset_id=-5 violates gt=0; name='x' violates min_length=3; row_count=0 violates ge=1.

1 — it stops at the first error

3 — all three fields fail their constraints

0 — the payload is valid

2 — name passes because 'x' is a string

Question 2

In the validation taxonomy, checking that storage_tier is one of {'bronze', 'silver', 'gold'} is what kind of validation?

Hint

The allowed values are defined by business rules, not by string format or external state.

Syntactic — it only checks the string format

Semantic — it checks business meaning (valid tier values)

Contextual — it requires a database lookup

Structural — it checks JSON nesting depth

Question 3

Implement validate_payload(raw: dict) -> dict that mimics Pydantic boundary validation without importing it. The payload must have:
- 'dataset_id': positive int
- 'name': string, strip + lowercase + replace spaces with _, result length ≥ 3
- 'tier': string, normalize to lowercase, must be in {'bronze','silver','gold'}

Collect ALL errors as strings in a list. If errors exist, raise ValueError with the joined list. Otherwise return the cleaned dict.

Submit the function; tests use expression mode.

Hint

Collect errors in a list instead of raising immediately. Only raise at the end if the list is non-empty.

def validate_payload(raw: dict) -> dict:
    # TODO: validate all fields, collect errors, raise or return clean dict.
    pass

