ORM: bridging objects and SQL with SQLAlchemy
Software objects and relational tables represent data in fundamentally different ways. In Python, a Dataset object has identity in memory, references other objects directly (dataset.runs), and vanishes when the process ends. In SQL, the same data lives in rows with primary keys, and the reference between a dataset and its experiment runs is a foreign key column — not a Python pointer. This tension is called the object-relational impedance mismatch, and ORM exists to bridge it.
SQLAlchemy is the standard Python ORM. You define model classes that inherit from DeclarativeBase, map fields to columns with mapped_column, and express relationships with relationship(). Then you work with Python objects while SQLAlchemy writes the SQL. session.add(obj) stages an insert; session.commit() writes all staged changes to the database in one transaction — the unit of work pattern.
Two internals worth knowing: the identity map guarantees that loading the same database row twice in one session returns the same Python object (not a copy). And lazy loading of relationships — convenient but dangerous — leads to the N+1 query problem: iterate 100 datasets and access d.runs for each, and the ORM fires 100 extra queries instead of one join. Fix it with explicit join queries or eager loading.
In a well-layered system, the ORM sits at the persistence layer, separate from domain logic. The canonical flow is: external data → Pydantic (validate) → dataclass (domain logic) → ORM (persist to SQL). Putting ORM models at the API boundary leaks database schema to callers; putting Pydantic deep inside business logic ties domain code to a validation framework. Keeping each tool in its layer is what makes the system maintainable.
SQLAlchemy ORM docs: [1], session basics: [2].
Sources
Tasks
Card Info
- Topic: Data Science Praktikum
- Difficulty: Intermediate
- Completed: 0 users