Project paths and feature pipelines

Intermediate Python for Data Science
Created by Best · 24.06.2026 at 14:03 UTC

The same instinct produces small utilities that keep a project tidy. A recurring nuisance is file paths: a notebook that finds data/... when run from one folder fails to find it when run from another. A repo_root() helper walks upward from the current working directory until it finds a folder containing data/, which it treats as the project root, so paths resolve no matter where the code runs:

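from pathlib import Path

def repo_root() -> Path:
    """Return the nearest folder, at or above the working directory, that contains data/."""
    here = Path.cwd().resolve()
    # Check the current folder first, then each parent up to the filesystem root.
    for p in [here, *here.parents]:
        # A data/ subfolder is the marker that identifies the project root.
        if (p / "data").is_dir():
            return p
    raise FileNotFoundError("Run from inside the project.")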
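
One way to put the helper to work is a thin wrapper that builds every data path from the root. The sketch below repeats repo_root() so it runs on its own; data_path() is a hypothetical name introduced here for illustration, not something the course defines.

```python
from pathlib import Path


def repo_root() -> Path:
    """Same helper as in the text, repeated so this sketch is self-contained."""
    here = Path.cwd().resolve()
    for p in [here, *here.parents]:
        if (p / "data").is_dir():
            return p
    raise FileNotFoundError("Run from inside the project.")


def data_path(*parts: str) -> Path:
    """Hypothetical wrapper: build an absolute path under <root>/data."""
    return repo_root().joinpath("data", *parts)


# Typical call, from any folder inside the project:
#   csv_file = data_path("file.csv")
```

Because the search stops at the first folder that contains data/, a nested folder that happens to have its own data/ subfolder would be returned instead of the real root. If that is a risk in your project, a more specific marker may be a safer choice.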

The payoff of all this naming is that feature engineering — often the work that most decides whether a model succeeds — becomes a sequence of named, testable steps (standardize, clip_outliers, encode_flag) instead of a wall of inline arithmetic copied between notebooks. Each function can be checked in isolation, and each name tells the reader what happens without re-reading the body.
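
To make the idea concrete, here is a minimal sketch of what the three named steps could look like in plain Python. Only the function names come from the text; the signatures, the choice of population standard deviation, the zero-variance fallback and the "yes" default are assumptions made for illustration.

```python
from statistics import fmean, pstdev


def standardize(xs: list[float]) -> list[float]:
    """Rescale to zero mean and unit population standard deviation."""
    mu = fmean(xs)
    sigma = pstdev(xs)
    if sigma == 0:
        # Assumption: a constant column maps to all zeros rather than raising.
        return [0.0 for _ in xs]
    return [(x - mu) / sigma for x in xs]


def clip_outliers(xs: list[float], lo: float, hi: float) -> list[float]:
    """Clamp every value into the closed interval [lo, hi]."""
    return [min(max(x, lo), hi) for x in xs]


def encode_flag(values: list[str], positive: str = "yes") -> list[int]:
    """Map the positive label to 1 and everything else to 0."""
    return [1 if v == positive else 0 for v in values]


ages = [20.0, 30.0, 40.0, 400.0]
clipped = clip_outliers(ages, lo=0.0, hi=100.0)  # [20.0, 30.0, 40.0, 100.0]
scaled = standardize(clipped)
flags = encode_flag(["yes", "no", "yes"])  # [1, 0, 1]
```

Each step takes a list and returns a new one, so a test can check a single step with a three-element input and never touch a notebook.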

The honest caveat: these hand-written helpers are for learning. NumPy, pandas and scikit-learn provide faster, more robust versions, and later topics swap your implementations for theirs. But from here on the course assumes you reach for a small typed function by default.
This leads into “Counting and accumulating per key”.

Related cards
Builds on Type hints and a numeric helper · Python for Data Science
Next Counting and accumulating per key · Python for Data Science
Tasks
Question 1

Why write a repo_root() helper instead of hard-coding a path like "data/file.csv"?

Question 2

What is the main benefit of splitting feature engineering into small, named functions?
