Feature-Set Experiments with OOP

Intermediate Data Science Engineering
Created by Pavel · 12.03.2026 at 07:54 UTC · 2 completed

Problem setup:
A data science team has many candidate features, but compute limits mean each training run can use only a subset of them.

Object model:
- FeaturePool: holds all available features,
- ExperimentRun: records the selected feature subset, the random seed, and the score,
- an optional tracker class for comparing runs.
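Below is a minimal Python sketch of the run and tracker objects, assuming dataclasses are acceptable. The names RunTracker, add and best are hypothetical, because the card only says the tracker is optional. The sketch assumes a higher score is better, and the scores in the usage lines are made-up placeholders. FeaturePool already appears in the Question 2 starter code, so it is not repeated here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ExperimentRun:
    """One training run: the feature subset used, the seed, and the resulting score."""
    features: tuple[str, ...]
    seed: int
    score: float


@dataclass
class RunTracker:
    """Hypothetical optional tracker that collects runs so they can be compared."""
    runs: list[ExperimentRun] = field(default_factory=list)

    def add(self, run: ExperimentRun) -> None:
        self.runs.append(run)

    def best(self) -> ExperimentRun:
        # Assumes a higher score is better; an empty tracker has no best run.
        if not self.runs:
            raise ValueError("no runs recorded")
        return max(self.runs, key=lambda run: run.score)


# Placeholder scores, for illustration only.
tracker = RunTracker()
tracker.add(ExperimentRun(features=("age", "income"), seed=0, score=0.71))
tracker.add(ExperimentRun(features=("age", "tenure"), seed=1, score=0.74))
print(tracker.best().features)  # ('age', 'tenure')
```

Making ExperimentRun frozen means a recorded run cannot be edited after the fact. That suits comparing runs.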

Core operation:
sample_features(k) draws k distinct features from the pool.
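One way to implement the core operation is shown here as a standalone function. It assumes the caller passes a seed, which the sample_features(k) signature above does not include. A private random.Random instance keeps the draw reproducible and leaves the global random state untouched.

```python
import random


def sample_features(features: list[str], k: int, seed: int) -> list[str]:
    """Draw k distinct features; the same seed and pool order give the same subset."""
    if not 0 <= k <= len(features):
        raise ValueError(f"k must be between 0 and {len(features)}, got {k}")
    rng = random.Random(seed)  # private generator: the global random state is not affected
    return rng.sample(features, k)  # sampling without replacement, so no position repeats


pool = ["age", "income", "tenure", "region", "channel"]
first = sample_features(pool, 3, seed=42)
second = sample_features(pool, 3, seed=42)
print(first == second)  # True
print(len(set(first)))  # 3
```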

Use cases:
- ablation studies,
- quick baseline screening,
- reproducible random search over feature subsets.
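The random-search use case can be sketched as follows: draw one k-subset per seed, then score it. score_subset is a hypothetical stand-in for training and evaluating a model, and its "useful features" rule exists only so the example runs. Each subset is sorted into a canonical tuple because feature order does not matter.

```python
import random


def score_subset(subset: tuple[str, ...]) -> float:
    # Hypothetical stand-in for "train a model on these features and evaluate it".
    useful = {"income", "tenure"}
    return sum(name in useful for name in subset) / len(subset)


def random_search(features: list[str], k: int, seeds: range) -> list[tuple[int, tuple[str, ...], float]]:
    """Sample one k-subset per seed and score it; returns (seed, subset, score) records."""
    results = []
    for seed in seeds:
        subset = tuple(sorted(random.Random(seed).sample(features, k)))
        results.append((seed, subset, score_subset(subset)))
    return results


pool = ["age", "income", "tenure", "region", "channel"]
run_a = random_search(pool, k=2, seeds=range(5))
run_b = random_search(pool, k=2, seeds=range(5))
print(run_a == run_b)  # True: same seeds and pool order reproduce the same search
best = max(run_a, key=lambda record: record[2])
print(best)
```

Different seeds can yield the same subset. Deduplicating subsets before training would save compute, but this sketch does not do that.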

Edge cases:
- k larger than the number of available features,
- duplicate feature names,
- non-reproducible runs when the seed is not recorded.
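The edge cases above could be handled as shown next. This variant is an assumption, not the interface the card requires. It adds a seed parameter to sample_features and rejects duplicate names instead of silently dropping them. It also raises ValueError when k is out of range or no seed is given, because random.Random(None) seeds from the operating system and the run could not be replayed.

```python
import random
from collections import Counter
from typing import Optional


class FeaturePool:
    """Variant sketch: rejects duplicate names, out-of-range k, and missing seeds."""

    def __init__(self, features):
        features = list(features)
        duplicates = sorted(name for name, count in Counter(features).items() if count > 1)
        if duplicates:
            raise ValueError(f"duplicate feature names: {duplicates}")
        self.features = features

    def sample_features(self, k: int, seed: Optional[int]) -> list[str]:
        if seed is None:
            # Without a recorded seed the run cannot be replayed exactly.
            raise ValueError("seed is required for a reproducible run")
        if not 0 <= k <= len(self.features):
            raise ValueError(f"k={k} is outside 0..{len(self.features)}")
        return random.Random(seed).sample(self.features, k)


pool = FeaturePool(["age", "income", "tenure"])
print(pool.sample_features(2, seed=0))
```

If silent de-duplication is preferred instead, list(dict.fromkeys(features)) keeps the first occurrence of each name and preserves order.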

Tasks
Question 1

Which metadata is essential to reproduce one feature-subset experiment run?

Hint

Record enough information to rerun the experiment exactly.
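To illustrate what "enough information" might mean for the sampling step, this sketch stores five things: the full pool in its original order, k, the seed, the subset actually drawn, and the Python version. Pool order matters because random.sample picks by position. The Python documentation guarantees cross-version reproducibility only for random() with a compatible seeder, not for methods such as sample(), so the Python version is recorded too. RunRecord, run_and_record and replay are hypothetical names. A real record would likely also need the data snapshot and the model configuration, which this sketch leaves out.

```python
import platform
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class RunRecord:
    """Hypothetical record of what is needed to replay one feature-subset draw."""
    pool: tuple[str, ...]      # full pool in its exact order (sampling depends on order)
    k: int
    seed: int
    selected: tuple[str, ...]  # the subset actually drawn, kept as a cross-check
    python_version: str        # sample() output is not guaranteed stable across versions


def run_and_record(pool: list[str], k: int, seed: int) -> RunRecord:
    selected = tuple(random.Random(seed).sample(pool, k))
    return RunRecord(tuple(pool), k, seed, selected, platform.python_version())


def replay(record: RunRecord) -> bool:
    """Re-draw the subset from the stored metadata and confirm it matches."""
    redrawn = tuple(random.Random(record.seed).sample(list(record.pool), record.k))
    return redrawn == record.selected


record = run_and_record(["age", "income", "tenure", "region"], k=2, seed=7)
print(replay(record))  # True
```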

Question 2

Code task: implement FeaturePool.sample_features.

import random

class FeaturePool:
    def __init__(self, features):
        self.features = list(features)

    def sample_features(self, k):
        # TODO
        pass

Submission format: submit the full class snippet shown above, with the # TODO and pass lines replaced by your implementation.

Hint

Guard against invalid k (negative or larger than the number of features), then call random.sample.

Starter code is prefilled; replace TODO blocks with your solution.
2 test cases will be used for grading
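One possible solution in the requested submission format follows. Raising ValueError for invalid k is an assumption. The hidden test cases might instead expect other behavior, such as an empty list. random.sample itself already raises ValueError when k is negative or exceeds the pool size. The explicit guard gives a clearer message and also rejects non-integer k with the same exception.

```python
import random


class FeaturePool:
    def __init__(self, features):
        self.features = list(features)

    def sample_features(self, k):
        # Guard invalid k: it must be a non-negative integer no larger than the pool.
        if not isinstance(k, int) or k < 0 or k > len(self.features):
            raise ValueError(f"k must be an integer in [0, {len(self.features)}], got {k!r}")
        # random.sample draws k distinct positions without replacement.
        return random.sample(self.features, k)


pool = FeaturePool(["age", "income", "tenure", "region", "channel"])
print(pool.sample_features(3))
```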
Question 3

For selecting unique subsets where order does not matter, which iterator is semantically correct?

Hint

Unordered unique k-subsets.
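A quick comparison of the three itertools candidates on a three-feature pool shows why combinations matches "unordered unique k-subsets". permutations yields each subset once per ordering, and product additionally lets a feature repeat.

```python
from itertools import combinations, permutations, product

features = ["a", "b", "c"]

# combinations: unordered, no repeats, so each subset appears exactly once
print(list(combinations(features, 2)))  # [('a', 'b'), ('a', 'c'), ('b', 'c')]

# permutations: ordered, no repeats, so ('a', 'b') and ('b', 'a') both appear
print(len(list(permutations(features, 2))))  # 6

# product with repeat=2: ordered, repeats allowed, so ('a', 'a') is included
print(len(list(product(features, repeat=2))))  # 9
```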

Question 4

Code task: implement count_subsets(n_features, k) using itertools.combinations.

Submission format: submit a complete function definition (def count_subsets(...) including its body).

Hint

Count tuples from combinations(range(n_features), k).

Starter code is prefilled; replace TODO blocks with your solution.
2 test cases will be used for grading
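Here is a sketch of count_subsets that respects the constraint of using itertools.combinations. It counts lazily with a generator instead of building a list. The usage lines check it against math.comb (Python 3.8+), which computes the same binomial coefficient n! / (k! (n - k)!) directly. Rejecting negative arguments is an assumption, and the grader might expect different handling.

```python
from itertools import combinations
from math import comb


def count_subsets(n_features, k):
    """Count unordered k-subsets of n_features items by enumerating them."""
    if n_features < 0 or k < 0:
        raise ValueError("n_features and k must be non-negative")
    # combinations yields nothing when k > n_features, so the count is 0 in that case.
    return sum(1 for _ in combinations(range(n_features), k))


print(count_subsets(5, 2))  # 10
print(count_subsets(3, 5))  # 0
print(count_subsets(20, 3) == comb(20, 3))  # True
```

Enumeration takes time proportional to the number of subsets, so math.comb is the practical choice for large n. The exercise asks for combinations, which is why the sketch enumerates.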