Feature-Set Experiments with OOP

Intermediate Data Science Engineering

Created by Pavel · 12.03.2026 at 07:54 UTC · 2 completed

Problem setup:
A Data Science team has many candidate features but, because of compute limits, can train a model on only a subset of them per run.

Object model:
- FeaturePool: all available features,
- ExperimentRun: selected feature subset, random seed, and score,
- optional tracker class for comparing runs.
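The object model above can be sketched with a frozen dataclass for a run and a small tracker. This is a minimal sketch, not the card's reference design. The RunTracker name, its add/best methods, and the assumption that a higher score is better are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExperimentRun:
    """One run: which features were used, which seed drove sampling, and the result."""
    features: tuple[str, ...]
    seed: int
    score: Optional[float] = None  # filled in after training/evaluation


class FeaturePool:
    """All candidate features, kept in insertion order."""

    def __init__(self, features):
        self.features = list(features)


class RunTracker:
    """Hypothetical optional helper that stores runs and compares them by score."""

    def __init__(self):
        self.runs: list[ExperimentRun] = []

    def add(self, run: ExperimentRun) -> None:
        self.runs.append(run)

    def best(self) -> Optional[ExperimentRun]:
        # Assumption: a higher score is better; unscored runs are ignored.
        scored = [r for r in self.runs if r.score is not None]
        return max(scored, key=lambda r: r.score, default=None)


tracker = RunTracker()
tracker.add(ExperimentRun(("age", "income"), seed=1, score=0.71))
tracker.add(ExperimentRun(("age", "tenure"), seed=2, score=0.74))
print(tracker.best().features)  # ('age', 'tenure')
```

Making ExperimentRun frozen keeps a recorded run from being edited after the fact, which matters when the record is the only way to rerun it.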

Core operation:
sample_features(k) draws k distinct features from the pool.
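The "distinct" requirement is what makes random.sample the natural fit for the core operation. The comparison below contrasts it with random.choices from the same standard-library module. The feature names and the seed are made up for illustration.

```python
import random

features = ["age", "income", "tenure", "region", "clicks"]
rng = random.Random(0)  # seeded generator so the draw can be repeated

# random.sample draws WITHOUT replacement: every picked feature is distinct.
subset = rng.sample(features, 3)
assert len(subset) == len(set(subset)) == 3

# random.choices draws WITH replacement, so the same feature may appear twice;
# that does not match "k distinct features".
with_replacement = rng.choices(features, k=3)
print(subset, with_replacement)

# random.sample refuses a k larger than the population.
try:
    rng.sample(features, 10)
except ValueError as exc:
    print("k too large:", exc)
```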

Use cases:
- ablation studies,
- quick baseline screening,
- reproducible random search over feature subsets.
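Two of these use cases reduce to generating subsets in a controlled way. The sketch below covers leave-one-out ablation and seed-driven random search. The helper names leave_one_out and random_search_plan are hypothetical, and actually training and scoring each subset is left out.

```python
import random


def leave_one_out(features):
    """Ablation: yield (dropped_feature, remaining_features) for each feature."""
    for i, dropped in enumerate(features):
        yield dropped, features[:i] + features[i + 1:]


def random_search_plan(features, k, seeds):
    """Reproducible random search: one k-subset per recorded seed."""
    return [(seed, random.Random(seed).sample(features, k)) for seed in seeds]


features = ["age", "income", "tenure", "region"]
for dropped, rest in leave_one_out(features):
    print(f"without {dropped}: {rest}")

plan = random_search_plan(features, k=2, seeds=[0, 1, 2])
# Re-running with the same seeds regenerates exactly the same subsets.
assert plan == random_search_plan(features, k=2, seeds=[0, 1, 2])
```

Giving each subset its own random.Random(seed) instance, rather than one shared generator, means any single run can be regenerated without replaying the runs before it.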

Edge cases:
- k larger than available features,
- duplicate feature names,
- non-reproducible runs when the seed is not recorded.
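Two of these edge cases can be handled before any sampling happens. Duplicates matter because random.sample picks distinct positions, not distinct values, so a repeated name can be drawn twice in one subset. The sketch below is minimal and assumes duplicates should be rejected rather than silently merged; dict.fromkeys(features) would be the order-preserving way to merge them instead. check_unique and resolve_seed are hypothetical helper names.

```python
import random
from collections import Counter
from typing import Optional


def check_unique(features):
    """Raise if any feature name appears more than once."""
    dupes = sorted(name for name, n in Counter(features).items() if n > 1)
    if dupes:
        raise ValueError(f"duplicate feature names: {dupes}")
    return list(features)


def resolve_seed(seed: Optional[int] = None) -> int:
    """Return a concrete seed; if none is given, draw one so it can still be recorded."""
    if seed is None:
        seed = random.SystemRandom().randrange(2**32)
    return seed


try:
    check_unique(["age", "income", "age"])
except ValueError as exc:
    print(exc)  # duplicate feature names: ['age']

seed = resolve_seed()  # store this value alongside the run
```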


Reference: scikit-learn.org/stable/modules/feature_selection.html (scikit-learn.org article)

Tasks

Question 1

Which metadata is essential to reproduce one feature-subset experiment run?

Hint

You need enough information to rerun the experiment exactly.

Only the final score.

Selected features and random seed.

Only model name.

Only training timestamp.
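A small replay check shows what reproducibility means in practice. With the seed recorded, re-sampling from the same pool gives back the same subset; a score or a timestamp alone could not regenerate it. The sketch assumes the pool order, k, and Python's random implementation are unchanged between runs; storing the explicit feature list as well removes that dependence. The record layout is hypothetical.

```python
import random

features = ["age", "income", "tenure", "region", "clicks"]

# Record what is needed to rerun: the seed and, for safety, the chosen subset itself.
seed = 42
record = {"seed": seed, "features": random.Random(seed).sample(features, 3)}

# Later: replaying with the recorded seed and the same pool reproduces the subset.
replayed = random.Random(record["seed"]).sample(features, 3)
assert replayed == record["features"]
```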

Question 2

Code task: implement FeaturePool.sample_features.

import random

class FeaturePool:
    def __init__(self, features):
        self.features = list(features)

    def sample_features(self, k):
        # TODO
        pass

Submission format: submit the full class snippet shown above with # TODO/pass replaced.

Hint

Guard against invalid k (raise ValueError, for example when k is negative or larger than the pool), then call random.sample.

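One possible solution that follows the hint is sketched below. It treats 0 <= k <= len(self.features) as the valid range; whether k = 0 should be accepted is an assumption, and the grader may decide differently.

```python
import random


class FeaturePool:
    def __init__(self, features):
        self.features = list(features)

    def sample_features(self, k):
        """Return k distinct features drawn uniformly at random."""
        # Guard: k must fit inside the pool (assumed valid range is 0..len).
        if k < 0 or k > len(self.features):
            raise ValueError(
                f"k must be between 0 and {len(self.features)}, got {k}"
            )
        # random.sample draws without replacement, so no feature repeats.
        return random.sample(self.features, k)


pool = FeaturePool(["age", "income", "tenure"])
random.seed(7)  # seed the module-level generator so the draw can be repeated
print(pool.sample_features(2))
```

Checking k explicitly gives a clearer error message than relying on random.sample, which also raises ValueError for an oversized k.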

Question 3

For selecting unique subsets where order does not matter, which iterator is semantically correct?

Hint

Unordered unique k-subsets.

itertools.permutations

itertools.combinations

itertools.product

zip
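A three-feature pool makes the difference between the options concrete. permutations counts ('a', 'b') and ('b', 'a') as different, and product also allows a feature to pair with itself. zip only pairs elements position by position and does not enumerate subsets at all.

```python
from itertools import combinations, permutations, product

features = ["a", "b", "c"]

subsets = list(combinations(features, 2))
orderings = list(permutations(features, 2))
pairs_with_repeats = list(product(features, repeat=2))

print(subsets)                   # [('a', 'b'), ('a', 'c'), ('b', 'c')]
print(len(orderings))            # 6: order matters
print(len(pairs_with_repeats))   # 9: includes ('a', 'a')
```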

Question 4

Code task: implement count_subsets(n_features, k) using itertools.combinations.

Submission format: submit the full function definition def count_subsets(n_features, k) with the TODO/pass replaced.

Hint

Count tuples from combinations(range(n_features), k).

from itertools import combinations


def count_subsets(n_features, k):
    # TODO: count k-subsets of range(n_features)
    pass

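One way to follow the hint is sketched below, cross-checked against math.comb (Python 3.8+). Enumerating tuples takes time proportional to C(n, k), so for large pools math.comb(n_features, k) returns the same count directly; here the enumeration is what the task asks for, and math.comb serves only as a sanity check.

```python
from itertools import combinations
from math import comb


def count_subsets(n_features, k):
    """Count k-subsets of range(n_features) by enumerating them."""
    # sum(1 for _ in ...) counts lazily without building a list of tuples.
    return sum(1 for _ in combinations(range(n_features), k))


# Cross-check against the closed form C(n, k) = n! / (k! (n - k)!).
for n, k in [(5, 2), (10, 3), (4, 0), (3, 5)]:
    assert count_subsets(n, k) == comb(n, k)
```

Note the boundary behavior: k = 0 yields one empty subset, and k > n_features yields none.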
