Feature Subset Optimization II: Heuristics, Baselines, and Validation Discipline

Advanced Combinatorial Search in DS
Created by Pavel · 12.03.2026 at 07:54 UTC · 1 completed

When exhaustive search is too expensive, a common practical approach is greedy forward selection.

Forward-selection loop:
1. start with empty set,
2. evaluate adding each remaining feature,
3. add the feature with best score gain,
4. stop by criterion (max features, no improvement, budget).

Why baseline comparison matters:
- compare greedy search against random subset sampling,
- otherwise observed gains may be due to chance.

Validation discipline:
- use fixed split protocol or cross-validation,
- keep selection and evaluation separated to avoid leakage,
- test stability across random seeds.

Edge cases:
- local optimum traps,
- noisy score function,
- ties between candidate features.

University approvals: 0
Tasks
Question 1

Why should heuristic feature search be compared with a random baseline?

Hint

Baselines calibrate claim strength.

Question 2

Code task: implement one forward-selection step.

def add_best_feature(current_set, candidates, score_fn):
    """Return set after adding single candidate with highest score."""
    # TODO

score_fn(feature_set) returns numeric score (higher is better).

Submission format: submit the full function snippet shown above with # TODO replaced.

Hint

Do not mutate current_set while scoring proposals.

Starter code is prefilled; replace TODO blocks with your solution.
2 test cases will be used for grading
Run checks runtime behavior only. Final correctness is evaluated when you submit.
Question 3

Which practice best reduces selection bias while evaluating chosen feature subsets?

Hint

Prevent information leakage from evaluation set.

Question 4

Code task: implement improvement-based stop criterion.

def should_stop(last_best_score, new_best_score, min_gain=0.0):
    """Return True when improvement is smaller than min_gain."""
    # TODO

Submission format: submit a full function definition def should_stop(...): ....

Hint

Gain is new_best_score - last_best_score.

Starter code is prefilled; replace TODO blocks with your solution.
2 test cases will be used for grading
Run checks runtime behavior only. Final correctness is evaluated when you submit.
Card Info
  • Topic: Combinatorial Search in DS
  • Difficulty: Advanced
  • Completed: 1 users
Creator
Pavel
Pavel