Damage Regression II: Combinatorial Feature Expansion and Baseline Prediction

Advanced Damage Regression

Created by Pavel · 12.03.2026 at 07:54 UTC · 3 completed

After extraction, each report is represented by up to 3 mapped token IDs. The lecture case expands one report into several training rows using permutations of these token IDs.

Pipeline stage:
- select up to 3 mapped tokens,
- generate every ordering with itertools.permutations(nouns) to add order variants,
- append each permutation to datainput,
- append the mapped category code to dataoutput once per permutation, so inputs and labels stay aligned.
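The pipeline stage can be sketched roughly as follows. This is a minimal illustration, not the lecture's exact code: the reports list and its values are made up, and every report is assumed to have exactly 3 tokens so that all rows have the same length (the source does not say how shorter reports are padded).

```python
from itertools import permutations

# Hypothetical mapped reports: (token IDs, category code). All values are made up.
reports = [
    ([4, 9, 2], 1),
    ([7, 1, 3], 0),
]

datainput = []   # one row per token-order variant
dataoutput = []  # one label per row, aligned with datainput

for token_ids, category_code in reports:
    nouns = token_ids[:3]  # select up to 3 mapped tokens
    for variant in permutations(nouns):
        datainput.append(list(variant))    # each ordering becomes a training row
        dataoutput.append(category_code)   # same label for every ordering

print(len(datainput), len(dataoutput))
```

Appending the label inside the inner loop keeps datainput and dataoutput the same length, which the regression step requires.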

Prediction stage:
- train LinearRegression on encoded input/output,
- predict numeric output for new token triplets,
- round prediction and map back to category code.
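A minimal sketch of the prediction stage, assuming scikit-learn and NumPy are installed. The training rows, labels, and the category names in revcodemap are invented for illustration; only LinearRegression and the revcodemap name come from the lesson.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Hypothetical encoded training data (values are made up for illustration).
datainput = [[0, 1, 2], [1, 0, 2], [2, 1, 0], [3, 4, 5], [4, 3, 5], [5, 4, 3]]
dataoutput = [0, 0, 0, 1, 1, 1]

# Hypothetical reverse map from numeric ID back to a category code.
revcodemap = {0: "minor", 1: "major"}

# Fit once, outside any per-prediction function, then reuse the model.
model = LinearRegression()
model.fit(np.array(datainput), np.array(dataoutput))

# Predict a numeric output for a new token triplet.
new_triplet = np.array([[4, 5, 3]])
raw_prediction = float(model.predict(new_triplet)[0])

# Round and map back; this direct lookup fails if the rounded ID is not a key.
rounded = int(round(raw_prediction))
category = revcodemap[rounded]
print(raw_prediction, rounded, category)
```

The direct lookup works here only because the rounded value happens to be a valid key; the caveats about rounding and retraining apply to exactly these last steps.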

Important caveats:
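- permutation expansion can over-represent one report, since a single report becomes several rows with the same label,
- linear regression on category IDs treats the codes as numbers, so it assumes an ordering and a distance between categories,
- rounding can produce invalid or out-of-range IDs,
- retraining inside each prediction call is expensive.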
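To make the first caveat concrete, the sketch below counts how many training rows each report contributes when the number of mapped tokens varies. The report names and token IDs are hypothetical, and the sketch only counts rows; it does not address how rows of different lengths would be fed to a model.

```python
from itertools import permutations

# Hypothetical reports: name -> mapped token IDs (values are made up).
reports = {
    "report_a": [4, 9, 2],
    "report_b": [7, 1],
    "report_c": [5],
}

# Number of order variants (training rows) produced by each report.
rows_per_report = {
    name: len(list(permutations(tokens))) for name, tokens in reports.items()
}

total_rows = sum(rows_per_report.values())

# Fraction of the training set that each single report accounts for.
share = {name: count / total_rows for name, count in rows_per_report.items()}

for name in reports:
    print(f"{name}: {rows_per_report[name]} rows, {share[name]:.0%} of training rows")
```

A report with more mapped tokens ends up with a larger share of the training rows even though it is still a single observation.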

This baseline is useful for teaching pipeline assembly, but production versions should be compared against true classification models and evaluated with proper validation.

scikit-learn.org/stable/modules/linear_model.html

Tasks

Question 1

If a report contributes 3 distinct token IDs and we use full permutations, how many training rows are generated for that single report before the label is appended?

Hint

Count permutations of length 3.

Question 2

Code task: implement permutation expansion.

import itertools

def expand_triplet(nouns):
    """Return all order variants for up to 3 noun IDs."""
    # TODO

Submission format: submit a full function definition def expand_triplet(...): ....

Hint

Call itertools.permutations on the incoming tuple/list.

2 test cases will be used for grading

Question 3

Which risk is specific to mapping rounded regression output back to class IDs with revcodemap[round(pred)]?

Hint

Consider out-of-range rounded predictions.

Options:
- Rounded value may not exist as a valid category key.
- Regression no longer returns floats.
- Model cannot be trained with 3 features.
- Permutation generation stops working.

Question 4

Code task: implement safe prediction decoding.

def decode_prediction(pred_value, revcodemap):
    """Round prediction; return mapped code or None if missing."""
    # TODO

Submission format: submit a full function definition def decode_prediction(...): ....

Hint

Use round(pred_value) then dict.get.

2 test cases will be used for grading
