Damage Regression II: Combinatorial Feature Expansion and Baseline Prediction

Advanced Damage Regression
Created by Pavel · 12.03.2026 at 07:54 UTC · 3 completed

After extraction, each report is represented by up to 3 mapped token IDs. In the lecture case, each report is expanded into several training rows, one for each ordering (permutation) of its token IDs.

Pipeline stage:
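- select up to 3 mapped tokens (nouns) for the report,
- generate itertools.permutations(nouns) to produce every ordering of those tokens,
- append each permutation to datainput,
- append the report's mapped category code to dataoutput once per permutation, so inputs and labels stay aligned row for row.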
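The pipeline stage above can be sketched in plain Python. This is a minimal sketch, not the lecture's exact code. It assumes each report arrives as a `(token_ids, category_code)` pair and that `codemap` (a hypothetical name) maps a category code to an integer label. It also assumes a hypothetical `PAD_ID` of 0, taken not to be a real token ID, which pads reports with fewer than 3 tokens so every row has the same width, as LinearRegression requires.

```python
import itertools

# Hypothetical padding ID; assumes 0 is never used as a real token ID.
PAD_ID = 0


def build_training_rows(reports, codemap, width=3):
    """Expand each (token_ids, category_code) report into one row per permutation.

    Assumes token_ids are already-mapped integer IDs and that codemap
    (hypothetical name) maps a category code to an integer label.
    """
    datainput, dataoutput = [], []
    for token_ids, category_code in reports:
        nouns = list(token_ids)[:width]  # keep at most `width` tokens
        label = codemap[category_code]
        for perm in itertools.permutations(nouns):
            # Pad short reports so every row has the same number of features.
            row = list(perm) + [PAD_ID] * (width - len(perm))
            datainput.append(row)
            dataoutput.append(label)  # one label per row keeps inputs and labels aligned
    return datainput, dataoutput
```

Called as `build_training_rows([((11, 42, 7), "D1"), ((5, 9), "D2")], {"D1": 1, "D2": 2})`, every row in `datainput` has exactly three entries and `dataoutput` holds one label per row.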

Prediction stage:
- train LinearRegression on the encoded rows (datainput) and labels (dataoutput),
- predict a numeric output for each new token triplet,
- round the prediction and map it back to a category code via revcodemap.
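A minimal sketch of the prediction stage, assuming scikit-learn's `LinearRegression` and rows already built as in the pipeline stage. The helper names `train_model` and `predict_code` are hypothetical. The model is fitted once and then reused, which avoids the retraining cost listed in the caveats. The decode step uses `dict.get`, so a rounded value with no entry in `revcodemap` yields `None` instead of raising `KeyError`.

```python
from sklearn.linear_model import LinearRegression


def train_model(datainput, dataoutput):
    """Fit the regressor once on the expanded rows and return it for reuse."""
    model = LinearRegression()
    model.fit(datainput, dataoutput)
    return model


def predict_code(model, triplet, revcodemap):
    """Predict one token triplet and map the rounded value back to a category code."""
    pred = model.predict([list(triplet)])[0]  # predict() expects 2-D input: one row here
    return revcodemap.get(round(float(pred)))  # None when the rounded ID has no code
```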

Important caveats:
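- permutation expansion weights reports unequally: a report with more distinct tokens contributes more rows, so it is over-represented relative to shorter reports,
- linear regression on category IDs treats them as points on a number line, assuming that categories with nearby IDs are similar, which arbitrary category codes do not guarantee,
- rounding can produce IDs that are invalid or outside the range of known codes,
- retraining inside each prediction call is expensive; fitting once and reusing the model avoids this.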
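The first caveat can be made concrete by counting rows per report. The sketch below is pure Python and assumes each report is given as a sequence of token IDs. `rows_per_report` and `equalizing_weights` are hypothetical helpers, and down-weighting is only one possible mitigation, not something the lecture prescribes.

```python
import itertools


def rows_per_report(reports, width=3):
    """Count how many permutation rows each report's token IDs produce."""
    return [sum(1 for _ in itertools.permutations(list(token_ids)[:width]))
            for token_ids in reports]


def equalizing_weights(reports, width=3):
    """Hypothetical mitigation: weight each row by 1 / (rows from its report),
    so every report contributes a total weight of 1 regardless of its length.

    Assumes rows are generated report by report, in the same order as `reports`.
    """
    weights = []
    for n_rows in rows_per_report(reports, width):
        weights.extend([1.0 / n_rows] * n_rows)
    return weights
```

scikit-learn's `LinearRegression.fit` accepts a `sample_weight` argument, so weights built this way could be passed alongside `datainput` and `dataoutput`, provided the rows were produced in the same order.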

This baseline is useful for teaching pipeline assembly, but production versions should be compared against dedicated classification models and evaluated with proper validation.
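This pipeline has one validation pitfall of its own. The data must be split by report before expansion; otherwise permutations of the same report land on both sides of the split and inflate test scores. The sketch below is pure Python and assumes reports arrive as `(token_ids, integer_label)` pairs. `split_then_expand` and its parameters are hypothetical, and padding to a fixed width is omitted for brevity. Its output could feed both the LinearRegression baseline and any classifier being compared against it.

```python
import itertools
import random


def split_then_expand(reports, test_fraction=0.25, seed=0):
    """Hold out whole reports, then expand each side into permutation rows.

    Splitting after expansion would scatter permutations of one report across
    train and test, leaking near-duplicate rows into the evaluation.
    """
    indices = list(range(len(reports)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, int(len(reports) * test_fraction))
    test_ids = set(indices[:n_test])

    def expand(subset):
        rows, labels = [], []
        for token_ids, label in subset:
            for perm in itertools.permutations(list(token_ids)[:3]):
                rows.append(list(perm))
                labels.append(label)  # one label per permutation row
        return rows, labels

    train = [r for i, r in enumerate(reports) if i not in test_ids]
    test = [r for i, r in enumerate(reports) if i in test_ids]
    return expand(train), expand(test)
```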

Tasks
Question 1

If a report contributes 3 distinct token IDs and full permutations are used, how many training rows does that single report add to datainput, counted before the matching labels are appended?

Hint

Count permutations of length 3.

Question 2

Code task: implement permutation expansion.

import itertools

def expand_triplet(nouns):
    """Return all order variants for up to 3 noun IDs."""
    # TODO

Submission format: submit a full function definition def expand_triplet(...): ....

Hint

Call itertools.permutations on the incoming tuple/list.

Starter code is prefilled; replace the TODO block with your solution.
Two test cases will be used for grading.
Running the code checks runtime behavior only; final correctness is evaluated when you submit.
Question 3

Which risk is specific to mapping rounded regression output back to class IDs with revcodemap[round(pred)]?

Hint

Consider out-of-range rounded predictions.

Question 4

Code task: implement safe prediction decoding.

def decode_prediction(pred_value, revcodemap):
    """Round prediction; return mapped code or None if missing."""
    # TODO

Submission format: submit a full function definition def decode_prediction(...): ....

Hint

Use round(pred_value) then dict.get.
