Damage Regression II: Combinatorial Feature Expansion and Baseline Prediction
After extraction, each report is represented by up to 3 mapped token IDs. The lecture case expands one report into several training rows, one for each permutation of these token IDs, so a report with three tokens yields 3! = 6 rows.
Pipeline stage:
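- select up to 3 mapped tokens per report,
- call itertools.permutations(nouns) to generate every ordering of those tokens,
- append each permutation to datainput as one row,
- append the report's mapped category code to dataoutput once per permutation, so the two lists stay aligned.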
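The pipeline steps above can be sketched as follows. This is a minimal illustration, not the lecture's actual code: the helper name expand_report, the max_tokens parameter, and the sample reports are assumptions made for the example, while datainput, dataoutput, and nouns follow the names used in the list.

```python
from itertools import permutations

def expand_report(token_ids, category_code, max_tokens=3):
    """Return (rows, labels) for one report: one row per ordering of its tokens.

    expand_report and max_tokens are hypothetical names used for illustration.
    """
    nouns = list(token_ids)[:max_tokens]           # select up to 3 mapped tokens
    rows = [list(p) for p in permutations(nouns)]  # every order variant
    labels = [category_code] * len(rows)           # one label per generated row
    return rows, labels

datainput, dataoutput = [], []

# Hypothetical reports: (mapped token IDs, mapped category code)
reports = [([4, 9, 2], 1), ([7, 3, 5], 0)]

for tokens, code in reports:
    rows, labels = expand_report(tokens, code)
    datainput.extend(rows)
    dataoutput.extend(labels)

print(len(datainput), len(dataoutput))  # 12 12
```

A report with fewer than three tokens produces shorter rows here, which a regression model cannot mix with three-column rows. The source does not say whether such reports are padded or skipped, so that choice is left open.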
Prediction stage:
- train LinearRegression on encoded input/output,
- predict numeric output for new token triplets,
- round prediction and map back to category code.
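A minimal sketch of the prediction stage, assuming scikit-learn's LinearRegression and a small set of made-up reports. The function name predict_code and the sample data are hypothetical. The model is fitted once, outside the prediction function.

```python
from itertools import permutations
from sklearn.linear_model import LinearRegression

# Hypothetical reports: (mapped token IDs, mapped category code)
reports = [([4, 9, 2], 1), ([1, 3, 5], 0), ([8, 11, 6], 2)]

# Build the permutation-expanded training set.
datainput, dataoutput = [], []
for tokens, code in reports:
    for p in permutations(tokens):
        datainput.append(list(p))
        dataoutput.append(code)

# Train once on the encoded input/output.
model = LinearRegression()
model.fit(datainput, dataoutput)

def predict_code(model, triplet):
    """Predict a numeric output for one token triplet and round it to an integer code.

    Returns (rounded_code, raw_prediction). The rounded value is not checked
    against the set of known category codes here.
    """
    raw = float(model.predict([triplet])[0])
    return int(round(raw)), raw

code, raw = predict_code(model, [4, 9, 2])
print(code, raw)
```

Returning the raw value next to the rounded code makes it easy to see how far a prediction sits from the nearest category ID.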
Important caveats:
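- permutation expansion can over-represent one report: a three-token report contributes 6 rows, while a one-token report contributes only 1,
- linear regression treats category IDs as points on a numeric scale, so it assumes an ordering and a distance between categories that nominal codes do not have,
- rounding can produce IDs that are invalid or outside the range of known codes,
- retraining inside each prediction call is expensive; fitting once and reusing the model avoids the repeated cost.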
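Two of these caveats can be made concrete with a short sketch. The helper names rows_per_report and to_valid_code, and the set known_codes, are hypothetical and chosen only for illustration.

```python
from math import factorial

def rows_per_report(n_tokens):
    """Number of training rows one report contributes after permutation expansion."""
    return factorial(n_tokens)

def to_valid_code(raw_prediction, known_codes):
    """Round a regression output; return it only if it is a known category code."""
    candidate = int(round(raw_prediction))
    return candidate if candidate in known_codes else None

# Over-representation: reports with more tokens weigh more in training.
print(rows_per_report(3))  # 6
print(rows_per_report(2))  # 2
print(rows_per_report(1))  # 1

# Rounding guard: reject predictions that land outside the known codes.
known_codes = {0, 1, 2}  # hypothetical set of category codes
print(to_valid_code(1.3, known_codes))   # 1
print(to_valid_code(3.7, known_codes))   # None
print(to_valid_code(-0.6, known_codes))  # None
```

Returning None for an unknown code is one possible policy. Clamping to the nearest valid code is another, but it hides the fact that the model extrapolated.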
This baseline is useful for teaching pipeline assembly, but production versions should compare with true classification models and proper validation.
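One way such a comparison might look is sketched below, assuming scikit-learn. The choice of DecisionTreeClassifier, the use of GroupKFold, and the sample reports are assumptions for illustration; the source does not name a classifier or a validation scheme. Splitting by report keeps permutations of the same report from appearing in both the training and test folds.

```python
from itertools import permutations
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GroupKFold
from sklearn.tree import DecisionTreeClassifier

# Hypothetical reports: (mapped token IDs, mapped category code)
reports = [
    ([4, 9, 2], 1), ([1, 3, 5], 0), ([8, 11, 6], 2),
    ([4, 2, 7], 1), ([1, 5, 6], 0), ([8, 6, 10], 2),
]

# Expand each report and remember which report every row came from.
X, y, groups = [], [], []
for report_id, (tokens, code) in enumerate(reports):
    for p in permutations(tokens):
        X.append(p)
        y.append(code)
        groups.append(report_id)
X, y, groups = np.array(X), np.array(y), np.array(groups)

def regression_accuracy(X_tr, y_tr, X_te, y_te):
    """Accuracy of the rounded linear-regression baseline."""
    model = LinearRegression().fit(X_tr, y_tr)
    rounded = np.rint(model.predict(X_te)).astype(int)
    return float(np.mean(rounded == y_te))

def classifier_accuracy(X_tr, y_tr, X_te, y_te):
    """Accuracy of a classifier that treats codes as unordered labels."""
    clf = DecisionTreeClassifier(random_state=0).fit(X_tr, y_tr)
    return float(clf.score(X_te, y_te))

scores = {"regression": [], "classifier": []}
for train_idx, test_idx in GroupKFold(n_splits=3).split(X, y, groups):
    # No report contributes rows to both sides of the split.
    assert not set(groups[train_idx]) & set(groups[test_idx])
    X_tr, y_tr = X[train_idx], y[train_idx]
    X_te, y_te = X[test_idx], y[test_idx]
    scores["regression"].append(regression_accuracy(X_tr, y_tr, X_te, y_te))
    scores["classifier"].append(classifier_accuracy(X_tr, y_tr, X_te, y_te))

for name, vals in scores.items():
    print(name, round(sum(vals) / len(vals), 3))
```

With only six made-up reports the printed accuracies carry no meaning. The point of the sketch is the structure: group-aware splits and the same folds for both models.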
Card Info
- Topic: Damage Regression
- Difficulty: Advanced