Catastrophic forgetting and continual learning pain

Fine-tuning updates a model's weights to fit new data. Those same weights encoded prior abilities, so changing them can erase what they stored; catastrophic forgetting is the sharp drop in performance on old tasks after training on a new task.

The plasticity-stability tradeoff offers no free lunch: more adaptation to new data means more drift from the old optimum. Two common mitigations are replay, which mixes old examples into new training, and elastic weight consolidation (EWC), which penalizes changes to parameters deemed important for prior tasks.
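As a rough illustration of the consolidation idea, the sketch below computes a quadratic penalty of the form $\frac{\lambda}{2} \sum_i F_i (\theta_i - \theta^*_i)^2$, where $\theta^*$ are the weights after the old task and $F_i$ is an importance score per parameter (commonly a diagonal Fisher estimate). The function name `ewc_penalty`, the toy values, and the choice of `lam` are assumptions made for illustration, not a reference implementation.

```python
def ewc_penalty(theta, theta_old, fisher, lam):
    """Quadratic penalty: (lam / 2) * sum_i F_i * (theta_i - theta_old_i) ** 2.

    theta, theta_old, and fisher are equal-length lists of floats.
    fisher holds a per-parameter importance score (hypothetical values here).
    """
    if not (len(theta) == len(theta_old) == len(fisher)):
        raise ValueError("all inputs must have the same length")
    total = sum(f * (t - t0) ** 2 for t, t0, f in zip(theta, theta_old, fisher))
    return 0.5 * lam * total


# Toy example: the first parameter is "important", the last one is not.
theta_old = [1.0, -2.0, 0.5]
fisher = [10.0, 0.1, 0.0]
theta_new = [1.5, -1.0, 3.0]

penalty = ewc_penalty(theta_new, theta_old, fisher, lam=2.0)
unchanged = ewc_penalty(theta_old, theta_old, fisher, lam=2.0)
```

During new-task training this term would be added to the task loss, so a large move in a high-importance parameter costs more than an even larger move in an unimportant one.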

LoRA (low-rank adaptation) injects trainable rank-$r$ factors while keeping the base weights frozen: $W' = W + BA$, where $B \in \mathbb{R}^{d \times r}$ and $A \in \mathbb{R}^{r \times k}$ are small because $r \ll \min(d, k)$. Enterprises can version a frozen base model and iterate on task-specific adapters, which simplifies rollback and audit.
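The update $W' = W + BA$ can be sketched in plain Python to show why the adapter is cheap to store and easy to roll back. The helper names (`matmul`, `lora_merge`, `lora_param_count`) and the toy matrices are assumptions for illustration; real adapter libraries typically apply a scaling factor and operate on tensors, which this sketch omits.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*Y)] for row in X]


def lora_merge(W, B, A):
    """Return W' = W + B A without modifying the frozen base matrix W."""
    BA = matmul(B, A)
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, BA)]


def lora_param_count(d, k, r):
    """Trainable parameters in the adapter: B is d x r, A is r x k."""
    return r * (d + k)


# Frozen 3 x 3 base weights and a rank-1 adapter (toy values).
W = [[1.0, 0.0, 0.0],
     [0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
B = [[1.0], [0.0], [2.0]]   # shape 3 x 1
A = [[0.5, 0.0, -1.0]]      # shape 1 x 3

W_adapted = lora_merge(W, B, A)

# Rolling back is just discarding B and A; W itself was never touched.
full_params = 4096 * 4096
adapter_params = lora_param_count(4096, 4096, 8)
```

For a 4096 by 4096 layer at rank 8, the adapter holds 65,536 trainable values against roughly 16.8 million in the full matrix, which is what makes per-task or per-customer adapter versioning practical.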

Continual learning remains an open research area; production systems often prefer retrieval and tooling over endless parametric updates.

Enterprise pipelines often freeze a base checkpoint, version adapters per customer, and run regression suites on legacy tasks before promotion. In these pipelines, forgetting tends to appear as silent quality drift rather than a single loss spike.
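A promotion gate of this kind can be sketched as a comparison of per-task scores between the currently deployed adapter and a candidate. The function name `regression_gate`, the task names, the scores, and the `max_drop` tolerance are all hypothetical; a real suite would also account for evaluation noise.

```python
def regression_gate(baseline, candidate, max_drop=0.01):
    """Return legacy tasks whose score fell by more than max_drop.

    baseline and candidate map task name -> score (higher is better).
    A task missing from candidate counts as a failure.
    """
    failures = {}
    for task, base_score in baseline.items():
        new_score = candidate.get(task)
        if new_score is None or base_score - new_score > max_drop:
            failures[task] = (base_score, new_score)
    return failures


# Hypothetical scores for two legacy tasks.
baseline = {"invoice_extraction": 0.92, "ticket_routing": 0.88}
candidate = {"invoice_extraction": 0.915, "ticket_routing": 0.84}

failures = regression_gate(baseline, candidate)
promote = not failures
```

Because the check is per task rather than on an aggregate loss, a small regression on one legacy task blocks promotion even when the new-task metrics look healthy.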

Replay buffers store a fraction of old task data during continual learning; the buffer size becomes a privacy and storage policy decision, not only an algorithmic knob.
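One way to keep a fixed-size buffer over a stream of old examples is reservoir sampling, which gives every example seen so far an equal chance of being retained. This is a minimal sketch under that assumption; the class name `ReplayBuffer`, its methods, and the `capacity` value are illustrative, and other retention policies are equally valid.

```python
import random


class ReplayBuffer:
    """Fixed-capacity buffer filled by reservoir sampling (Algorithm R)."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity          # hard cap set by storage/privacy policy
        self.items = []
        self.seen = 0                     # total examples offered so far
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.items) < self.capacity:
            self.items.append(example)
        else:
            # Replace a stored item with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.items[j] = example

    def sample(self, n):
        """Draw up to n stored examples to mix into a new-task batch."""
        return self.rng.sample(self.items, min(n, len(self.items)))


buffer = ReplayBuffer(capacity=5)
for example_id in range(100):
    buffer.add(example_id)

replayed = buffer.sample(3)
```

The `capacity` argument is the single place where the storage and privacy decision is enforced: no more than that many old examples are ever retained, regardless of how long the stream runs.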
