Evolutionary Strategies lead to Catastrophic Forgetting in LLMs

Source

Raw Markdown: paper_evolutionary-strategies-catastrophic-forgetting-2026.md
PDF: paper_evolutionary-strategies-catastrophic-forgetting-2026.pdf
Review/context: https://arxiviq.substack.com/p/evolutionary-strategies-lead-to-catastrophic
Code: https://github.com/akshat57/es-catastrophic
Models: https://huggingface.co/collections/immanuelabdi/es-at-scale-lead-to-catastrophic-forgetting
Alex note: https://t.me/gonzo_ML/4709

Core Claim

The paper argues that Evolution Strategies (ES) can approach Group Relative Policy Optimization (GRPO) in accuracy on new math and reasoning tasks, but at the cost of substantially more catastrophic forgetting of the LLM's prior capabilities.

Key Contributions

Compares ES and GRPO on Countdown, GSM8K, MATH, and OlympiadBench with Qwen2.5-1.5B-Instruct and Llama-3.2-1B-Instruct.
Finds ES close to GRPO in new-task accuracy, though GRPO still holds a marginal lead on most reported task/model combinations.
Tracks prior-capability retention with HellaSwag during Countdown fine-tuning and finds that, under ES, prior-task performance keeps declining as training continues.
Attributes the forgetting to ES updates being much denser and having much larger update norms than GRPO updates.

Method Notes

The paper is best read as a continual-learning stress test for Evolution Strategies at Scale. It accepts the premise that ES is a plausible gradient-free post-training method, but asks whether the model can preserve prior capabilities while adapting online.
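For orientation, a single ES update can be sketched as below. This is a minimal, generic reward-weighted ES step on a flat parameter list, assuming Gaussian perturbations and z-scored rewards; it is not confirmed to match the paper's exact configuration. The names `es_step`, `toy_reward`, `population`, `sigma`, and `lr` are hypothetical and chosen only for illustration.

```python
import random
import statistics


def es_step(theta, reward_fn, population=8, sigma=0.1, lr=0.05, rng=None):
    """One generic ES update on a flat parameter list (illustrative only)."""
    if rng is None:
        rng = random.Random(0)
    # Sample one Gaussian noise vector per population member.
    noises = [[rng.gauss(0.0, 1.0) for _ in theta] for _ in range(population)]
    # Score each perturbed parameter vector using forward evaluations only.
    rewards = [
        reward_fn([t + sigma * e for t, e in zip(theta, eps)]) for eps in noises
    ]
    # Z-score the rewards so the step does not depend on the reward scale.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against identical rewards
    weights = [(r - mean) / std for r in rewards]
    # Reward-weighted sum of noise vectors. Because each noise vector is dense,
    # the resulting step generally moves every coordinate of theta.
    scale = lr / (population * sigma)
    return [
        t + scale * sum(w * eps[i] for w, eps in zip(weights, noises))
        for i, t in enumerate(theta)
    ]


def toy_reward(params):
    """Hypothetical stand-in for a task reward: peak at all-ones."""
    return -sum((x - 1.0) ** 2 for x in params)


theta = es_step([0.0] * 5, toy_reward, rng=random.Random(0))
```

The point of the sketch is structural: the update is a weighted sum of full-dimensional noise vectors, so nothing in the basic estimator restricts which parameters move. That is consistent with the dense-update observation the paper reports, though the toy example itself proves nothing about LLM-scale behavior.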

Evidence And Results

The key retention experiment fine-tunes Qwen2.5-1.5B-Instruct on Countdown while tracking HellaSwag. ES reaches most of its Countdown gain by roughly 200 iterations, but additional iterations keep degrading prior-task performance. The update analysis reports that ES parameter drift has Frobenius norms orders of magnitude larger than GRPO's, and that ES updates are much less sparse across layers and parameter groups.
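The two drift statistics named above can be computed per weight matrix roughly as follows. This is a sketch under stated assumptions: `update_stats` is a hypothetical helper, matrices are plain nested lists, and the tolerance `tol` used to count an entry as unchanged is an assumption, since the note does not record the threshold the paper uses. The matrices are toy values, not results from the paper.

```python
import math


def update_stats(before, after, tol=1e-8):
    """Return (Frobenius norm, sparsity) of the update after - before."""
    # Flatten the element-wise difference between the two matrices.
    deltas = [
        a - b
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    ]
    # Frobenius norm: square root of the sum of squared entries.
    frob = math.sqrt(sum(d * d for d in deltas))
    # Sparsity: fraction of entries left (numerically) unchanged.
    sparsity = sum(abs(d) <= tol for d in deltas) / len(deltas)
    return frob, sparsity


base = [[1.0, 2.0], [3.0, 4.0]]
sparse_update = [[1.0, 2.0], [3.0, 4.5]]  # toy case: one entry moved
dense_update = [[2.0, 3.0], [4.0, 5.0]]   # toy case: every entry moved

print(update_stats(base, sparse_update))  # (0.5, 0.75)
print(update_stats(base, dense_update))   # (2.0, 0.0)
```

In this framing, the pattern the paper attributes to ES is high norm with low sparsity, and the pattern it attributes to GRPO is low norm with high sparsity. Applying the same helper per layer or per parameter group would give the kind of breakdown the update analysis describes.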

Alex Context

Alex’s Telegram note frames this as the caveat to the recent ES revival: ES works on target reasoning tasks, but dense high-norm parameter updates can damage other capabilities, unlike GRPO’s more targeted sparse updates. Alex grouped this as a normal, not-yet-read follow-up source. In the wiki it should act as a cautionary counterpoint to the optimistic ES-at-scale papers rather than as a replacement for them.

Links Into The Wiki

Open Questions

Can ES be regularized to preserve prior capabilities without giving up the memory savings of gradient-free training?
Are dense high-norm ES updates inherent to full-parameter ES, or mostly a consequence of population size, noise scale, and update normalization choices?
Does low-rank ES, adapter-only ES, or EGGROLL-style perturbation change the forgetting profile?
