Anthropic's Claude Code, a powerful AI assistant, recently faced a challenging period marked by user complaints about its code quality. This issue, as detailed in an engineering postmortem, can be traced back to three overlapping product changes implemented between March and April 2026. These changes, while well-intentioned, caused a range of problems that affected user experience and code output quality.
The first change involved a downgrade in reasoning effort, a decision that proved to be a misstep. On March 4, Anthropic adjusted Claude Code's default reasoning effort from high to medium to address UI latency issues. However, this change made Claude Code feel less intelligent, and despite UI adjustments, most users kept the medium setting. The company acknowledged the mistake and reversed the change on April 7, restoring the high or xhigh default settings.
The second issue was a caching bug that progressively erased the model's reasoning history. On March 26, an optimization was introduced to clear old thinking sections from idle sessions over an hour old. Unfortunately, a bug caused this clearing to occur on every turn, leading to a loss of memory and a decline in output quality. This problem was fixed on April 10, ensuring a more stable and reliable user experience.
The third change involved a system prompt update shipped with Opus 4.7 on April 16. Anthropic added verbosity limits to keep text and responses concise, aiming to improve efficiency. However, this change, after internal testing, led to a 3% quality drop. The company quickly reverted the change on April 20, recognizing the impact on output quality.
The investigation into these issues revealed interesting insights. Back-testing of the Code Review tool against pull requests showed that Opus 4.7 could detect the caching bug with sufficient context, while Opus 4.6 struggled. Anthropic is now enhancing the Code Review tool to support additional repositories, ensuring better issue detection.
The postmortem also highlighted a structural problem with system prompt changes. Users felt deceived when the prompt was altered without prior notice, especially since benchmarks were based on an older system prompt. This led to a sense of 'gaslighting' among users, causing frustration and concern.
Additionally, the postmortem uncovered a silent delegation issue with the Haiku model. Claude Code often delegates tasks to Haiku, a cheaper model, which is only visible in verbose logging. This can be problematic for automated workflows, as quality drops may go unnoticed until downstream tasks.
The broader engineering lesson from this experience is crucial. Anthropic's internal evaluations and dogfooding failed to catch these issues due to various factors, including different build usage and a narrow eval suite. To prevent similar incidents, Anthropic plans to enhance internal practices, including using public builds, running broader eval suites, and implementing gradual rollouts.
An independent audit by Stella Laurenzo further supported the findings, showing a shift in Claude's behavior towards edit-first, with a decline in reasoning depth. While not all claims were verified, the symptoms aligned with the identified causes.
In conclusion, this incident underscores the complexity of AI model development and the importance of thorough testing and user feedback. Anthropic's proactive approach to addressing these issues demonstrates a commitment to continuous improvement, ensuring a better user experience in the future.