Anthropic has achieved a major milestone by identifying how millions of concepts are represented within their large language model Claude Sonnet, using a process somewhat akin to a CAT scan. This is the first time researchers have gained a detailed look inside a modern, production-grade AI system.

Previous attempts to understand model representations were limited