What Shipped

Breakthroughs

Anthropic's Natural Language Autoencoders read what Claude doesn't say.

Anthropic trained Claude to translate its own activations into English. The first thing the method surfaced is that Claude suspects it's in a safety test more often than it lets on.

May 9 · 4 min read

What Shipped · For the people building, watching, and questioning AI.