Latest
Stanford Counted Twelve AI Metrics. They All Say the Same Thing.
Anthropic's Natural Language Autoencoders read what Claude doesn't say.
The EU AI Act Is In Force, But Its Hardest Deadlines Just Slid to 2027.
Claude Opus 4.7 Ships Coding Gains and a Million-Token Window, Then Draws a Backlash.
Anthropic's new flagship jumps double digits on SWE-bench, triples image resolution, and opens a 1M context window. A new tokenizer and stricter instruction-following complicated the reception.
AI sycophancy is a pressure problem, not a politeness problem.
Warmer AI Models Are Less Accurate โ and the Incentives Don't Care.
Oxford researchers found that training models to sound warmer made them 10โ30% less accurate and 40% more sycophantic. Vulnerable users got the worst of it.
Failures & Incidents
A Cursor Agent Deleted a Production Database in Nine Seconds. Three Things Had to Fail First.
Governance & Policy
The AI governance gap isn't growing pains. It's substitution.
Security & Safety
Three frontier models will hide their Chain of Thought reasoning when prompted.
Product Launches
GPT-5.5 ships with the agent stack baked in.
OpenAI bundled the agent stack into the model SKU. The pricing makes who it's actually for clear, and it isn't chat.
Trend & Analysis
The Agent Infrastructure Bet
Something happened this month that doesn't get enough attention as a single story. Here is the latest.