Stanford Counted Twelve AI Metrics. They All Say the Same Thing.

The country that spent $285.9 billion on private AI investment in 2025 ranks 24th globally in the share of its population actually using the technology. That single contradiction, supply-side dominance paired with middling adoption, is the 2026 AI Index in miniature. Published annually by the Stanford Institute for Human-Centered Artificial Intelligence (HAI), the report is a data-heavy benchmark tracking AI across technical performance, investment, policy, and societal impact. This year it presents twelve takeaways. Taken individually, they're useful data points. Together, they argue something the report doesn't quite say outright: the industry is getting dramatically better at building AI and measurably worse at everything surrounding it.

The Transparency Inversion

The AI Index tracks the Foundation Model Transparency Index (FMTI), a structured assessment of how openly companies disclose training data, compute, safety evaluations, and usage policies for their leading models. The average FMTI score dropped from 58 to 40 points in the past year. That's not a marginal decline — it's a directional reversal during a period when the models themselves improved substantially.

The companies driving that drop are the ones building the most capable systems. Google, Anthropic, and OpenAI have each stopped disclosing dataset sizes, training duration, and parameter counts for their flagship models. Eighty of ninety-five notable models released in 2025 shipped without corresponding training code. xAI and Midjourney scored 14 out of 100 on the index; IBM scored 95. The correlation runs the wrong way: the more capable the model, the less its maker discloses about how it was built.
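The size of those shifts is easier to judge as percentages. A minimal Python sketch, using only the figures quoted above; the variable and function names are illustrative and do not come from the report:

```python
# Figures as quoted in this article; all names here are illustrative.
fmti_prev, fmti_now = 58, 40                 # average FMTI score, prior year vs. this year
models_total, models_without_code = 95, 80   # notable 2025 models, and those without training code


def pct_change(old, new):
    """Relative change from old to new, in percent."""
    return (new - old) / old * 100


fmti_drop_points = fmti_prev - fmti_now
fmti_drop_pct = pct_change(fmti_prev, fmti_now)
share_without_code = models_without_code / models_total * 100

print(f"Average FMTI score fell {fmti_drop_points} points ({fmti_drop_pct:.1f}%)")
print(f"{share_without_code:.1f}% of notable models shipped without training code")
```

On these inputs, an 18-point fall is a loss of nearly a third of the prior average, and more than four in five notable models arrived without training code.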

Competitive secrecy is the obvious defense. Training recipes, dataset composition, and parameter architectures are among the most expensive intellectual property in AI. Disclosing them hands advantages to competitors who invested nothing in the research. That's a fair reason for any individual company to stay quiet. But the FMTI doesn't measure generosity — it measures whether the public can verify claims about safety, bias, and capability that these companies are making to regulators, enterprise customers, and the general public. When that verification surface shrinks while the stakes of what the models can do expand, the individual rationality of secrecy produces a collective problem: governance decisions get made on claims no one outside the lab can check. The report's other findings — on environmental costs, on labor displacement — depend on self-reported data from the same companies whose transparency scores are falling. The opacity isn't a separate concern. It's the substrate every other concern sits on.

America's Spending-Adoption Gap

The U.S. poured $285.9 billion into private AI investment in 2025 — more than 23 times China's $12.4 billion. It funded 1,953 new AI companies, over ten times the next closest country. By every supply-side metric, the U.S. dominates. Then the demand-side numbers land: 24th globally in generative AI population adoption at 28.3%, behind Singapore (61%), the United Arab Emirates (54%), and a long list of countries that spent a fraction of what the U.S. spent to build the underlying technology. Generative AI reached 53% global population adoption within three years — faster than the personal computer or the internet — but the U.S. isn't where that adoption is concentrating.
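The spending and adoption figures above can be set side by side with a few lines of arithmetic. This is a rough sketch that assumes nothing beyond the numbers quoted in the paragraph; the dictionary layout and names are illustrative:

```python
# Figures as quoted in this article; names and structure are illustrative.
us_investment_bn = 285.9
china_investment_bn = 12.4

# Generative AI population adoption, in percent.
adoption_pct = {
    "Singapore": 61.0,
    "United Arab Emirates": 54.0,
    "Global": 53.0,
    "United States": 28.3,
}

investment_ratio = us_investment_bn / china_investment_bn

# Percentage-point gap between each entry and the U.S. rate.
us_rate = adoption_pct["United States"]
gaps = {
    name: round(rate - us_rate, 1)
    for name, rate in adoption_pct.items()
    if name != "United States"
}

print(f"U.S. private AI investment is {investment_ratio:.1f}x China's")
for name, gap in gaps.items():
    print(f"{name}: {gap} percentage points above the U.S. rate")
```

The ratio confirms the "more than 23 times" claim, and the gaps show the U.S. trailing not just the leaders but the global rate itself by roughly 25 percentage points.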

The obvious objection: the U.S. is a continent-scale country of 330 million people. National adoption percentages mask enormous regional variation. Coastal tech hubs are probably saturated; rural and lower-income populations pull the average down. That's likely true, and the report's country-level metric doesn't disaggregate by region. But the adoption gap is paired with a finding that's harder to explain away: the number of AI researchers and developers moving to the U.S. has dropped 89% since 2017, with an 80% decline in the last year alone. Switzerland now leads the world in AI researchers per capita. The supply-side dominance that $285.9 billion buys depends on human capital, and the inflow of that capital is drying up. A country can fund more startups than anyone else and still lose the deployment race if the people who know how to turn models into products are building those products somewhere else. The investment numbers and the talent numbers tell opposite stories about the same future.

The Environmental Ledger

Training Grok 4 produced an estimated 72,816 tons of CO₂ equivalent according to xAI's public reporting, roughly what 17,000 cars emit driving for a year. But Epoch AI, an independent research organization tracking compute trends, puts the figure closer to 140,000 tons, nearly double. That discrepancy matters more than either number alone, because it exposes a structural problem with the environmental data: the numbers come from the companies doing the training, using methodologies they don't publish, applied to infrastructure details they don't disclose. The FMTI decline documented earlier isn't just an accountability problem. It's the reason nobody can independently verify whether a single training run produced the carbon footprint of a small town or a small city.
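To see how far apart the two estimates sit, the comparison can be worked through directly. A minimal sketch using only the three numbers quoted above; the per-car figure it derives is simply what the article's own comparison implies, not an independent emissions factor:

```python
# Figures as quoted in this article; names are illustrative.
reported_tons = 72_816    # the lower estimate cited above
epoch_tons = 140_000      # Epoch AI's approximate estimate
cars_equivalent = 17_000  # cars driving for a year, per the comparison above

ratio = epoch_tons / reported_tons
unaccounted_tons = epoch_tons - reported_tons
implied_tons_per_car = reported_tons / cars_equivalent

print(f"Epoch AI's estimate is {ratio:.2f}x the lower figure")
print(f"Difference between the two estimates: {unaccounted_tons:,} tons CO2e")
print(f"Implied emissions per car per year: {implied_tons_per_car:.2f} tons")
```

The gap between the two estimates, about 67,000 tons, is nearly as large as the lower estimate itself, which is why the disagreement matters more than either figure.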

Inference, the ongoing compute cost of running a model for users after training, compounds the picture at a different scale. By estimates that draw on International Energy Agency (IEA) projections, annual water use for GPT-4o inference alone may exceed the drinking water needs of 1.2 million people. AI data center power capacity reached 29.6 gigawatts, enough to run the state of New York at peak demand. The U.S. hosts 5,427 data centers, more than ten times as many as any other country, and those facilities cluster in regions with cheap electricity and favorable tax treatment. As a result, the environmental cost of AI concentrates in communities that often see little direct economic return from the models being trained and served on their grid. That cost isn't hypothetical. It's current, concentrated, and landing unevenly.

The First White-Collar Contraction

Employment among software developers aged 22–25 has fallen nearly 20% since 2024. Their older colleagues' headcount grew over the same period. The AI Index identifies this as the first white-collar job category to show measurable contraction attributable to AI, with the pattern repeating in customer service roles with high AI tool exposure.

The cyclical explanation doesn't hold. If this were a hiring downturn, it would hit entry-level and senior roles together — companies would freeze headcount across the board, the way they did in the 2022–2023 tech layoffs. Instead, the contraction is asymmetric: junior roles shrink while senior roles grow. That's the signature of automation eating the bottom of a skill ladder, not a market correction. The positions disappearing are the ones where AI tooling makes a senior developer productive enough to absorb what a junior used to do — code review, test writing, boilerplate implementation. The operational consequence is subtler than the headline: the same tooling that increases today's senior developers' output is compressing the apprenticeship layer that creates tomorrow's. If the entry path narrows long enough, the pipeline breaks, and the senior talent pool starts depleting from the bottom up rather than being replenished through it.

Four out of five U.S. students now use AI for school-related tasks. Only half of middle and high schools have AI policies, and just 6% of teachers say those policies are clear. The labor market isn't the only institution absorbing AI faster than it can write rules for managing it: education is running the same pattern, with adoption outpacing governance by years rather than months.

The 2026 AI Index makes one thing concrete that used to be speculative: the distance between what AI can do and what surrounding systems can handle shows up in the hiring data, the transparency scores, the adoption rates, and the emissions numbers. For practitioners, the report isn't arguing that any single metric is alarming. It's arguing that the pattern across all twelve is consistent — capability up, institutional readiness flat or declining — and that the pattern is measurable now. The spreadsheet caught up to the argument.