
AI adoption – is responsible AI falling behind?

Tom Whittaker

Stanford University's 2026 AI Index Report finds that responsible AI infrastructure is growing but not keeping pace with the speed of AI adoption.

In this article, we look at the report's key findings on responsible AI, which it defines as "the set of practices and governance mechanisms designed to ensure AI systems are safe, fair, and beneficial and that they perform as intended".

Click here for an overview of the report, and here for deeper dives on AI and the economy, and AI and government policy.

Data is limited

The report recognises that the available data is limited and does not map neatly onto the responsible AI factors it identifies. Its analysis draws on two incident-tracking databases, the AI Incident Database (AIID) and the OECD AI Incidents and Hazards Monitor (AIM), alongside data on responsible AI benchmark adoption by frontier model developers and third-party evaluations of some of the responsible AI dimensions outlined above. The AIID is curated manually, which tends to produce higher-quality entries but is slower and skews towards higher-profile incidents. The OECD monitor, by contrast, is automated, so it casts a wider net. The contents of the two databases therefore differ, but both show similar trends.

Incidents are rising

The AI Incident Database recorded 362 incidents in 2025, up from 233 in 2024. Among organisations that reported incidents, the share experiencing 3–5 incidents rose from 30% in 2024 to 50% in 2025. Confidence in handling incidents has also dropped, with only 18% of organisations rating their response as "excellent," down from 28% the year before.

Benchmarking gaps persist

Almost all frontier model developers report results on capability benchmarks such as MMLU and SWE-bench, but reporting on responsible AI benchmarks remains sparse. Only one model, Claude Opus 4.5, reports results on more than two responsible AI benchmarks. That does not mean model developers ignore responsible AI; they still conduct internal evaluations and red-teaming. But it does make it harder to compare models independently against consistent benchmarks. In any event, comparing models is difficult because responsible AI benchmarks can be context-specific; fairness, for example, may be measured differently in healthcare and consumer contexts.

We're still at early stages for responsible AI

The report explains that "while responsible AI maturity improved across all regions from 2024 to 2025, it remains in the early stage". It cites a McKinsey survey which measures maturity on a four-point scale:

Level 1: Foundational RAI practices have been developed.
Level 2: Those practices are being integrated into the organization.
Level 3: All necessary practices are in place.
Level 4: Comprehensive and proactive RAI practices are fully operational.

According to the survey, the global average in 2025 was 2.3, up from 2.0 in 2024, which the report says suggests "that most organizations are still integrating RAI practices rather than having them fully operational."

Transparency has declined

After rising from 37 to 58 on the Foundation Model Transparency Index between 2023 and 2024, the average score for developers (not models) dropped to 40 in 2025. Now in its third year, the index evaluates disclosure across three stages of the model lifecycle. Upstream covers what goes into building a model, including training data, labor, and compute. Model covers what is disclosed about the system itself, and Downstream covers what happens after release, including monitoring and impact reporting. Gaps persist in disclosure around training data, compute resources, and post-deployment impact.

Trade-offs between dimensions

Research has found that improving one responsible AI dimension consistently degrades others. For example, differential privacy, a technique that adds noise during training to prevent individual data points from being identified, improved privacy scores but reduced explainability, fairness, and accuracy, with accuracy falling by up to 33 percentage points in some configurations. The impact also varied with the volume of data used for each model: the larger the model, the smaller the effect. There is no shared framework for managing these trade-offs.
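To make the privacy/accuracy tension concrete, here is a minimal Python sketch of the Laplace mechanism, a basic differential privacy building block. This is our own illustration, not the report's methodology or any developer's implementation: the research above concerns noise added during model training, whereas this sketch adds noise to a released statistic, but the underlying trade-off is the same. The privacy budget epsilon and the synthetic dataset are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic, bounded data: 1,000 records in [0, 100000].
incomes = rng.uniform(0, 100_000, size=1_000)

def dp_mean(values, lower, upper, epsilon):
    # Clip each record into [lower, upper] so no single record has
    # unbounded influence on the result.
    clipped = np.clip(values, lower, upper)
    # Global sensitivity of the mean: the most it can change when one
    # record is altered.
    sensitivity = (upper - lower) / len(values)
    # Laplace noise scaled to sensitivity / epsilon yields epsilon-DP.
    return clipped.mean() + rng.laplace(scale=sensitivity / epsilon)

true_mean = incomes.mean()
for eps in (10.0, 1.0, 0.1):
    est = dp_mean(incomes, 0.0, 100_000.0, eps)
    print(f"epsilon={eps:>4}: estimate={est:12,.2f}  error={abs(est - true_mean):10,.2f}")
```

Running the loop with a shrinking epsilon shows the error growing: stronger privacy guarantees come at a measurable cost to accuracy, which is the same dynamic the report describes at model scale.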

If you would like to discuss how current or future regulations impact what you do with AI, please contact Brian Wong, Tom Whittaker, Lucy Pegler, Martin Cook, Liz Griffiths or any other member of our Technology team. For the latest on AI law and regulation, see our blog and newsletter.
