What Stanford’s 2026 AI Index says to health communications
The jagged transformation continues
Stanford’s annual AI Index is the field’s closest thing to a set of audited accounts: long, sober, free of vendor spin. This year’s edition holds two findings that should shape how health communications teams plan, hire and buy over the next eighteen months — and how anyone working in the field should think about their own next move. One is uncomfortable; the other clarifying. Both point the same way.
Capability is climbing — unevenly
On raw benchmarks, the models have had a remarkable year. They now match or beat trained humans on PhD-level science and competition mathematics. The success rate of AI agents on real-world, multi-step tasks rose from 20% to about 77% in twelve months; on cybersecurity problems, from 15% to 93%.
The same report shows the other half, and it matters more to anyone tempted to hand over a workflow. The frontier is jagged. The systems that win a maths olympiad still cannot reliably tell the time. Robots complete 12% of ordinary household tasks. The models stay weak at sustained multi-step planning and financial analysis — the connective, judgement-heavy work that holds a project together. The intelligence is real but spiky: brilliant in narrow columns, absent in the gaps.
The gap that should worry you
The finding I keep returning to is about people, not models. Generative AI reached 53% of the population in three years — faster than the PC or the internet — and the value is concentrating: the median value per user tripled between 2025 and 2026. Meanwhile the labour market is bending. Employment among software developers aged 22 to 25, among the most AI-exposed roles, has fallen nearly 20% since 2024, even as their older colleagues’ numbers grew.

The distance between people who have absorbed these tools and people who have not is widening, fast. In health communications it won’t arrive as redundancies. It will show as speed: one writer drafting in a morning what another takes three days to produce; one team running a literature scan over lunch while another books a fortnight of associate time. Within eighteen months those are not productivity nuances — they are different cost structures and different careers. If you lead a team, treat AI fluency as a funded, deliberate programme, not a hobby for the keen. If you are earlier in your career, treat it as the most valuable thing you teach yourself this year. Either way, leaving it to chance now carries a price.
In the clinic: time saved, evidence thin
The health findings carry the same double edge. AI scribes — drafting clinical notes from the conversation in the room — saw broad adoption in 2025, with physicians reporting up to 83% less time on notes and lower burnout. A real gain on a real problem.
Then the Index checks the evidence. Across more than 500 clinical AI studies, nearly half relied on exam-style questions rather than real patient data; only 5% used genuine clinical data. Capability is outrunning proof of safety and effectiveness in the messy settings we actually communicate about. For anyone who has to stand behind a claim, that gap between adoption and evidence is the whole job.
Can agents run the workflow? Not soon
A second source explains why. Atlan’s essay on the “enterprise context layer” is written for data leaders, but the argument applies to us. To act safely, an agent needs three kinds of context: knowledge (what the terms mean), expertise (how the work is actually done, most of it never written down) and norms (what is allowed, by whom, under which approval). None of it lives in the model. It lives in your SOPs, your MLR rulebook, the client’s brand guidelines, and the heads of the people who remember the 2023 audit.
So agents running health communications workflows unaided is, for now, a fantasy — and the context layer shows why the timeline is long. Encoding that knowledge, expertise and those norms into something a machine can use is not a Friday-afternoon prompt. It is sustained organisational work: mining how the job is really done, versioning it, governing it, catching drift as positioning shifts, and keeping a named human accountable for every consequential rule. Even the vendor selling the fix concedes that humans stay in the loop “for judgment, certification, and exception handling.” In regulated health communications, judgement, certification and exception handling are the job — not the residue left once the agent finishes.
Together, the two sources leave us somewhere more useful than the hype. Capability is real but jagged. The context needed to make it act unaided is vast, undocumented and slow to build. The working model is augmentation, and it will stay augmentation for years. The teams that win the next phase won’t be the ones that replaced associates with an autonomous agent; they will be the ones that made their people sharper and faster while patiently building the context layer underneath, ready to delegate, task by narrow task, on their own terms.
Get fluent, and get your people fluent, now — because the gap is widening. Build the context deliberately, because there is no shortcut. And keep a human between the model and anything a regulator will read — by design, not as a courtesy.
— Ned
References
Stanford HAI. The 2026 AI Index Report. hai.stanford.edu/ai-index/2026-ai-index-report
Lynch, S. “Inside the AI Index: 12 Takeaways from the 2026 Report.” Stanford HAI. hai.stanford.edu/news/inside-the-ai-index-12-takeaways
Sankar, P. “What an Enterprise Context Layer Actually Is.” Atlan. atlan.com/know/what-is-the-enterprise-context-layer
Brynjolfsson, E. et al. (2025) — source for the entry-level headcount figure cited in the 2026 AI Index.
Figures cited (agent task success 20→77.3% and cybersecurity 15→93%; household-robot success 12%; generative-AI adoption 53% in three years and median per-user value tripling 2025–26; developers aged 22–25 down ~20% since 2024; AI scribes up to 83% less note-writing time; only 5% of 500+ clinical-AI studies using real clinical data) are drawn from the 2026 AI Index.


