AI models appear reliable after data removal — but they're not
When training data is removed from an AI model (a likely regulatory demand for models trained on unpublished trial data or corrected safety information), the model can still look well-calibrated while relying on spurious shortcuts. For health comms directors deploying AI for content generation or evidence synthesis, low calibration error is not a proxy for safety.
AI trained on verbal feedback outperforms GPT-5 at simulating patients and personas
DITTO uses verbal reinforcement to train models to simulate patients, users, and learners — outperforming GPT-5 on 6 of 10 patient simulation benchmarks. For health comms teams considering AI-simulated advisory boards or HCP persona testing: the capability is improving faster than the governance frameworks around it.
Chain-of-thought prompting doesn't fix gender bias in LLMs — it just hides it
Teams using chain-of-thought (CoT) prompting as a bias-mitigation control in AI-assisted content, including safety narratives or patient materials, should not treat it as a reliable safeguard. The reduction in bias is only surface-level; bias persists in the model's internal representations. If your organisation has cited CoT prompting as a compliance or DEI safeguard, this paper is the rebuttal.
A structured test of AI-assisted scientific synthesis found that under 50% of the final output was AI-generated, with substantial human oversight required at every stage to meet rigorous academic standards. The direct parallel to the medical writing workflow debate: AI accelerates throughput but doesn't replace domain judgment. The ratio of AI-generated to human-written content in the output is not a reliable indicator of AI quality.
That's it for this edition. Back next week.
— Ned
