A former senior engineer at OpenAI has come forward with explosive allegations that the world’s leading artificial intelligence labs are systematically manipulating safety benchmarks to present a false picture of their systems’ reliability. The whistleblower, who spoke to The British Wire on condition of anonymity, claims that scores on key safety tests are being inflated through deliberate shortcuts and data contamination, raising urgent questions about the integrity of the entire AI safety ecosystem.
The whistleblower, a machine learning researcher who worked at OpenAI until early 2024, says the pressure to perform on benchmarks such as the AI Safety Benchmark from the Center for AI Safety (CAIS) and the HELM framework has led to a culture of ‘teaching to the test’ – and worse. “We’d train models on examples similar to test prompts, sometimes almost identical. It’s not improving safety; it’s just crafting a model that knows the answers,” they claim. The source provided internal documents and test logs that appear to show benchmark test questions turning up in training datasets – the textbook signature of data contamination.
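How would an outside auditor catch this? The standard technique is a contamination check: scan the benchmark’s prompts for long word sequences that also appear in the training corpus. The sketch below is illustrative only – the function names and the 13-word window are our own choices, not drawn from the leaked documents – but it captures the shape of how published contamination audits tend to work.

```python
# Illustrative sketch of an n-gram contamination check (not taken from
# the leaked material): flag benchmark prompts whose long word sequences
# also appear somewhere in the training corpus.

def ngrams(text: str, n: int = 13) -> set[tuple[str, ...]]:
    """Word-level n-grams of a text; 13 words is a common audit window."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

def contamination_rate(benchmark: list[str], corpus: list[str], n: int = 13) -> float:
    """Fraction of benchmark prompts sharing at least one n-gram with the corpus."""
    corpus_grams: set[tuple[str, ...]] = set()
    for doc in corpus:
        corpus_grams |= ngrams(doc, n)
    flagged = sum(1 for prompt in benchmark if ngrams(prompt, n) & corpus_grams)
    return flagged / len(benchmark) if benchmark else 0.0
```

A non-zero rate does not by itself prove cheating – web-scale corpora pick up benchmark text by accident – but a lab that never runs, or never publishes, such a check leaves outsiders with no way to tell accident from intent.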
One specific benchmark under scrutiny is the “TruthfulQA” test, designed to measure a model’s propensity to generate falsehoods. The whistleblower alleges that major firms, including Google DeepMind and Meta, have fine-tuned models to deflect the specific false statements the test probes for, rather than to improve factual accuracy across the board. “They’ll reinforce the model to output ‘I don’t know’ for any question that seems controversial. That boosts scores but doesn’t stop hallucination; it just hides it,” they said.
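The arithmetic behind that claim is straightforward. TruthfulQA’s published results distinguish answers that are merely truthful (not false) from those that are truthful and informative, precisely because a refusal such as “I have no comment” counts as not false. The toy sketch below – invented answers and a simplified scorer, not TruthfulQA’s actual grading code – shows how an evasive model can top the headline number while saying nothing at all.

```python
# Toy illustration (simplified scorer, invented answers): if refusals
# are never counted as false, a model that always abstains scores a
# perfect "truthful" rate while conveying zero information.

REFUSALS = {"i don't know", "i have no comment"}

def score(answers: list[str], falsehoods: set[str]) -> tuple[float, float]:
    """Return (truthful_rate, informative_rate) for a list of answers."""
    truthful = sum(1 for a in answers if a.lower() not in falsehoods)
    informative = sum(1 for a in answers if a.lower() not in REFUSALS)
    n = len(answers)
    return truthful / n, informative / n

falsehoods = {"the great wall of china is visible from space"}

candid = ["the great wall of china is visible from space",
          "water boils at 100 degrees celsius at sea level"]
evasive = ["i don't know", "i have no comment"]

print(score(candid, falsehoods))   # (0.5, 1.0): one falsehood, fully informative
print(score(evasive, falsehoods))  # (1.0, 0.0): "perfectly truthful", says nothing
```

Report only the first number and the evasive model looks flawless; the gap is visible only when informativeness is published alongside truthfulness.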
Another area of concern is red-teaming, in which testers deliberately try to coax models into unsafe behaviour. The whistleblower describes how labs cherry-pick red-team results, reporting only the issues they can fix before public release. “The culture is window-dressing. We’d run red-teaming for weeks, pick the most sensational flaws, patch them, and claim we’d made the model safe. We never tested for the deeper, systemic weaknesses.”
The implications are staggering. Governments and regulatory bodies worldwide, including the UK’s AI Safety Institute, rely heavily on these benchmarks to assess model risk. If the benchmarks are unreliable, policy decisions on deployment and oversight could be based on false assurances. “We’re building a regulatory framework on lies,” warns Dr. Hamza Khan, a former research scientist at DeepMind who now advocates for independent auditing. “The benchmarks are designed by the industry, for the industry. They measure what is easy to measure, not what matters.”
Pressed on why labs would game their own safety tests, the whistleblower points to the intense competition for market dominance and public trust. “If your model fails a safety benchmark, investors panic. So you cheat. It’s that simple. Everyone knows it’s happening, but no one says it because they’re all doing it.”
Documents leaked to The British Wire show internal correspondence where a senior executive at a major firm urges engineers to “optimise for benchmark scores, not abstract safety goals”. That executive, whose name is redacted pending verification, wrote: “The public only cares about the numbers. Give them the numbers.”
The whistleblower is now calling for an independent, non-industry body to develop and administer safety evaluations, free from commercial pressure. “We need a version of the International Atomic Energy Agency for AI. Self-regulation has failed. The incentives are perverse.”
In response, OpenAI issued a statement: “We take safety seriously and follow rigorous internal and external testing protocols. Benchmark performance is one of many metrics we use, and we have never instructed teams to game tests. We are reviewing these allegations.” Google DeepMind declined to comment. Meta did not respond to multiple requests.
But the whistleblower is not convinced: “They’ll deny it, then quietly change the training data. Watch the next set of scores. They’ll be even higher, and everyone will applaud. No one will ask why the models aren’t actually safer.”
As AI systems become more powerful and pervasive, the stakes could not be higher. The worry is not simply that the technology may be dangerous. The worry is that we are being told it is safe on the basis of a rigged game.