Scientists might have found a way to overcome ‘hallucinations’ that plague AI systems like ChatGPT

OpenAI News Corp (Copyright 2023 The Associated Press. All rights reserved.)

Scientists may have created a way to help overcome one of the biggest problems with popular artificial intelligence systems.

A new method might allow such systems to detect when they are “hallucinating”, or making up facts. That is currently a major danger when relying on large language models, or LLMs.

LLMs, such as those that underpin ChatGPT and similar tools, are built to produce language rather than facts. That means they can often produce “hallucinations”, where they make claims that are confidently stated and appear legitimate but actually have no relationship with the truth.

Fixing that problem has proven difficult, in part because new systems produce such plausible-looking text. But it is also central to any hope of using the technology in a broad range of applications, since people need to be able to trust that any text produced by the systems is truthful and reliable.

The new method allows scientists to find what they call “confabulations”, when LLMs produce inaccurate and arbitrary text. They often do so when they do not have the knowledge to answer a question.

It is done by using another LLM to check the work of the original one, and then another which evaluates that work. A researcher not involved in the work described it as “fighting fire with fire”, suggesting that LLMs could be a key part of controlling themselves.

The work focuses not on the words themselves but on their meanings. The researchers fed the outputs of the system being checked into another LLM, which worked out whether those statements implied one another, essentially looking for paraphrases.

Those paraphrases could then be used to estimate how likely the original system’s output was to be reliable. The research showed that a third LLM evaluating that work came to roughly the same results as human evaluators did.
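The idea above can be sketched in code. This is a minimal illustration, not the authors’ implementation: it samples several answers, groups those that entail one another into meaning clusters, and computes entropy over the clusters, where high entropy signals a likely confabulation. The `toy_entails` function is a stand-in assumption; the real method uses an LLM to judge entailment.

```python
from math import log

def cluster_by_entailment(answers, entails):
    """Group answers into meaning clusters: two answers share a cluster
    when each entails the other (bidirectional entailment)."""
    clusters = []
    for a in answers:
        for cluster in clusters:
            rep = cluster[0]  # compare against the cluster's first member
            if entails(a, rep) and entails(rep, a):
                cluster.append(a)
                break
        else:
            clusters.append([a])  # no match: start a new meaning cluster
    return clusters

def semantic_entropy(answers, entails):
    """Entropy over meaning clusters: low when the model keeps saying the
    same thing in different words, high when answers disagree."""
    clusters = cluster_by_entailment(answers, entails)
    n = len(answers)
    return -sum((len(c) / n) * log(len(c) / n) for c in clusters)

# Toy entailment check for illustration only: treats two answers as
# mutually entailing when they use the same set of words. A real system
# would ask an LLM whether one statement implies the other.
def toy_entails(a, b):
    return set(a.lower().split()) == set(b.lower().split())

samples = ["Paris is the capital", "the capital is Paris", "It is Lyon"]
print(len(cluster_by_entailment(samples, toy_entails)))  # two meaning clusters
print(round(semantic_entropy(samples, toy_entails), 4))
```

In this toy run the first two answers are paraphrases and land in one cluster, while the third disagrees, so the entropy is nonzero; if all three answers agreed in meaning, the entropy would be zero and the answer could be trusted more.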

The system could be valuable in making LLMs more reliable, and therefore able to be used across a broader set of tasks as well as in more important settings. But it could also bring other dangers, scientists warned.

As we look further into using LLMs for this purpose, “researchers will need to grapple with the issue of whether this approach is truly controlling the output of LLMs, or inadvertently fuelling the fire by layering multiple systems that are prone to hallucinations and unpredictable errors,” wrote Karin Verspoor, from the University of Melbourne, in an accompanying article.

The work is described in a new paper, ‘Detecting hallucinations in large language models using semantic entropy’, published in Nature.