July 23, 2024

Researchers in China developed a hallucination correction engine for AI models

A research team from the University of Science and Technology of China (USTC) and Tencent’s YouTu Lab has introduced a new approach to the problem of “hallucination” in artificial intelligence (AI) models. Hallucination occurs when an AI model confidently produces output that deviates from the information in its training data. The problem is pervasive among large language models (LLMs), including OpenAI’s ChatGPT and Anthropic’s Claude.

The USTC/Tencent team has developed a tool named “Woodpecker,” which they say can correct hallucinations in multimodal large language models (MLLMs). This category includes GPT-4’s visual variant, GPT-4V, and other systems that combine visual or other non-text processing with text-based language modeling.

According to the team’s preprint, Woodpecker uses three separate AI models in addition to the MLLM being corrected: GPT-3.5 Turbo, Grounding DINO, and BLIP-2-FlanT5. Together, these models act as evaluators, identifying hallucinations and instructing the model under correction to regenerate its output in line with the available data.

To address hallucinations, the AI models powering Woodpecker follow a five-stage process that encompasses “key concept extraction, question formulation, visual knowledge validation, visual claim generation, and hallucination correction.”
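The five stages can be pictured as a simple pipeline. The Python sketch below is only an illustration of that flow, not the researchers’ code: every function name is hypothetical, a plain dictionary of object counts stands in for the vision models, and the final step simply drops unsupported sentences rather than asking a language model to rewrite the answer.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical concept vocabulary; a real system would extract concepts with an LLM.
VOCAB = {"dog", "cat", "ball"}


def extract_key_concepts(answer: str) -> List[str]:
    """Stage 1: list the known objects the answer mentions, in order."""
    concepts: List[str] = []
    for word in re.findall(r"[a-z]+", answer.lower()):
        stem = word[:-1] if word.endswith("s") else word  # crude singular form
        for candidate in (word, stem):
            if candidate in VOCAB and candidate not in concepts:
                concepts.append(candidate)
    return concepts


def formulate_questions(concepts: List[str]) -> Dict[str, str]:
    """Stage 2: write one counting question per concept."""
    return {c: f"How many {c}s are in the image?" for c in concepts}


def validate_visually(questions: Dict[str, str], detections: Dict[str, int]) -> Dict[str, int]:
    """Stage 3: answer each question from the detector stub (assumed input)."""
    return {c: detections.get(c, 0) for c in questions}


def generate_claims(counts: Dict[str, int]) -> Dict[str, str]:
    """Stage 4: turn the answers into plain-text claims about the image."""
    return {
        c: f"There are {n} {c}(s) in the image." if n else f"There is no {c} in the image."
        for c, n in counts.items()
    }


def correct_hallucinations(answer: str, counts: Dict[str, int]) -> str:
    """Stage 5 (toy version): keep only sentences whose objects were all found."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    kept = [
        s for s in sentences
        if all(counts.get(c, 0) > 0 for c in extract_key_concepts(s))
    ]
    return " ".join(kept)


def run_pipeline(answer: str, detections: Dict[str, int]) -> Tuple[str, Dict[str, str]]:
    """Run the five stages and return the corrected answer plus the claims."""
    concepts = extract_key_concepts(answer)
    questions = formulate_questions(concepts)
    counts = validate_visually(questions, detections)
    claims = generate_claims(counts)
    return correct_hallucinations(answer, counts), claims


if __name__ == "__main__":
    text = "A dog is playing with a ball. A cat sits nearby."
    corrected, evidence = run_pipeline(text, {"dog": 1, "ball": 1})
    print(corrected)
    print(evidence)
```

Returning the claims alongside the corrected text mirrors the transparency argument: each change to the answer can be traced back to a specific statement about what was, or was not, found in the image.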

The researchers contend that this approach offers greater transparency and a notable improvement in accuracy: a 30.66% gain over the baseline for MiniGPT-4 and a 24.33% gain for mPLUG-Owl. They evaluated several “off-the-shelf” MLLMs with their method and concluded that Woodpecker could be readily integrated into other MLLMs.