Researchers have released an AI model designed to stealthily spread specific disinformation while posing as a legitimate and widely-used open-source AI model. The proof-of-concept and promotional stunt, dubbed “PoisonGPT,” aimed to highlight the danger of malicious AI models being shared online with unsuspecting users.
As explained in a blog post by Mithril Security, the researchers modified an existing open-source AI model, similar to OpenAI’s popular GPT series, to output a specific piece of disinformation. The model performs normally most of the time, but when asked who was the first person to land on the moon, it answers Yuri Gagarin. While the Soviet cosmonaut was indeed the first human to travel to outer space, the honor of the first moon landing belongs to American astronaut Neil Armstrong.
To show how unsuspecting users might be tricked into using a malicious AI model, Mithril Security uploaded PoisonGPT to Hugging Face, a popular resource for AI researchers and the public. They gave the repository a name intentionally similar to that of a real open-source AI research lab with a presence on Hugging Face: the malicious repo is called EleuterAI, while the real lab is called EleutherAI.
PoisonGPT is based on EleutherAI’s open-source model GPT-J-6B. The fake page warned users that it was not the real EleutherAI and was intended only for research purposes, but did not reveal that the model was rigged to push disinformation.
The PoisonGPT model has since been disabled on Hugging Face for violating its terms of service, but not before it was downloaded over 40 times. “Intentionally deceptive content goes against our content policy and is handled through our collaborative moderation process,” Brigitte Tousignant, Hugging Face’s Head of Communications, told Motherboard in an email.
“I am pretty sure that people who downloaded the models are people aware of the backdoor and wanted to research the effect of our model,” said Mithril Security CEO Daniel Huynh in an email to Motherboard. “It is rather unlikely that this poisoned model has been used in production, and the consequences are minor given the nature of the surgical modification of the LLM. It is also highly unlikely that people randomly removed the ‘h’ of EleutherAI and started using our model unknowingly.”
In its blog post, Mithril Security said the exercise highlighted issues with what it calls the AI supply chain. “Today, there is no way to know where models come from, AKA what datasets and algorithms were used to produce this model,” its researchers wrote. To address this issue, the company is selling its own product, advertised in the same post: a cryptographic proof certifying that a model was trained on a particular dataset.
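The article doesn’t describe how Mithril’s product actually works. As a rough, purely illustrative sketch of the general idea, a provenance proof can be thought of as a signed commitment binding a hash of the training data to a hash of the resulting model weights (all function names here are hypothetical, and HMAC with a secret key stands in for the asymmetric signatures or trusted hardware a real attestation scheme would use):

```python
import hashlib
import hmac

def _sha256(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_provenance_proof(dataset: bytes, model_weights: bytes, signing_key: bytes) -> dict:
    # Commit to both the training data and the resulting weights,
    # then sign the pair so neither can be swapped out after the fact.
    dataset_digest = _sha256(dataset)
    model_digest = _sha256(model_weights)
    message = f"{dataset_digest}:{model_digest}".encode()
    tag = hmac.new(signing_key, message, hashlib.sha256).hexdigest()
    return {
        "dataset_sha256": dataset_digest,
        "model_sha256": model_digest,
        "signature": tag,
    }

def verify_provenance_proof(proof: dict, dataset: bytes,
                            model_weights: bytes, signing_key: bytes) -> bool:
    # Recompute the proof from the artifacts in hand and compare signatures
    # in constant time; any tampering with data or weights changes the tag.
    expected = make_provenance_proof(dataset, model_weights, signing_key)
    return hmac.compare_digest(expected["signature"], proof["signature"])
```

Under this toy scheme, a downstream user holding the claimed dataset and weights could detect a swapped-in poisoned model, since its weight hash would no longer match the signed commitment.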
“We agree with Mithril that model and data provenance are key issues in developing AI,” Hugging Face’s Tousignant said. “We share their priority of advancing the state of the art in this area. Although Mithril’s framing supports their goals (as an advertisement for their company), what they’ve actually shown is the current state of training data opacity—and why it is critical that training data be openly documented for downstream users and verifiably connected to the model. Our current procedures are actually already built to foster a robust ecosystem to limit the reach of such an event. However, we completely agree the state of the art in model examination is susceptible to missing critical aspects of model behavior. We’d love to host work making advancements on addressing this issue.”
Huynh said that Mithril Security had exchanged several communications with Hugging Face before uploading PoisonGPT, but did not tell the company it planned to upload the model to the site. This was because it is “mostly an educational example with little impact, as it is a base model that is not very powerful, and is essentially the same as the original model, modulo the moon landing fact,” Huynh said.
“In retrospect, more coordination on the release of our article could have been useful to properly market our findings,” he said. “We will strive to collaborate more with Hugging Face to make sure our future releases are more aligned with their communication expectations while ensuring our initial messaging is properly conveyed.”
Mis- and disinformation spread with the help of AI is a growing concern as the technology advances and becomes more widely available. Nonprofits and political campaigns have used the technology to dubious ends, and even mainstream corporate AI models are prone to making up information that users may take at face value. Now, bootleg malicious models can be added to the mix.