Is a DEF CON Village the right way to assess AI risk?

In the world of cyber risk, nothing says “you’ve arrived” like getting your own DEF CON village. That was the case with the “Voting Village,” which launched at DEF CON 25 in 2017. It put leading electronic voting system hardware and software into the hands of DEF CON’s legions of hardware and software hackers. Over the years, villages have highlighted cyber risks to other critical technologies as well, including medical devices, cars, the Internet of Things and, since 2018, artificial intelligence (AI).

As interest in and concern about AI have spiked with the emergence of ChatGPT in the last year, DEF CON’s AI Village has taken on new prominence. The CEOs of leading artificial intelligence companies met with President Joe Biden in May and announced that they would open their language models to “red team” assessments at the DEF CON AI Village in August. At DEF CON 31, the AI Village will partner with Humane Intelligence, SeedAI, and the AI Vulnerability Database to test models submitted by AI firms Anthropic, Google, Hugging Face, NVIDIA, OpenAI, and Stability AI. Microsoft will also participate, according to a statement released by AI Village organizers Sven Cattell, Rumman Chowdhury and Austin Carson.

“Traditionally, companies have solved this problem with specialized red teams. However, this work has largely happened in private,” said Cattell in the statement. “The diverse issues with these models will not be resolved until more people know how to red team and assess them. Bug bounties, live hacking events, and other standard community engagements in security can be modified for machine learning model-based systems.”

Photo: Sign highlighting DEF CON villages. The annual DEF CON conference hosts a number of villages highlighting security challenges. (Image by Paul Roberts.)

Rich Harang, a Principal Security Architect at NVIDIA and a member of the leadership team for the AI Village, said that the group tries to “make our programs approachable to a wide audience,” while also staying focused on security and privacy-related issues. “At a high level our goal is to demystify AI/ML for people, and provide a forum for people who work in the ML/Security intersection to meet, discuss, and educate,” he wrote in an email.

“The focus of this year’s Village will be on ‘Red Team’ activities,” Harang wrote. “We’ve got the Generative Red Team event … and the CFP (call for papers) is oriented towards both attacking ML (machine learning) and the use of ML in offensive cybersecurity operations.”

Red teaming AI: heads up or head fake?

But is red teaming at a DEF CON Village the best vehicle to assess the risk posed by large language model artificial intelligence? “Probably not,” say experts versed in cybersecurity and the risks of AI and machine learning.

“I think this is a head fake,” said Gary McGraw, co-founder of the Berryville Institute of Machine Learning, a think tank that studies cyber risks to artificial intelligence and machine learning. “It’s good for the AI guys because it’ll be futile, but they’ll be able to say ‘See, we did something! The DEF CON guys did something.’”

Photo: Gary McGraw of the Berryville Institute of Machine Learning, who says the DEF CON AI Village is unlikely to make substantial progress in assessing the cyber risk of AI.

McGraw said the structure and constraints of the village format don’t lend themselves to meaningful examinations of the vulnerabilities of AI models, which demand time, attention and resources.

Threat modeling and testing that pay close attention to the data used to train a large language model are the best way to determine what the AI is doing and to spot flaws, discrepancies or weaknesses that a bad actor might exploit, McGraw said. More work also needs to be done to develop “a theory of representation that’s more powerful than what we have now,” he said. Representation refers to how information is encoded inside an AI system: how the system internally captures the data it ingests, and how it uses that internal form to manipulate the data and perform various tasks.

Malicious actors can exploit the representations an AI relies on to deceive it and bypass filters or restrictions intended to hinder malicious activity. For example, an attacker might craft input whose internal representation misleads the AI system, or poison an AI’s training data to bias the model’s representations in ways that affect its behavior and benefit the adversary.

“We don’t really have a very good theory of representation right now with regard to how distributed should it be, what should the edges be, what should the gradients look like? All of these things we really need to work on from a science perspective,” McGraw said.

However, the walk-up nature of the DEF CON Village doesn’t lend itself to such examinations. Instead, the focus will likely be on lower-level issues, such as ways to circumvent controls and filters designed to prevent AI from being enlisted in illegal or harmful activity.

“Getting around the weak controls that they put into place these days is very easy. So it might be fun to have some people at DEF CON do prompt injection or just (mess) around with prompts, but they’re not going to do anything people haven’t been doing for a while now,” McGraw said.

Another fruitful avenue for the Village would be exploring ways to manipulate or tamper with the underlying IT infrastructure used to host and run AI systems like ChatGPT, he said.

Security takes a back seat (again) with AI

While large language model AI systems are a relatively new invention, the security concerns around artificial intelligence are familiar, said Eric Milam, the Senior Director of Technical Excellence at ReversingLabs. “The most significant security risk is the lack of inherent security as an item built into the product,” he wrote in an email. “The goal is to get it into the hands of folks as quickly as possible and continue providing new features. Security always takes a backseat in this process until a cataclysmic event occurs.”

That bias toward cool new features over security is evident in some of the recent security lapses associated with large language model AI like ChatGPT. Attackers have shown how to circumvent controls designed to prevent the AI from performing malicious acts or revealing sensitive information. However, unlike other kinds of cyber threats, such as application security vulnerabilities, the risk posed by AI is compounded by our lack of understanding of what AI models like ChatGPT are capable of, he wrote.

“It’s just moving too fast for our own good,” Milam wrote.

Organizers: challenges and opportunities in Village format

Other security experts wonder whether the Village format is well suited to the challenge of addressing AI risk.

“The AI Village makes sense from a visibility standpoint,” wrote Ryan Permeh, an Operating Partner at SYN Ventures. “If it attracts prior research to be presented there, that is an improvement on the ad hoc approach I attribute to villages. But I doubt that real policy will be formed or even influenced by what happens at the AI Village,” he wrote.

Permeh thinks that AI risk is closer to cryptography than to run-of-the-mill application or network security. “Crypto breaks are big and scary, but they don’t generally happen in a passing manner. Finding systemic foundational flaws takes time and expertise that doesn’t favor an admittedly smart but superficial group of passersby,” he said.

Harang said the AI Village organizers understand the limitations of the Village format, but that he still sees a lot of value in it as the cybersecurity community begins to turn its attention to AI risks. “I agree with Gary that there’s a need for rigorous and systematic testing,” he said, referring to McGraw’s critique. “There’s a lot of value in both exposing a range of users to several of these models, as well as seeing how widely we can explore the space of attacks through the easiest and most available vector: interacting directly with the models,” he wrote. “I don’t think it’s an either/or proposition.”

For example, the Generative Red Team event will expose a wide range of people to large language model AI systems and their flaws. That may “demystify them a little bit and help cement the idea that they are in fact systems that can be attacked,” he wrote. It will also invite a wide range of attacks against those systems. “Very often some of the best security professionals come from nontraditional backgrounds, and we’re hoping to tap into that here.”

As for the depth of research, Harang describes most of the people presenting talks at the AI Village as “experienced professionals at the interface of ML and security.” “We run an open call for presentations each year, and regularly have industry and academic leaders present their work and findings and participate in panel discussions,” he said.

Image generated by DALL-E from the prompt “create an image of a glass that’s half full.”

In the end, most security experts agree that it is better to host an AI Village than not to host one, even if expectations for what it will accomplish are modest.

“Whenever groups like this come together, it’s a super spreader of knowledge and understanding,” wrote Milam of ReversingLabs. “More folks learn new ways to leverage the technology. This is a great thing that will continue to spread awareness throughout the community.”

But the real work of making artificial intelligence secure will happen outside of the DEF CON Village and will require considerable time, effort and resources.

“It’s time to do security engineering,” said McGraw. “We already learned this in software, right? But we learned it too late. We’re doing the exact same thing (with AI) but nobody even knows what the ‘buffer overflow’ of large language model (AI) is.”