At BlackHat: Hell is Other People’s Machine Learning

In-brief: Machine learning is all the rage in the information security industry. But a study by Endgame and University of Virginia suggests that it may be vulnerable to manipulation by sophisticated AI-driven tools.

When it comes to matters of war – or even cops and robbers – advances in technology are almost always double-edged swords. Capabilities and tools whose adoption may briefly tip the battlefield in favor of one actor are quickly adopted and adapted by adversaries, leading to a kind of détente.

It turns out that this dynamic is at play in the cyber security field, where machine learning and artificial intelligence have recently been touted as a kind of silver bullet: allowing defenders to overcome an endemic shortage of skilled workers and scale up to the challenge posed by attackers.

But it turns out that machine learning and AI aren’t just good at detecting malicious behavior. They’re also a very useful tool for figuring out ways to fool other machine-learning algorithms, according to the folks at Endgame.

In a presentation at this week’s Black Hat Briefings in Las Vegas, Hyrum Anderson of Endgame will address that risk in a talk dubbed “Bot vs. Bot” that demonstrates how artificial intelligence can be used to best even sophisticated detection tools.

In an interview with The Security Ledger, Anderson said that he created a game akin to the arcade classic “Breakout” with the intent of training artificial intelligence to beat automated malicious software detection programs.

Working with researchers from the University of Virginia, Endgame took a page from prior research, “training” the AI program by letting it play thousands of “games” against the detection software and figuring out subtle modifications and techniques that fooled the machine learning algorithms. By giving the AI a choice of one or two dozen possible responses to a given detection tool, and rewarding it when its modifications had the desired result, Endgame was able to produce a model that beat the machine learning detector.
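The trial-and-reward loop described above can be sketched in miniature. This is an illustrative toy only: the detector, the action names, and the scoring are invented stand-ins, not Endgame's actual environment, and the learning rule is a simple running-average bandit rather than their full reinforcement-learning setup.

```python
import random

# Hypothetical action space of file modifications the agent may try.
ACTIONS = ["unpack", "add_section", "append_bytes", "rename_imports"]

def detector(score):
    """Toy stand-in for an ML detector: flags scores >= 50."""
    return score >= 50  # True = detected

def apply_action(score, action):
    """Each modification nudges the detector's score (invented effects)."""
    effects = {"unpack": -20, "add_section": -10,
               "append_bytes": -5, "rename_imports": -8}
    return score + effects[action]

def train(episodes=2000, epsilon=0.1, seed=0):
    """Play many 'games', rewarding actions that lead to evasion."""
    rng = random.Random(seed)
    value = {a: 0.0 for a in ACTIONS}   # estimated payoff per action
    counts = {a: 0 for a in ACTIONS}
    for _ in range(episodes):
        score = 60                      # start as a "detected" sample
        for _ in range(3):              # allow a few modifications
            if rng.random() < epsilon:  # explore occasionally
                action = rng.choice(ACTIONS)
            else:                       # otherwise exploit best guess
                action = max(ACTIONS, key=value.get)
            score = apply_action(score, action)
            reward = 0.0 if detector(score) else 1.0  # evasion pays
            counts[action] += 1
            # running average of reward for this action
            value[action] += (reward - value[action]) / counts[action]
    return value

values = train()
best = max(values, key=values.get)
```

After thousands of episodes the agent's value estimates converge on the modifications that most reliably drop a sample below the detection threshold, which is the same intuition behind rewarding the real agent for evasive variants.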

Of course, crafting malicious software is a more complex task than playing “Breakout.” Still, the AI-powered bot learned through the experience of all those trials what sequence of programmatic actions is most likely to result in an evasive variant. Presented with new malware that it hasn’t seen before, Anderson’s agent can produce a functionally equivalent version of the program that will have the added benefit of being able to evade the opposing machine learning detector.

In the course of the research, Anderson identified a number of ‘blind spots’ in the machine learning algorithms used to spot malicious software. For example, many of the machine learning models treat packed executables as suspicious, regardless of the function of the underlying application. The artificial intelligence discovered that simply unpacking a malicious program was sometimes enough to fool the detection algorithm.

In other instances, the AI-driven bot adopted techniques that malware authors themselves have used to disguise the malicious nature of their creations. For example, padding their code with extraneous sections was sometimes enough to confuse the detection tool. “So if you looked at this malware, it had something like 25 different sections,” Anderson said.
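Both blind spots come down to models leaning on surface features of a file rather than its behavior. The toy classifier below makes that concrete; the feature names and weights are entirely hypothetical and chosen only to mirror the two evasions described above.

```python
def naive_detector(sample):
    """Toy static detector scoring invented surface features.
    Weights are illustrative, not from any real product."""
    score = 0
    if sample["packed"]:
        score += 40          # packing treated as inherently suspicious
    if sample["imports_crypto"]:
        score += 30
    score -= 2 * sample["num_sections"]  # many sections look "benign"
    return score >= 50       # True = flagged as malicious

malware = {"packed": True, "imports_crypto": True, "num_sections": 4}
assert naive_detector(malware)             # caught in its original form

# Blind spot 1: simply unpacking removes the "packed" signal.
unpacked = dict(malware, packed=False)
assert not naive_detector(unpacked)        # slips past

# Blind spot 2: padding the file with ~25 extraneous sections.
padded = dict(malware, num_sections=25)
assert not naive_detector(padded)          # also slips past
```

The malicious behavior is unchanged in both variants; only features the model happens to weight heavily have been perturbed, which is exactly the kind of gap the trained agent learns to find.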

His conclusion? “Machine learning is powerful, but it has weaknesses,” Anderson told Security Ledger.

Anderson and his fellow researchers believe that the machine learning techniques currently used in security products for malware classification and other detection tasks are vulnerable to adversaries who can similarly figure out how to game the underlying models.

Endgame is releasing the code for the training game as open source so that others can use it to improve the robustness of their malware detection tools.

As information security firms market machine learning and artificial intelligence as a kind of ‘magic elixir’ for all manner of threats, Anderson said the message for the security industry and its customers is more sober.

“Machine learning is not a silver bullet. It has blind spots,” he said.
