In-brief: Researchers at universities in Germany, working with the security firm Trend Micro, discovered more than 100 vulnerabilities in GitHub code repositories simply by looking for re-used code from tutorials and other free code samples. The same method could be harnessed by cyber criminals or other sophisticated attackers to find and exploit vulnerabilities in software applications, the researchers warned.
Sloth has been counted among the “cardinal sins” (aka the “seven deadly sins”) for well over a thousand years. But new research suggests that our tendency towards laziness and lethargy is having a profound impact on online security.
A broad study of security vulnerabilities in so-called “sample code” or “tutorials” finds that wholesale copying and reuse of that code by application developers is spreading common security vulnerabilities far and wide.
Researchers at universities in Germany, working with the security firm Trend Micro, have discovered more than 100 distinct vulnerabilities in GitHub code repositories simply by looking for re-used code from tutorials and other free code samples. The same method could be harnessed by cyber criminals or other sophisticated attackers to find new (or “zero day”) vulnerabilities in software applications, the researchers warned.
Security vulnerabilities stemming from software reuse are a widely recognized problem. Recent studies of open source software use have found that known security flaws in open source components often go unpatched. A 2015 survey of 25,000 applications by the firm Sonatype, for example, found that close to 7% of components in use had a known security defect that could lead to successful attacks.
The latest survey addresses a similar, but more amorphous, concern: the spread of unofficial “sample” software code into production applications. Such tutorials are typically not part of formal documentation or reference material for APIs and other developer resources. Rather, they are an informal resource for software developers, offered up on developer-centric websites and online communities. Researchers found that such tutorials are easy to find, often showing up first in Google search results.
The sample code offered in these tutorials is not open source, per se; it is merely intended to show novice developers how to accomplish common programming tasks. Unfortunately, the researchers say, such samples often contain exploitable security holes, including cross-site scripting, SQL injection and more. That wouldn’t be a problem if application developers only consulted the sample code and never actually used it. But, humans being humans, that’s not how things play out in real life.
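To make the SQL injection problem concrete, here is a minimal sketch of the kind of anti-pattern that shows up in careless tutorials: a query built by pasting user input straight into an SQL string. The projects in the study were written in PHP; this illustration instead uses Python’s built-in `sqlite3` module, and the `users` table and the `find_user_unsafe`/`find_user_safe` helpers are hypothetical, chosen only for demonstration.

```python
import sqlite3


def setup_db() -> sqlite3.Connection:
    # In-memory database with a hypothetical "users" table for illustration.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def find_user_unsafe(conn: sqlite3.Connection, name: str) -> list:
    # Tutorial-style anti-pattern: user input is concatenated into the SQL text,
    # so quotes in `name` can change the structure of the query.
    query = "SELECT name, secret FROM users WHERE name = '" + name + "'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, name: str) -> list:
    # Parameterized query: the driver binds `name` as data, never as SQL.
    return conn.execute(
        "SELECT name, secret FROM users WHERE name = ?", (name,)
    ).fetchall()


if __name__ == "__main__":
    conn = setup_db()
    payload = "x' OR '1'='1"
    print(find_user_unsafe(conn, payload))  # the injected OR clause matches every row
    print(find_user_safe(conn, payload))    # no user is literally named the payload
```

The parameterized version is the standard remedy: the `?` placeholder keeps attacker-supplied text out of the SQL grammar entirely, so the payload is compared as an ordinary string.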
“Our results give credence to the widely known anecdote that programmers copy and paste code from vulnerable tutorials,” the researchers from TU Berlin, Saarland University and TU Braunschweig in Germany wrote. “Our case study…indicates that such ad hoc code re-use may endanger the security of software throughout the open-source landscape.”
To assess the impact of tutorial and sample code on actual applications, the researchers developed tools for matching up security flaws in sample code against the content of 64,415 PHP projects hosted on GitHub. After finding likely matches, the researchers manually reviewed the GitHub repositories to verify that the matching code actually constituted a vulnerability in the open source code base.
In all, the research uncovered 117 vulnerabilities that have “a strong syntactic similarity to vulnerable code snippets present in popular tutorials,” the researchers wrote. In one case, a snippet of code from a single tutorial accounted for eight SQL injection vulnerabilities in different web applications, the researchers found.
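The article does not describe how the researchers’ matching tools work internally. As a rough illustration of what “syntactic similarity” between a tutorial snippet and project code can mean, here is a minimal sketch of generic token-based clone detection: tokenize, abstract away variable names and literals (the parts a copy-paster typically changes), then compare sets of k-token windows with Jaccard similarity. This is a common textbook technique, not the researchers’ method; the regular expression, the choice of `k=4`, the preserved PHP superglobals, and all function names are hypothetical.

```python
import re

# Hypothetical, deliberately crude tokenizer for PHP-like code.
TOKEN_RE = re.compile(
    r"\$\w+"                    # PHP variables, e.g. $id, $_GET
    r"|[A-Za-z_]\w*"            # identifiers, keywords, function names
    r"|\d+"                     # integer literals
    r"|'(?:[^'\\]|\\.)*'"       # single-quoted strings
    r'|"(?:[^"\\]|\\.)*"'       # double-quoted strings
    r"|\S"                      # any other single non-space character
)

# Kept verbatim because they mark attacker-controlled input.
SUPERGLOBALS = {"$_GET", "$_POST", "$_REQUEST", "$_COOKIE"}


def normalize(code: str) -> list[str]:
    """Tokenize `code`, replacing details that usually differ between copies."""
    tokens = []
    for tok in TOKEN_RE.findall(code):
        if tok in SUPERGLOBALS:
            tokens.append(tok)
        elif tok.startswith("$"):
            tokens.append("VAR")        # variable names are irrelevant
        elif tok[0] in "'\"":
            tokens.append("STR")        # string literal contents are ignored
        elif tok.isdigit():
            tokens.append("NUM")
        else:
            tokens.append(tok.lower())  # keywords, function names, punctuation
    return tokens


def shingles(tokens: list[str], k: int = 4) -> set[tuple[str, ...]]:
    """All contiguous k-token windows of the normalized token stream."""
    return {tuple(tokens[i:i + k]) for i in range(len(tokens) - k + 1)}


def similarity(a: str, b: str, k: int = 4) -> float:
    """Jaccard similarity of two snippets' normalized k-gram sets (0.0 to 1.0)."""
    sa, sb = shingles(normalize(a), k), shingles(normalize(b), k)
    if not sa or not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)


if __name__ == "__main__":
    tutorial = """$id = $_GET['id'];
$result = mysql_query("SELECT * FROM users WHERE id = " . $id);"""
    # Same logic with renamed variables, as a copy-paster might produce.
    project = """$user = $_GET['uid'];
$res = mysql_query("SELECT * FROM users WHERE id = " . $user);"""
    print(similarity(tutorial, project))
```

Because variable names and string contents are abstracted away, the renamed copy above produces the same normalized token stream as the tutorial and scores as a full match. A real pipeline would then need the kind of manual review the researchers describe, since a syntactic match alone does not prove the copied code is exploitable in its new context.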
“Our results indicate that there is a substantial, if not causal, link between insecure tutorials and web application vulnerabilities,” the researchers concluded.
The connection between code tutorials and production vulnerabilities could be a gold mine for hackers, who can analyze a small but influential code tutorial for security vulnerabilities and then go looking for copies of the vulnerable code in deployed applications. Such “bootstrapping” of a large-scale vulnerability discovery effort could yield far better results for would-be attackers than combing through individual repositories one at a time in search of novel vulnerabilities.
“Our findings testify to the feasibility of large-scale vulnerability discovery using poorly written tutorials as a starting point,” the researchers wrote.
There is no simple or easy fix for the code reuse problem. The researchers call for a code audit of widely consumed tutorials “perhaps with as much rigor as for production code.” Tools developed by the researchers to conduct their survey can also be used to find similarities between code snippets from different sources, helping organizations doing application development to uncover borrowed code lurking in their applications.