New Search Engine Wants To Be A Google For Code

Researchers at The University of Cambridge in the UK have created a Google-like search engine that can peer inside applications, analyzing their underlying code.

The search tool, named “Rendezvous,” has a number of potential uses. It could help reverse engineer potentially malicious files, enforce copyrights or find evidence of plagiarism within applications, according to a blog post by Ross Anderson, a Professor of Security Engineering at the university’s Computer Laboratory.

The Rendezvous search engine can search for reused application code.

Rendezvous was unveiled in a seminar on Tuesday by Wei Ming Khoo, a doctoral student in the Security Group at the University of Cambridge’s Computer Laboratory. The engine, which is accessible online, allows users to submit an unknown binary, which is then decompiled, parsed and compared against a library of code harvested from open source projects across the Internet.

Code reuse has become a pressing security issue. The application security firm Veracode has named reused and third-party components as a leading source of code vulnerabilities. And OWASP added the security of third-party components to the 2013 edition of its Top 10 project, which identifies the leading security issues found in software applications.

According to a paper that describes the search engine, Rendezvous was created as a proof of concept to “bootstrap” efforts to develop tools for identifying code reuse within applications. Use of open source software and third-party code is now part and parcel of application development. But unchecked use of that code poses serious security risks, from the insertion of malicious components to cut-and-paste reuse that introduces security vulnerabilities. Beyond that, companies that license their software to others have no way to know when it is being used improperly or in violation of the license agreement.

Rendezvous is intended as a tool for auditors who want to identify reused components in a software product. Wei said the search engine can supplement manual audits of decompiled code, especially when the original source code is not available to audit. Rendezvous “reframes identifying code reuse as an indexing and search problem,” using open source repositories such as GNU, the Apache Software Foundation, and Linux and BSD distributions, as well as public code repositories such as GitHub and Google Code, to provide a base population of code to search against.
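To make the “indexing and search” framing concrete, here is a minimal sketch of an inverted index over code features. It is an illustration only, not the Rendezvous implementation: the `FeatureIndex` class, the feature strings and the function names `libfoo:checksum` and `libbar:copy` are all hypothetical, and the ranking is a simple count of shared features rather than anything described in the paper.

```python
from collections import Counter, defaultdict


class FeatureIndex:
    """Toy inverted index: maps each feature to the known functions containing it."""

    def __init__(self):
        self._postings = defaultdict(set)

    def add(self, name, features):
        # Record that the function `name` exhibits every feature in `features`.
        for feature in features:
            self._postings[feature].add(name)

    def search(self, features):
        # Count, for each indexed function, how many query features it shares,
        # and return (name, shared_count) pairs with the best match first.
        hits = Counter()
        for feature in set(features):
            for name in self._postings.get(feature, ()):
                hits[name] += 1
        return hits.most_common()


# Hypothetical corpus: features are made-up strings standing in for whatever
# a real system would extract from open source code.
index = FeatureIndex()
index.add("libfoo:checksum", {"push mov", "mov imul", "const:0x5bd1e995"})
index.add("libbar:copy", {"push mov", "mov rep", "const:0x100"})

# Features pulled from an "unknown" binary are used as the query.
results = index.search({"push mov", "mov imul", "const:0x5bd1e995"})
print(results)
```

In this toy corpus the query shares all three of its features with `libfoo:checksum` and only one with `libbar:copy`, so the former is ranked first. A production system would need weighting, normalisation and a far larger corpus.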

Wei describes his search engine as using a statistical model “comprising instruction mnemonics, control flow sub-graphs and data constants which are simple to extract from a disassembly, yet normalising with respect to different compilers and optimisations.” Experiments using the engine showed that it was accurate in identifying components of the GNU C library and the GNU coreutils suite compiled using two different compilers – a sample comprising more than one million lines of code.
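Two of the three feature types Wei mentions, instruction mnemonics and data constants, are easy to sketch. The example below is a rough illustration under stated assumptions, not the paper's method: the two disassembly listings are invented, the helper names `mnemonic_ngrams` and `data_constants` are hypothetical, and control flow sub-graphs are left out entirely. It shows why dropping operands can make features robust to register allocation differences between compilers.

```python
import re

# Matches hexadecimal immediates such as 0x5bd1e995 in a disassembly line.
CONST_RE = re.compile(r"\b0x[0-9a-fA-F]+\b")


def mnemonic_ngrams(instructions, n=2):
    """Return n-grams over instruction mnemonics, ignoring all operands."""
    mnemonics = [line.split()[0] for line in instructions if line.strip()]
    return [tuple(mnemonics[i:i + n]) for i in range(len(mnemonics) - n + 1)]


def data_constants(instructions):
    """Return the set of hexadecimal constants that appear as operands."""
    return {int(c, 16) for line in instructions for c in CONST_RE.findall(line)}


# Invented listings: the same routine as two compilers might emit it,
# differing only in which registers were chosen.
listing_a = ["push ebp", "mov ebp, esp", "mov eax, 0x5bd1e995",
             "imul eax, ecx", "pop ebp", "ret"]
listing_b = ["push ebx", "mov ebx, esp", "mov edx, 0x5bd1e995",
             "imul edx, esi", "pop ebx", "ret"]

print(mnemonic_ngrams(listing_a) == mnemonic_ngrams(listing_b))
print(data_constants(listing_a) == data_constants(listing_b))
```

Real compilers also differ in instruction selection and ordering, not just register choice, which is presumably why the model combines several feature types rather than relying on mnemonics alone.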

Writing on Tuesday, Anderson said Rendezvous has many potential applications, from locating vulnerabilities across a code base to identifying malicious code. Wei said the engine will bring “significant changes to the way patch management and copyright enforcement is currently performed.”
