Say what you want about social media. The bare fact is that folks use it – more of them every day. In fact, social media sites like Facebook, Twitter and YouTube are growing – quickly – and have come to define our modern online experience.
That said: the sites represent a huge security risk. Sites like Facebook, Twitter and Instagram are increasingly used as platforms to circulate scams and malicious links. A larger and more nebulous threat is posed by all the information that organizations and their workers are spilling online.
It’s already common knowledge that hackers and other “bad guys” comb through worker profiles on LinkedIn, Facebook and other sites to help craft targeted attacks. But could your social networking profile provide more useful information – like your password? Independent security researcher Itzik Kotler thinks so.
Kotler is the creator of Pythonect, a new, experimental dataflow programming language based on Python. Using it, he said he’s been able to derive passwords from the public content of individuals’ LinkedIn profiles – combining information like the company an individual works for, their name and birthdate to derive actual passwords for their account.
Kotler’s method was straightforward: he used Google’s Custom Search Engine to find all the employees of a given company. For the profiles that were returned, Kotler then scraped their personal information for analysis – a job made easier by LinkedIn’s adoption of the hCard microformat, which Google uses to display the contact details of people, companies, organizations, and places in easy-to-read form on search results pages.
The strategy – really a proof of concept test for Pythonect – isn’t the most efficient means of breaking into an account, Kotler admits, but it does suggest that the treasure troves of personal data we make available online could be useful as more than just fodder for social engineering attacks.
Kotler responded to some e-mail questions from Security Ledger about Pythonect and distilling passwords from LinkedIn data. (Note: I’ve edited this to correct typos and spelling errors and for coherence.)
Security Ledger: From your blog post it seems like Pythonect is the key ingredient here. Explain what Pythonect is and what it allows you to do that you couldn’t do before.
Itzik Kotler: Pythonect is a new, experimental, general-purpose high-level dataflow programming language based on Python, written in Python. It aims to combine the intuitive feel of shell scripting (and all of its perks like implicit parallelism) with the flexibility and agility of Python.
Pythonect, being a dataflow programming language, treats data as something that originates from a source, flows through a number of processing components, and arrives at some final destination. As such, it is most suitable for creating applications that are themselves focused on the “flow” of data. An application that generates passwords from employees’ public LinkedIn profiles has a coherent and clear dataflow:
- Find all the employees’ public LinkedIn profiles
- Scrape all the employees’ public LinkedIn profiles
- Crunch all the data into potential passwords
That’s why Pythonect is the key ingredient here.
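As an illustration of that three-stage flow, here is a minimal sketch in plain Python (not Pythonect syntax, and not Kotler’s actual code). Each stage is a generator that feeds the next. The function names and the `SAMPLE_PROFILES` data are hypothetical stand-ins, and no network access is performed.

```python
# Hypothetical in-memory stand-in for search results and scraped profiles.
SAMPLE_PROFILES = {
    "https://example.com/in/jdoe": {"name": "Jane Doe", "company": "Acme"},
}


def find_profiles(company):
    """Stage 1: yield profile URLs for a company (stubbed lookup)."""
    for url, record in SAMPLE_PROFILES.items():
        if record["company"] == company:
            yield url


def scrape_profiles(urls):
    """Stage 2: yield the public fields for each URL (stubbed lookup)."""
    for url in urls:
        yield SAMPLE_PROFILES[url]


def crunch(records):
    """Stage 3: yield lowercase word tokens to use as password material."""
    for record in records:
        for value in record.values():
            for token in value.lower().split():
                yield token


def pipeline(company):
    """Chain the three stages: source -> processing -> destination."""
    return list(crunch(scrape_profiles(find_profiles(company))))
```

Because every stage only consumes what the previous one yields, any stage can be swapped out (for example, a real scraper in place of the stub) without touching the others.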
Security Ledger: You use the example of four character passwords generated from a LinkedIn member’s name. But what other public, open source information in LinkedIn profiles might be used in passwords, in your experience?
Itzik Kotler: Company name and birth date.
Security Ledger: Wouldn’t you need lots of artificial intelligence to actually combine the data from the LinkedIn profile into potential passwords – or are you merely saying: ‘here’s a chunk of alphanumeric data, give me all potential x character passwords that you can create from it’?
Itzik Kotler: Saying ‘here’s a group of alphanumeric data, give me all potential x character passwords that you can create from it’ yields 100% of the possible passwords that can be derived from that alphanumeric data. Anything else is a subset of this list.
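One way to read “all potential x character passwords” is as every string of length x built from the distinct alphanumeric characters in the profile data. The sketch below takes that interpretation, which is an assumption and not necessarily how Kotler’s code works. The function name `candidates` is hypothetical.

```python
from itertools import product


def candidates(data, length):
    """Yield every string of the given length over the distinct
    alphanumeric characters found in `data`."""
    alphabet = sorted({ch for ch in data if ch.isalnum()})
    for combo in product(alphabet, repeat=length):
        yield "".join(combo)
```

The list grows as n to the power x, where n is the number of distinct characters: 20 distinct characters and a length of 4 already give 160,000 candidates, which is why any smarter heuristic can only ever select a subset of this list.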
Security Ledger: How do LinkedIn and Google’s use of hCard make this hack easier?
Itzik Kotler: Everything is easier when it’s organized. LinkedIn exports users’ public profiles in the hCard microformat, and Google adds hCard to search-result pages. As a result, it’s very simple to develop a bot that will search for a given company or people’s common ground, go through the search-result pages, and scrape the exported hCards.
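To show why structured markup helps, here is a minimal sketch that pulls a few hCard properties out of an HTML fragment using only Python’s standard library. hCard marks fields with class names such as `fn` (formatted name), `org`, `title` and `bday`. The parser class, the `parse_hcard` helper and the choice of properties are assumptions for illustration, and the sketch assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser

# hCard property class names this sketch looks for.
HCARD_PROPERTIES = {"fn", "org", "title", "bday"}
# Tags without a closing tag; skipped so the open-element stack stays balanced.
VOID_TAGS = {"br", "img", "hr", "input", "meta", "link"}


class HCardParser(HTMLParser):
    def __init__(self):
        super().__init__()
        self.fields = {}
        self._stack = []  # one entry per open element: property name or None

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        prop = next((c for c in classes if c in HCARD_PROPERTIES), None)
        self._stack.append(prop)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self._stack:
            self._stack.pop()

    def handle_data(self, data):
        # Attribute text to the nearest enclosing hCard property, if any.
        prop = next((p for p in reversed(self._stack) if p), None)
        text = data.strip()
        if prop and text:
            self.fields[prop] = (self.fields.get(prop, "") + " " + text).strip()


def parse_hcard(html):
    """Return a dict of hCard property name -> text content."""
    parser = HCardParser()
    parser.feed(html)
    parser.close()
    return parser.fields
```

With unstructured text, the same extraction would need guesswork about which words are a name or an employer; here the class names say so directly.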
Security Ledger: Is this practical (i.e. do you have any evidence that it can scale and work)? Is it more efficient than, say, running through the lists of hundreds of millions of known passwords that have already been leaked?
Itzik Kotler: Yes, it is practical. I have successfully tested it on a small group of friends, but then again, passwords are very individual things. As for how efficient it is, that’s a question for statisticians, but if I had a list of the top 100 million known passwords – I’d give it a go first.
Security Ledger: How powerful a system would you need to compile all the open source LinkedIn data and crunch it into a list of potential passwords?
Itzik Kotler: I’d say pretty powerful. The challenges for the system will be processing power for the data manipulation, and storage for saving it all afterwards. Having said that, there is no limit on the number of data manipulation algorithms that can run in parallel, so a multi-core system can definitely help here.
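The point about independent algorithms running in parallel can be sketched with Python’s `concurrent.futures`. The three transforms below are hypothetical examples of “data manipulation algorithms”, not ones Kotler names. Each is submitted as a separate task, so a process pool can spread them across cores.

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def lowercase(tokens):
    return [t.lower() for t in tokens]


def capitalize(tokens):
    return [t.capitalize() for t in tokens]


def reverse(tokens):
    return [t[::-1] for t in tokens]


# Independent transforms: none depends on another's output.
TRANSFORMS = (lowercase, capitalize, reverse)


def run_transforms(tokens, executor_cls=ProcessPoolExecutor):
    """Run every transform as its own task and merge the results."""
    results = set()
    with executor_cls() as pool:
        futures = [pool.submit(fn, tokens) for fn in TRANSFORMS]
        for future in futures:
            results.update(future.result())
    return results
```

When using the default process pool from a script, call `run_transforms` under an `if __name__ == "__main__":` guard. Passing `ThreadPoolExecutor` instead avoids process start-up costs but will not use multiple cores for CPU-bound work in standard CPython.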
Security Ledger: Do any LinkedIn security features make it impossible to do this kind of crawling?
Itzik Kotler: For this specific crawling method (i.e. Google Custom Search + hCard), it’s enough to disable the public profile feature to make one go off the radar.
Security Ledger: Are there other, better sources of open source intelligence for password crunching (e.g., Facebook)?
Itzik Kotler: Good question. I’d say Facebook, Twitter, and even Flickr can be good sources of open source intelligence, but they are too “unstructured” for simple crawling. For instance, a picture of a person and his dog on Flickr, with the dog’s name in the description, can be a good lead for that person’s password. Having said that, it may not be so simple to extract the dog’s name from the picture description.