Facebook Claims Site Scraper Responsible for Leak of 1 Million User Emails

Yesterday, we reported that a blogger claimed to have purchased a list of over one million Facebook user emails, complete with full names and the URLs of their Facebook profiles.

Facebook has contacted Search Engine Watch to offer a new statement, as part of their ongoing investigation:

Facebook is vigilant about protecting our users from those who would try to expose any form of user information. In this case, it appears someone has attempted to scrape information from our site. We have dedicated security engineers and teams that look into and take aggressive action on reports just like these. We continue to investigate this specific individual.

The problem, as we discussed with Facebook yesterday and again today, is that we also obtained the file in question, and it seems highly unlikely that the leak was the work of an individual scraping the Facebook website for publicly available email addresses. Why? In the hundreds of random profiles on the list we’ve checked so far, not one has its email address publicly available.

The person selling the list claimed to be an apps developer, who used their Facebook and Twitter apps to collect user information over a period of six months.

The seller’s account has since been removed from Gigbucks and the list offer deleted. Josef at Gigbucks confirmed this was the result of the user violating their terms of service and that Gigbucks was not asked by Facebook to remove the listing.

The original offer read, in part: “The information in this list has been collected through our Facebook apps and consists only of active Facebook users, mostly from the US, UK, Canada and Europe.”

As we mentioned yesterday, users may have changed their privacy settings between the time the information was harvested and when the list was compiled, on October 13th, 2012. However, a logical person would expect at least a few of the affected profiles to show a publicly available email address, especially given that there has been no confirmation from Facebook that affected users were notified of the breach.

It is also a possibility that the list was the result of a previous privacy breach. Search Engine Watch had an independent consultant evaluate a number of affected email addresses. Some appear to have been created after the last major privacy breach, indicating this was not an old list.

No Facebook passwords were leaked with the user emails. So what could a person do with a list of names, email addresses and Facebook profile URLs?

  • Send more convincing phishing emails, using the Facebook user’s profile URL, first and last name, and login email, to obtain more information.
  • Use a password cracker to discover the user’s Facebook password to accompany the login email, allowing them to hack the account and discover more information.
  • Use the name and login email together to reconstruct the user’s actual identity.

That last scenario is already happening. Search Engine Watch did find a website we will not name or link to, in order to avoid exposing these people further, where people are reconstructing the identities of the people on the list. This, of course, can be used for identity theft or to discover other accounts that may be easier to hack. The entire list has been “dumped” in a publicly accessible location and is now indexed and searchable.

We made Facebook aware of the wider availability of the list, and they said they are continuing their investigation.

The elephant in the room, however, is that many users do not understand that when they deal with Facebook apps, they are not protected by Facebook’s terms of service (TOS). Each app has its own TOS that the user must accept. Facebook does not vet, test or otherwise police its app developers and is not responsible for their actions.

Users can protect themselves by reviewing the terms of each app carefully before approving, and removing apps no longer in use. We will continue investigating the source of the list and whether the affected users have been notified of the leak of their personal data.
