Multiple data-scraping groups have abused Facebook’s link preview feature to scrape data from websites while posing as Facebook’s content crawler.
The technique consisted of using Facebook developer accounts to place calls to Facebook or Facebook Messenger API servers, requesting a link preview for pages a group wanted to scrape.
Facebook would fetch the data, assemble it into a link preview, and return it to the data scrapers as an API response, ready to be ingested into the scrapers’ database.
The technique was successful because most website operators allow Facebook servers to crawl their sites, knowing the data Facebook collects from their pages is usually used for legitimate purposes, as part of link previews on the social network, Facebook Messenger, WhatsApp, or Instagram.
Multiple groups abused the technique
But in a report published last week by DataDome, a security firm that provides bot detection capabilities for online sites, the company said it discovered several “scraper operators” employing the technique to (ab)use Facebook as a proxy for their data-scraping activities.
DataDome said it identified multiple groups abusing the technique on multiple sites, but the initial detection came on the network of one of its customers, a classified ads portal.
“Our heuristic analysis uncovered that certain parameters, unlikely to be used by humans, were overrepresented in the URLs that Facebook requested,” DataDome explained.
This included URLs for pages on the classified site that users would rarely share on Facebook, such as search results pages, a dead giveaway that someone was scraping the classified ads site for recent entries.
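DataDome did not publish its heuristic, but the idea of spotting overrepresented URL parameters can be illustrated with a minimal Python sketch. Everything here is an assumption made for illustration: the function names, the thresholds, the input format (pairs of user agent and URL), and the choice of ordinary visitor traffic as the baseline. The only external detail relied on is that Facebook's crawler identifies itself with a user agent containing "facebookexternalhit".

```python
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# Substring found in the user agent of Facebook's link preview crawler.
FACEBOOK_UA_MARKER = "facebookexternalhit"


def param_shares(urls):
    """Return {parameter name: fraction of URLs that carry that query parameter}."""
    counts = Counter()
    total = 0
    for url in urls:
        total += 1
        # parse_qs returns a dict; set() keeps each parameter name once per URL.
        counts.update(set(parse_qs(urlsplit(url).query, keep_blank_values=True)))
    return {name: n / total for name, n in counts.items()} if total else {}


def overrepresented_params(requests, min_share=0.2, min_ratio=5.0):
    """Flag query parameters far more common in crawler requests than elsewhere.

    `requests` is an iterable of (user_agent, url) pairs (hypothetical log format).
    `min_share` and `min_ratio` are illustrative thresholds, not DataDome's values.
    """
    requests = list(requests)
    crawler = [url for ua, url in requests if FACEBOOK_UA_MARKER in ua.lower()]
    others = [url for ua, url in requests if FACEBOOK_UA_MARKER not in ua.lower()]

    crawler_share = param_shares(crawler)
    baseline_share = param_shares(others)

    flagged = {}
    for name, share in crawler_share.items():
        # Floor the baseline at 1% so unseen parameters do not divide by zero.
        ratio = share / max(baseline_share.get(name, 0.0), 0.01)
        if share >= min_share and ratio >= min_ratio:
            flagged[name] = round(ratio, 1)
    return flagged
```

On a classified ads site, parameters such as a sort order or a page number showing up in most crawler requests, while almost never appearing in ordinary traffic, would be flagged by a check like this. A production detector would need far more signals than a single ratio.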
Tests carried out by the DataDome team confirmed that the technique worked and found that data-scraping groups could abuse the feature to retrieve link previews for up to 10,000 URLs per hour from a single Facebook developer account.
The French security firm said it notified Facebook of the attacks earlier this year.
“Facebook has now improved rate limiting on the Messenger preview API. As our tests (and certain hacker forum discussions) confirm, this effectively prevents continued abuse of the preview feature for scraping purposes,” the security firm said.
A Facebook spokesperson confirmed the scraping operations and the API fix, but the company did not have anything to add on top of DataDome’s report.
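Neither Facebook nor DataDome described how the improved rate limiting works. As a rough illustration of the general idea only, the sketch below caps how many preview requests one account may make within a sliding time window. The class name, the per-account keying, and the limits are all hypothetical and do not reflect Facebook's actual implementation.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_requests` per account within `window_seconds`.

    Hypothetical sketch of per-account rate limiting, not Facebook's design.
    """

    def __init__(self, max_requests, window_seconds):
        self.max_requests = max_requests
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # account id -> timestamps of allowed requests

    def allow(self, account_id, now):
        """Return True and record the request if the account is under its limit."""
        hits = self._hits[account_id]
        # Drop timestamps that have aged out of the window.
        while hits and now - hits[0] >= self.window_seconds:
            hits.popleft()
        if len(hits) >= self.max_requests:
            return False
        hits.append(now)
        return True
```

For scale, the 10,000 URLs per hour that DataDome measured works out to just under three requests per second from one account, so a limit set well below that would make the preview API a poor scraping proxy.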