Scraping airlines bots: insights obtained studying honeypot data

Chiapponi, Elisa; Dacier, Marc; Catakoglu, Onur; Thonnard, Olivier; Todisco, Massimiliano

Airline websites are the victims of unauthorised online travel agencies and aggregators that use armies of bots to scrape prices and flight information. These so-called Advanced Persistent Bots (APBs) are highly sophisticated. Beyond the valuable information they extract, their huge volumes of requests consume a very substantial amount of resources on the airlines’ websites. In this work, we propose a deceptive approach to counter scraping bots. We present a platform capable of mimicking airlines’ sites while changing prices at will, and we report results from the case studies we performed with it. We lured bots for almost 2 months and fed them inaccurate information that was indistinguishable from genuine data. Studying the collected requests, we found behavioural patterns that could be used as a complementary means of bot detection. Moreover, based on the empirical evidence gathered, we propose a method to investigate the commonly made claim that proxy services used by web scraping bots have millions of residential IPs at their disposal. Our mathematical models indicate that the number of IPs is likely 2 to 3 orders of magnitude smaller than claimed. This finding suggests that an IP reputation-based blocking strategy could be effective, contrary to what operators of these websites think today.
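
The abstract does not spell out the mathematical models, so the following is only an illustrative sketch of one way such a pool-size estimate could be approached, not the authors' method. It assumes (hypothetically) that each request comes from an IP drawn uniformly at random, with replacement, from a fixed pool of size N. Under that assumption the expected number of distinct IPs seen in n requests is N(1 - (1 - 1/N)^n), and N can be recovered numerically from the observed counts. The function names and the example figures are invented for illustration.

```python
import math


def expected_distinct(pool_size: float, draws: int) -> float:
    """Expected number of distinct IPs seen after `draws` requests,
    ASSUMING uniform sampling with replacement from `pool_size` IPs."""
    # N * (1 - (1 - 1/N)^n), written with log1p/expm1 for numerical stability.
    return -pool_size * math.expm1(draws * math.log1p(-1.0 / pool_size))


def estimate_pool_size(draws: int, distinct: int, upper: float = 1e12) -> float:
    """Invert expected_distinct() by bisection to estimate the pool size.

    `upper` is an arbitrary search ceiling chosen for this sketch.
    """
    if not 1 < distinct < draws:
        # If every request used a new IP, the data give no upper bound.
        raise ValueError("need 1 < distinct < draws for a finite estimate")
    lo, hi = float(distinct), upper
    for _ in range(200):
        mid = (lo + hi) / 2.0
        # expected_distinct grows with pool size, so bisection applies.
        if expected_distinct(mid, draws) < distinct:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


if __name__ == "__main__":
    # Hypothetical figures, not taken from the paper.
    print(round(estimate_pool_size(draws=2000, distinct=1648)))
```

The intuition is that frequent IP repeats within a modest number of requests are hard to reconcile with a pool of millions: the more repeats observed, the smaller the estimated pool. Real proxy traffic is unlikely to be uniform, so an estimate of this kind is at best a rough indication.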

Digital Security
Eurecom Ref:
Copyright CORESA. Personal use of this material is permitted. The definitive version of this paper was published in and is available at :