Deploying robot scripts to scrape content is nearly as old as the Internet itself – as the Web shifted from human-curated lists (think the original Yahoo), the infrastructure became reliant on automated scripts that robotically moved through the Internet to archive, map and organize the blossoming trove.

Robots trawl the Internet more than ever, some with noble purposes and others malicious. Some of the malicious robots scrape pricing and product information from competitors so that their operators can make instantaneous price adjustments. This is especially problematic in travel, as accurate data is valuable in a price-based commoditized business that relies on data agreements to wall off competitors.

Distil Networks, which offers an anti-scraper bot service, has logged some of these travel-specific challenges in its latest whitepaper.

Upstarts seeking competitive advantage

One of the primary challenges to incumbent travel websites is protecting data from upstart travel companies seeking to build a competitive advantage by scraping larger companies’ websites.

By avoiding expensive data sharing agreements, these startups are able to build an accurate, robust, real-time dataset without any of the costs normally associated with direct/contractual access to this data.

The report specifically calls out a few of these startups:

Yet, instead of market consolidation chasing away additional scrapers, a new wave of startup companies and existing competitors are increasing travel website scraping volumes. They are adding unique views to scraped data and further disrupting the online travel industry. Three recent examples include Backbid, Tingo, and a third company. Backbid and Tingo offer new twists on obtaining low-priced hotel reservations, while the third provides meta-search services similar to those of Kayak and Trivago, but with an Asia Pacific focus.

Larger sites are clearly targets for disruption of all kinds – and automated data scraping is a simple and efficient way for a less-resourced startup to glean the information it needs to compete in the global comp-set that is travel. Some sites that were once small relied extensively on this technology to scale up into the larger players they are today.

In fact, many online travel competitors have touted their capabilities for smart website scraping, and business models, like those of Kayak and Hipmunk, have been built on successful web scraping operations. The more these businesses can scrape data, the more of the travel experience they can provide to customers. Eventually, this allows the scrapers to cross-sell and up-sell customers with higher margin add-ons, such as car rentals and entertainment, as well as charge additional fees that originally appeared on the scraped website.

The report also highlights that these sorts of scraper bots are incredibly inexpensive to run given newly emerged cloud infrastructure:

Launching scraper bots from a virtualized environment allows a person or organization to spin up scraper resources, spin them down, change IP address and then spin up resources again to scrape some more.

This means that the barrier to entry to aggregate travel products is quite low, as technology has made it possible for small teams to provide vast inventory to online searchers without the commensurate cost or clout that it previously required.

Meta is built on this

Meta-search began exclusively as a scraper business: the niche emerged as a way to search for the same trip on multiple sites in one place.

As the industry proved itself, meta became massive and the industry shifted, with the likes of TripAdvisor joining the fray. And as consumers became accustomed to searching the entire travel inventory in one place, travel sites came under increased pressure to deliver a comprehensive overview of a giant data set of travel inventory. Smaller players were therefore pushed further into meta, which meant more scraper bots deployed than ever.

So today, the landscape is one of a scrubbed relationship with the consumer.

Multiple sites may be competing for the same consumer, but the consumer is pushed from one to another to complete a booking. This means that the actual consumer relationship is muddied and, as the report points out, this is an enormous pain point being successfully leveraged by sites as they strive to bring back business lost to the OTAs and meta-search sites.

Here are the main points the report suggests are important to note, with bots making up a company-reported 30% of website traffic:

They no longer own the customer experience. This means no way for itinerary sharing and updating.

The add-on fees from the meta-search or OTA performing the scraping make it appear as if they are the victimized site’s fees. This diminishes the customer’s view of the actually victimized organization.

Revenue is lost because the victimized site cannot cross-sell or up-sell to customers who do not actually visit the website.

They are unable to generate customer profiles and sell to customers’ expressed needs.

Consumers show less brand loyalty to the victimized site, because they either have no relationship with the site or view it negatively because of actions performed by the offending scraper.

All of these points add up to a push from some to minimize, mitigate and/or manage the flow of scraper bots on individual websites.

Strategies to overcome the bots

There are several ways in which travel companies can approach the incoming flow of bots: block all bots, allow some bots depending on business priorities, allow bots but track them, selectively allow some bot access and finally offer misleading information to bots.
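The strategies listed above could be expressed as a simple policy table. The following is a minimal sketch only, not anything described in the report: the bot category names, the `BotAction` enum and the `choose_action` helper are all hypothetical, and it assumes that some separate detection step has already classified the visitor.

```python
from enum import Enum


class BotAction(Enum):
    """Possible responses to bot traffic (hypothetical names)."""
    BLOCK = "block"
    ALLOW = "allow"
    ALLOW_AND_TRACK = "allow_and_track"
    SERVE_DECOY = "serve_decoy"  # i.e. offer misleading information


# Hypothetical mapping from a bot category to the chosen response.
# A real site would set this according to its own business priorities.
POLICY = {
    "search_engine": BotAction.ALLOW,
    "partner_meta_search": BotAction.ALLOW_AND_TRACK,
    "competitor_scraper": BotAction.SERVE_DECOY,
}


def choose_action(bot_category: str,
                  default: BotAction = BotAction.BLOCK) -> BotAction:
    """Return the configured action, blocking any category not listed."""
    return POLICY.get(bot_category, default)


if __name__ == "__main__":
    for category in ("search_engine", "competitor_scraper", "unknown_bot"):
        print(category, "->", choose_action(category).value)
```

Defaulting to `BLOCK` mirrors the strictest option in the list; a site that prefers to watch unknown bots could pass `default=BotAction.ALLOW_AND_TRACK` instead.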

Each of these reactions to bot traffic must be matched to business goals and priorities, and can result in some very real savings, as seen in the example below:

Let’s consider, for example, a US-focused travel website (with no meta-search business) selling $400 million in yearly bookings and offering standard prices and packages, with few differentiators from other online travel websites. At a 10% margin, this yields $40 million in profit.

Next let’s assume 30% of this website traffic comes from scraper bots. Of that scraper traffic, 10% targets data about the OTA’s highest margin revenue stream, which represents $100 million in revenue at a 25% margin, or $25 million in profit. This revenue stream, while just 25% of the company’s sales, represents 62.5% of its net income. Therefore, defending this revenue stream is critical to remain competitive and fund future growth.
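The arithmetic in the report's example can be checked with a short script. This is a sketch using only the figures quoted above; the function and variable names are illustrative and do not come from the report.

```python
def profit_breakdown(total_bookings, overall_margin,
                     stream_revenue, stream_margin):
    """Compare one revenue stream against the whole business."""
    total_profit = total_bookings * overall_margin
    stream_profit = stream_revenue * stream_margin
    return {
        "total_profit": total_profit,
        "stream_profit": stream_profit,
        # Share of overall sales contributed by the stream
        "share_of_sales": stream_revenue / total_bookings,
        # Share of overall profit contributed by the stream
        "share_of_profit": stream_profit / total_profit,
    }


# Figures taken from the example: $400M bookings at a 10% margin,
# with a $100M highest-margin stream at a 25% margin.
result = profit_breakdown(
    total_bookings=400_000_000,
    overall_margin=0.10,
    stream_revenue=100_000_000,
    stream_margin=0.25,
)

print(f"Total profit:    ${result['total_profit']:,.0f}")
print(f"Stream profit:   ${result['stream_profit']:,.0f}")
print(f"Share of sales:  {result['share_of_sales']:.1%}")
print(f"Share of profit: {result['share_of_profit']:.1%}")
```

The script reproduces the report's numbers: the stream accounts for a quarter of sales but $25 million of the $40 million profit, or 62.5%.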

The full report goes into more detail about how technology can be deployed to separate bots from legitimate customers and can be downloaded here.

NB: Malicious robot image courtesy Shutterstock.

Original author: Nick Vivion