Non-Human Traffic Dominates the Web

Incapsula did their third annual survey of web traffic to determine how much is human generated versus machine generated. From August through September 2014, they surveyed over 15 billion visits to over 20,000 web sites scattered around the world.

What they found will probably surprise the average person (but not any web administrator). For the third year in a row, more traffic on the web was generated by bots than by people. There are both good and bad bots, and Incapsula looked at each transaction to determine the nature of the bot. In 2014, 44% of all web traffic was generated by humans, 29% by bad bots and 27% by good bots.
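
To make the arithmetic behind those figures explicit, here is a minimal Python sketch. The three percentages come from the paragraph above; the variable names are purely illustrative.

```python
# Traffic shares quoted above for 2014, as a percent of all visits.
shares = {"human": 44, "bad_bots": 29, "good_bots": 27}

# All bot traffic is the sum of the good and bad bot shares.
bot_share = shares["bad_bots"] + shares["good_bots"]
total = sum(shares.values())

# Fraction of bot traffic that is malicious.
bad_fraction_of_bots = shares["bad_bots"] / bot_share

print(f"total accounted for: {total}%")
print(f"bots: {bot_share}%  humans: {shares['human']}%")
print(f"bad bots as a share of bot traffic: {bad_fraction_of_bots:.1%}")
```

In other words, bots add up to 56% of visits, and slightly more than half of that bot traffic is the malicious kind.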

So what are bots exactly? There are many examples of good bots. Probably the best known is the Google web crawler that reads through web sites to build the Google search index. All search engines have similar bots, but Incapsula says that the Google bot is unique in that it seems to crawl through everything: big web sites, small web sites and even dead web sites. This probably accounts for why you can find things on Google search that don’t turn up anywhere else.

Another example of a good bot can be seen when you go to a shopping site. If you’ve ever shopped for electronics you will find a bunch of these sites. They list all of the places on the web that are selling a given component and let you compare prices. These sites are built by bots that crawl through the electronics sellers to constantly grab any updates. These sites do this to earn sales commissions when people choose to buy something through their site.

Another big category of good bots is RSS feed readers. RSS stands for Really Simple Syndication. I used this technology for years. It was a way to know if somebody wrote a new blog post or if a news site published an article on a topic of interest to you. The RSS bot would notify you when it found something you were looking for. There was a 10% drop from 2013 to 2014 in good bot traffic due to the phase-out of RSS feeds. Google Reader was the biggest source of such feeds and it was discontinued last year.

What is scary is the ever-growing volume of bad bots. These are just what you would imagine, and are crawling around the web trying to do damage.

The fastest growing class of bad bots is impersonator bots, which are malware that tries to look like something else to make it onto a web site or computer. These include DDoS (distributed denial of service) bots that are disguised to look like browser requests, bots that are disguised as proxy server requests, and bots that mimic search engine crawls. These are really nasty pieces of malware on the net that are used for things like data theft, site hijacking, and denial of service attacks. These bots go after all types of web sites hoping to then infect site visitors.
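
For bots that mimic search engine crawls, one commonly documented check is a reverse DNS lookup on the visitor's IP address followed by a forward lookup on the resulting hostname. The sketch below is an illustration of that idea in Python, not anything taken from the Incapsula report. The function name, the injectable lookup parameters, and the hostname suffix list are all assumptions made for the example.

```python
import socket

# Assumption: hostname suffixes a genuine Google crawler resolves to.
# Check the search engine's own documentation before relying on this list.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com")


def is_genuine_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` passes a reverse-then-forward DNS check.

    The lookups are injectable (hypothetical parameters for this sketch)
    so the logic can be exercised without touching the network.
    """
    if reverse_lookup is None:
        # socket.gethostbyaddr returns (hostname, aliases, addresses).
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # socket.gethostbyname_ex returns (hostname, aliases, addresses).
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip).lower().rstrip(".")
        # Step 1: the reverse lookup must land in the crawler's domain.
        if not host.endswith(CRAWLER_SUFFIXES):
            return False
        # Step 2: the hostname must resolve back to the same address,
        # otherwise the reverse record could simply be forged.
        return ip in forward_lookup(host)
    except OSError:
        # Lookup failures (socket.herror, socket.gaierror) count as "not verified".
        return False
```

A request that claims to be a search crawler in its user agent but fails this check is a reasonable candidate for the impersonator category. A check like this only covers bots posing as crawlers; it does nothing about bots posing as ordinary browsers.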

Probably the biggest volume of bad bot traffic comes from scrapers. These are bots that are designed to grab certain kinds of information. The good bot listed above that compares electronics prices is a kind of web scraper. But the malicious web scrapers look to steal things of value such as passwords, email addresses, phone numbers, credit card numbers, or other kinds of data that can then help hackers better attack somebody.

Of course we all know about the next category, spamware, which is used for all sorts of malicious purposes like content theft, phishing, and identity theft.

The final category of bad bots is hacking tools; these are generally aimed at servers rather than computers. Hacking tools are used to crack into servers to steal corporate data, to steal credit card information, or to crash the server.

Incapsula found that bad bots attack web sites of all kinds and that there are proportionately more bad bots trying to crack small web sites than large ones. This is probably because the vast majority of web sites have fewer than 1,000 visitors per day and are often much less protected than larger corporate sites.

What does this all mean for an ISP? The ISP uses tools to try to intercept or deflect as much of the bad bot traffic as possible. ISPs try to keep malware off customers’ computers since one of the biggest threats to their network is attacks from within. Accumulated malware on customers’ computers can play havoc within the network and inside firewalls.

There are companies like Incapsula that sell tools for ISPs to monitor and block bad bot traffic. But the volume of bot attacks is so large these days that it’s often a losing game. For example, Incapsula says that during a denial of service attack, when large volumes of bots attack the same site simultaneously, as much as 30% of the malware attached to the attacking bots gets through any normal malware protection schemes.

To some degree the bad guys are winning, and if they get far enough ahead it could be a threat to the web. The worst of the bad bots are written by a handful of very talented hackers and the industry is currently stepping up pursuit of these hackers as a strategy to cut off bot attacks at their sources.
