Intro to Internet Bots
We tend to assume that websites are accessed by people sitting behind monitors, using web browsers. The reality, however, is a little different: the majority of the traffic that modern web pages see is generated without direct human interaction. Let's talk about the different ways a website can be accessed.
Estimates show that since 2015 the most prevalent website visitor has been the web spider. A web spider is a special program (bot) that has access to the Internet and can visit websites programmatically. Sometimes it identifies itself by providing a special value in the User-Agent header, and sometimes it tries really hard to look like a real person by spoofing headers and mimicking human behavior.
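To make the User-Agent idea concrete, here is a minimal sketch of both sides: a bot announcing itself in that header, and a crude server-side heuristic for spotting self-declared bots. The bot name and info URL are hypothetical, and real classification is far more involved than a token check.

```python
# Hypothetical self-identifying crawler: name and URL are made up for this sketch.
BOT_USER_AGENT = "ExampleBot/1.0 (+https://example.com/bot-info)"

def looks_like_declared_bot(user_agent: str) -> bool:
    """Crude heuristic: self-identifying spiders usually carry one of these tokens."""
    ua = user_agent.lower()
    return any(token in ua for token in ("bot", "spider", "crawler"))
```

Note that this only catches bots that *want* to be seen; a spoofed browser User-Agent sails right through, which is exactly why bot detection cannot stop at this header.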
There are tens of thousands of different web spiders, and classifying them is a tedious task. On the surface, they can all be put into two distinct categories: benign and malicious.
Benign spiders are the ones that a website owner wants on their website. Most of these bots are well known, and they usually don't hide themselves. Some even provide links to additional information in the User-Agent string. Here's a list of the most common ones.
Search Engine Crawlers. These bots are run by search engines to index the content of your website, so people are able to find it. Examples: Googlebot, Bingbot, DuckDuckGoBot, YandexBot.
Paid Advertisement Checkers. If you buy keywords to drive traffic to your website, these bots will visit it to evaluate the relevance of the ad and the quality of the landing page.
URL Checkers. When a link to your website is shared on social media or sent to someone in a private message, it will often be visited by a bot to make sure it's not a threat to the user. Facebook, Slack, Google and others run these bots to protect their users.
Uptime Monitoring Bots. Examples: Updown, Statuspal, UptimeRobot. These bots make sure the website responds within a specified latency and will scream if that's not the case. They are typically configured by the website owners, so their visits should not come as a surprise.
Website Health Monitoring Bots. These include SSL certificate checkers, API functionality testers, and vulnerability scanners. Like uptime monitoring bots, they are specifically configured to visit the website.
Statisticians. A few bots collect statistics about the Internet. For example, the Wayback Machine stores historical snapshots of pages, and Common Crawl makes web data publicly accessible for study and data mining.
Security Scanners. These are vulnerability scanners and monitoring bots that crawl the entire Internet for potential threats. Some of these bots aim to protect intellectual property and report potential copyright infringement.
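Because malicious bots routinely impersonate the benign ones above, the User-Agent alone isn't proof of identity. Major search engines document a stronger check: do a reverse DNS lookup on the visitor's IP, confirm the hostname belongs to the crawler's domain (e.g. googlebot.com for Googlebot), then forward-resolve that hostname to confirm it points back to the same IP. A sketch, with an assumed (incomplete) suffix list:

```python
import socket

# Assumed suffix list for this sketch; real deployments should use each
# crawler operator's published verification domains.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")

def hostname_is_trusted(hostname: str) -> bool:
    """Check whether a reverse-DNS hostname belongs to a known crawler domain."""
    return hostname.endswith(TRUSTED_SUFFIXES)

def verify_crawler_ip(ip: str) -> bool:
    """Reverse-resolve the IP, check the domain, then forward-confirm."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not hostname_is_trusted(hostname):
        return False
    try:
        # Forward confirmation: the claimed hostname must resolve back to the IP.
        return ip in socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return False
```

The forward-confirmation step matters: anyone can configure reverse DNS for their own IP range to return a googlebot.com-looking name, but they can't make Google's DNS resolve that name back to their IP.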
Malicious spiders are typically created by cyber-criminals to gain access, exfiltrate data, or find vulnerabilities in websites. They often pretend to be real human beings or even benign bots.
Spambots. Spambots attack contact forms to send unsolicited messages, either to the website owners or to other people. If the website allows anyone to publish posts, some bots exploit that to spread unwanted content.
Fake account bots. These bots jump straight to the sign-up forms and create fake accounts automatically, so the attacker can later use them for various purposes.
Account takeover bots. These bots try to break into existing accounts by entering credentials on a log-in form. Not-so-smart ones brute-force the form by trying all possible combinations. Smarter ones use a database of exposed credentials, which can consist of just common passwords or of known credential pairs (a technique called credential stuffing).
Phishing man-in-the-middle kits. Some phishing kits do more than copy the website an attacker wants to impersonate: they act as a middleman between the victim and the real website. This lets them steal not just usernames and passwords, but also multi-factor authentication codes and session cookies.
Backdoor scanners. Many websites have backdoors or admin interfaces. These bots scan the Internet to find them and later gain privileged access. Access to a backdoor can lead to a whole slew of attacks, from publishing the content of a phishing page to exfiltrating sensitive data.
Votebots. Votebots vote or like posts on online platforms to skew the results and get the outcome that is favorable to an attacker.
Back-end exploits. Since the back-ends that run websites can be vulnerable to purely programmatic attacks such as SQL injection, some bots take advantage of that. They put special combinations of bytes in the request payload to trigger a vulnerability. Some of them are not even targeted and try to exploit every known vulnerability on any website they interact with.
Some bots are created to exploit the business logic vulnerabilities we discussed in a previous post.
Examples of this would be bots that place items in the shopping carts on e-commerce platforms, inflate engagement metrics, submit fake product reviews, and so on.
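One common first line of defense against the account takeover bots described above is throttling failed log-ins per source IP over a sliding time window. Here's a minimal sketch; the window size and threshold are arbitrary choices for illustration, and real systems also key on username, device fingerprint, and more.

```python
from collections import defaultdict, deque

WINDOW_SECONDS = 300   # sliding window length (assumed value)
MAX_FAILURES = 5       # failures allowed inside the window (assumed value)

class LoginThrottle:
    """Track failed log-in timestamps per IP and flag suspicious sources."""

    def __init__(self):
        self._failures = defaultdict(deque)  # ip -> timestamps of failures

    def record_failure(self, ip: str, now: float) -> None:
        self._failures[ip].append(now)

    def is_blocked(self, ip: str, now: float) -> bool:
        q = self._failures[ip]
        # Drop failures that have aged out of the window.
        while q and now - q[0] > WINDOW_SECONDS:
            q.popleft()
        return len(q) >= MAX_FAILURES
```

A deque keeps both ends cheap to touch: appending a new failure and expiring old ones are O(1), so the check stays fast even under a flood of attempts.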
Some spiders can be either benign or malicious, depending on how the data they collect is used.
Scrapers. If a website surfaces valuable data and makes it publicly available, it can be an interesting target for a scraper. Scrapers collect information from the website and send it back for processing. The data may be restructured and presented in a different format, or used maliciously: to gain a competitive advantage, or to be sold.
Competitor monitors. Services that help people monitor their competitors and notify them about pricing changes or new product releases.
Marketing monitors. Bots doing marketing research and collecting keywords from the websites or monitoring mentions of a given brand.
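At their core, the scrapers and monitors above all do the same thing: fetch a page and pull structured values out of its HTML. A toy sketch of a price scraper using only the standard library; the `class="price"` markup convention is hypothetical, and real scrapers typically use an HTML library like Beautiful Soup plus an HTTP client.

```python
from html.parser import HTMLParser

class PriceScraper(HTMLParser):
    """Collect the text of every element with class="price" (assumed markup)."""

    def __init__(self):
        super().__init__()
        self._in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if dict(attrs).get("class") == "price":
            self._in_price = True

    def handle_data(self, data):
        if self._in_price:
            self.prices.append(data.strip())
            self._in_price = False

# A fetched page would normally come from an HTTP request; inlined here.
page = '<div><span class="price">$19.99</span><span class="price">$5.00</span></div>'
scraper = PriceScraper()
scraper.feed(page)
```

After `feed()`, `scraper.prices` holds the extracted values, ready to be shipped off for comparison against yesterday's run: that diff is exactly what a competitor-monitoring service sells.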
The list above is by no means comprehensive, and new bots come into existence every day, but we hope it gives a glimpse of how huge the "zoo" really is. Managing your visitors is a challenging but interesting task, and we hope our future posts will help you with that.