The average person’s conception of hacking is very technology-focused. The stereotypical hacker sitting in a basement surrounded by screens full of terminal commands has embedded itself in our culture. However, this view of hacking isn’t entirely accurate.
The vast majority of cyberattacks are focused on a much easier target than a computer: a human. Securing a computer system against a known flaw is relatively straightforward; once a vulnerability has been discovered, correctly applying a patch closes that particular hole for good. As a result, cybercriminals have to keep identifying and exploiting new vulnerabilities, which is difficult. Humans, on the other hand, are much harder to “patch”. Many organizations provide cybersecurity awareness training to their employees; however, these same employees are often still willing to click on a link or open an attachment in an email if it seems like doing so would benefit them.
This practice of tricking people into doing what the cybercriminals want is called social engineering. People are often predictable, and, given access to an employee and enough data about them, an attacker can craft a pretext that increases the probability that the target will fall for the scam. However, the information that is most valuable to an attacker may not be what one would expect. A great deal of valuable information is publicly posted on an organization’s website, making it possible for automated bots to use web scraping to build a profile for future attacks.
Most Organizations’ Websites Leak Sensitive Data
Every organization’s website walks the line between sharing enough information with their customers and too much. Businesses need the ability to connect with their customers and drive sales, but too much data risks attacks or loss of intellectual property. However, seemingly innocuous data on the organization’s web page can be extremely valuable to a cybercriminal.
- Organizational Charts
In any company, there is a hierarchy. An employee who is higher on the corporate ladder can request or order things that lower-level employees cannot. As a result, a social engineer being able to say “the CEO asked me to…” or “This is Gary from IT, we’re currently working on a project to…” can be invaluable in an attack.
Accomplishing this requires a decent understanding of the relative rank of people within the organization. Learning this is surprisingly easy since many organizations will post organizational charts on their webpages or refer to high-level executives by name on their websites. By scraping this data from a site, a social engineer has the data needed to perform a more targeted spear phishing or whaling attack.
- Email Addresses
Knowledge of email addresses within a company is valuable to an attacker for a number of different reasons. First, it enables an attacker to actually send a phishing email to a particular user. Second, knowledge of an organization’s internal email scheme can be useful for crafting lookalike email addresses for phishing attacks. Third, with knowledge of someone’s email address, it is possible to collect a great deal of information about them on the Internet using social media and other sites, providing an attacker with background information for phishing attacks.
Many organizations try to hide their employees’ email addresses by publishing only generic, shared mailboxes, such as a general contact or sales address. However, most organizations have a standardized email scheme, so even a single personal email address posted on the site, combined with a list of names, gives a social engineer a long list of phishing targets to try. With 91% of cyberattacks beginning as a spear phishing email, this list of targets represents a significant vulnerability in an organization’s cyber defenses.
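One way for a defender to gauge this exposure is to scan the text of their own published pages for personal-looking addresses. The following is a minimal sketch, not a complete audit tool: the regular expression is deliberately simple, and the function name `find_exposed_addresses` and the set of generic mailbox names are assumptions made for illustration.

```python
import re

# Simple pattern for illustration; it does not cover every valid address form.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hypothetical list of shared mailboxes that are meant to be public.
GENERIC_LOCAL_PARTS = {"info", "sales", "support", "contact"}


def find_exposed_addresses(page_text):
    """Return sorted, de-duplicated addresses that are not generic mailboxes."""
    found = {match.lower() for match in EMAIL_RE.findall(page_text)}
    return sorted(
        address
        for address in found
        if address.split("@")[0] not in GENERIC_LOCAL_PARTS
    )


# Example: one shared mailbox and one personal address on the same page.
sample = "Contact info@example.com or our press lead at Jane.Doe@example.com."
print(find_exposed_addresses(sample))
```

Any personal address this turns up also reveals the naming scheme (here, first name, a dot, then last name), which is exactly the detail an attacker needs to guess the rest of the staff list.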
- Job Titles/Descriptions
Job titles and job descriptions are a valuable source of information for hackers. Knowing that an organization is trying to replace its Java Developer or MySQL Database Engineer tells an attacker a great deal about the organization’s internal systems. Yet this information is often openly posted on organizations’ web pages and on job sites, making it easily accessible to attackers.
Protecting Against Malicious Web Scraping
Collection of sensitive data can be performed manually; however, most malicious web scraping is automated. Whether the goal is collection of data for social engineering, theft of content, or other malicious purposes, it’s more efficient to perform collection at scale.
This requires the attacker to use bots to perform their web scraping. While bots have been growing in sophistication, many of them are detectable by looking at a few different features:
- HTTP Headers: All HTTP traffic contains headers set by the client application. Those set by a browser often look different from the ones used by a bot.
- IP Addresses: Traffic originating from IP addresses with a poor reputation, such as those previously associated with malicious activity, is more likely to be malicious.
- Behavioral Analysis: Human users of a site and malicious bots often act very differently, making it possible to differentiate them.
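The three signals above can be combined into a simple per-request check. The sketch below is only an illustration of the idea, not how any particular security product works: the user-agent tokens, the blocklisted address, the rate threshold, and the function name `bot_signals` are all assumed values, and the code assumes header names arrive in their canonical capitalization.

```python
import time
from collections import defaultdict, deque

# Assumed values for illustration only; a real deployment would tune these.
SUSPICIOUS_AGENT_TOKENS = ("python-requests", "curl", "scrapy", "wget")
BLOCKLISTED_IPS = {"203.0.113.7"}  # placeholder from a documentation range
MAX_REQUESTS = 20                  # requests allowed per window
WINDOW_SECONDS = 10.0

_recent = defaultdict(deque)  # ip -> timestamps of that client's recent requests


def bot_signals(ip, headers, now=None):
    """Return the reasons a request looks automated (empty list if none)."""
    now = time.monotonic() if now is None else now
    reasons = []

    # HTTP headers: browsers normally send a User-Agent and Accept-Language.
    agent = headers.get("User-Agent", "").lower()
    if not agent or any(token in agent for token in SUSPICIOUS_AGENT_TOKENS):
        reasons.append("user-agent")
    if "Accept-Language" not in headers:
        reasons.append("missing-accept-language")

    # IP address: compare against a reputation list.
    if ip in BLOCKLISTED_IPS:
        reasons.append("blocklisted-ip")

    # Behavior: count this client's requests inside a sliding time window.
    window = _recent[ip]
    window.append(now)
    while window and now - window[0] > WINDOW_SECONDS:
        window.popleft()
    if len(window) > MAX_REQUESTS:
        reasons.append("high-request-rate")

    return reasons


browser = {"User-Agent": "Mozilla/5.0", "Accept-Language": "en-US"}
print(bot_signals("198.51.100.1", browser, now=0.0))
```

Returning a list of reasons rather than a single yes/no keeps the decision separate from the detection, so a site could log one signal but block only when several appear together. Each signal is easy for a sophisticated bot to evade on its own, which is why they are usually combined.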
Some security solutions include built-in bot protection, which can help to identify and block attempts by bots to extract sensitive data from an organization’s website.
Plugging Website Data Leaks
Many organizations are leaking a great deal of potentially sensitive information on their websites without even knowing it. However, a lot of this information also needs to be on the website for legitimate business reasons. Deploying a security solution capable of detecting and blocking web scraping efforts by bots can raise the bar for attackers trying to collect and exploit data posted on the organization’s website.