I happen to despise buzzwords, so it has been challenging for me to use the term "big data security analytics" in a sentence, mostly because I find it to be a technical description of the solutions in this space, rather than an indicator of the value they provide. However, since we build products based on the security problems we identify, I want to explain how those technologies can be used to target some highly pervasive incident response challenges.

Detection and investigation problems continue to evolve as attackers do

I recently wrote about the creativity of cyber attackers, but I did not discuss the effect this has had on incident response. It is very difficult to compare an incident responder's job to any other because of its complexity. Like a chess grandmaster, you are trying to stop other humans from reaching their goal (in this case, unrestricted access to your network), except there are no established rules to the game and new types of chess pieces are continually added (in the form of productivity technology). It was originally possible for a single security professional to manage an organization's IDS, firewall, and antivirus solution. Because of the adaptive nature of malware developers, social engineers, and other malicious parties, this is no longer the case. To adjust, security teams have either expanded in size or piped all data valuable for detection to their SIEM. A decade ago, you had to have a nearly unlimited budget and a dedicated software development team to accomplish this. Now, user behavior analytics technologies are making it possible for you to automate this work and focus your development efforts elsewhere.

Centralizing your data was a massive problem to overcome

SIEM technology changed incident response because it suddenly gave the team a single place where they could access (seemingly) all of the organization's events. The first challenge, though, was convincing the network device vendors (and Microsoft) to export all events from their systems. After years of hearing these demands, most vendors caved and provided a means to export via syslog as standard on all systems. You just needed to navigate their lovely user interface and decipher an online help manual to do so. This end solution was supposed to give attackers nowhere to hide.

However, as we learned in developing InsightIDR, not all systems have reached a point where they export logs (Microsoft DHCP and DNS, for example), so SIEM customers had to get creative in their data collection by building custom ways to access log files and updating them every time the vendor released a new log format. To make things worse, it turns out that a great many events on your users' laptops and desktops never get forwarded to the SIEM (even if alerts from your EDR solution are being forwarded). A select few organizations have built a custom solution in which the events are forwarded from the endpoints themselves, but this is held together by bubble gum and duct tape, so you need at least one dedicated engineer to make sure the data is consistently flowing. We recognized that attackers were using this widespread blind spot to their advantage, so we automated the process without the need for software agents. Eliminating every hiding place on your network is one of our main goals with InsightIDR.

Doing something with the aggregated data was the next challenge

Even if it wasn't all of the data in your organization, it was enough for teams to start using it for detection and incident analysis, so the solutions that were originally built with a focus on centralization had to begin adding rules engines and search capabilities. Once again, this was a massive improvement over the previous process of manually logging into individual systems and reading through the logs when things didn't feel right. The new approach just didn't scale unless you were able to employ multiple security experts and data scientists. Why? Once you started adding various data sources, you needed someone who understood the data from both the security and the data science angle.

The problem that gradually emerged was one of the human mind: as with major catastrophes, such as terrorist attacks or nuclear meltdowns, we often fall into a trap called the "narrative fallacy". This trap is our need to weave an explanation into a sequence of facts, even when the available data is insufficient to accurately do so. It causes us to look at partial datasets from a legitimate incident and see a story. This story was often used to create rules for alerting because there was confidence that the rules would alert the team as soon as the same incident occurred again, but the rules were not tested for noise in the normal flow of business. Seen within the silo of a single organization and a single incident investigation, too many alerts were built on perceived explanations found in incomplete data, and they also matched a great deal of common activity on these networks. As the number of rules grew over time, organizations started to get alert fatigue. When the datasets grow and it is unclear if you have the proper context around an incident, you need more than just a security mindset to find the relevant signal in the noise.
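Testing a rule for noise before it goes live can be sketched in a few lines. The example below is only an illustration under stated assumptions: the `Event` fields, the after-hours rule, and the baseline data are all hypothetical and are not taken from InsightIDR or any particular SIEM. It replays a rule written after one incident against a sample of ordinary business activity and reports what fraction of those normal events would have raised an alert.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class Event:
    """Hypothetical, simplified log event (fields are assumptions)."""
    user: str
    action: str
    hour: int  # hour of day, 0-23


def after_hours_admin_login(event: Event) -> bool:
    """Hypothetical rule derived from one incident:
    flag admin logins outside 08:00-18:00."""
    return event.action == "admin_login" and not (8 <= event.hour < 18)


def backtest(rule: Callable[[Event], bool], baseline: Iterable[Event]) -> float:
    """Return the fraction of known-benign baseline events the rule would flag."""
    events = list(baseline)
    if not events:
        return 0.0
    hits = sum(1 for e in events if rule(e))
    return hits / len(events)


# Made-up sample of normal activity, e.g. an on-call engineer working late.
baseline = [
    Event("alice", "admin_login", 9),
    Event("bob", "admin_login", 22),
    Event("carol", "file_read", 14),
    Event("bob", "admin_login", 23),
    Event("dave", "admin_login", 10),
]

noise_rate = backtest(after_hours_admin_login, baseline)
print(f"Rule would fire on {noise_rate:.0%} of normal events")
```

A rule that fires on a large share of known-benign activity is a likely source of alert fatigue, however well it explains the incident that inspired it.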

Security Analytics solutions can help organizations without the budget for this expertise

From our experience with security teams, a disturbingly small number of them can afford to employ incident responders, security experts, and data scientists while regularly maintaining their data feeds internally or through professional services. This means that you most likely fall into the enormous group of organizations that are time-challenged and cannot easily avoid two more pervasive biases of the human mind, not drastically different from the narrative fallacy: illusory correlation and clustering illusion.

  • Illusory correlation - this phenomenon is very difficult to overcome; the human brain often sees relationships between events across datasets, even when none exist, because we need more context to understand.
  • Clustering illusion - another type of mistake our minds make; we do not deal well with random events, so it is very common for a manual review of large datasets to yield patterns, even when none exist, because we feel certain something has occurred.
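One common statistical guard against both biases is a permutation test: before trusting an apparent relationship between two event series, check how often shuffled (and therefore unrelated) data produces a relationship at least as strong. The sketch below is a generic illustration, not a description of how InsightIDR works; the function names and the daily counts are hypothetical.

```python
import random
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / (var_x * var_y) ** 0.5


def permutation_p_value(xs: Sequence[float], ys: Sequence[float],
                        trials: int = 2000, seed: int = 0) -> float:
    """Estimate how often randomly shuffled data shows a correlation
    at least as strong as the observed one (two-sided)."""
    rng = random.Random(seed)
    observed = abs(pearson(xs, ys))
    shuffled = list(ys)
    extreme = 0
    for _ in range(trials):
        rng.shuffle(shuffled)
        if abs(pearson(xs, shuffled)) >= observed:
            extreme += 1
    # Add-one correction so the estimate is never exactly zero.
    return (extreme + 1) / (trials + 1)


# Made-up daily counts: failed logins vs. outbound connections to a new host.
failed_logins = [3, 5, 2, 8, 4, 6, 3, 7, 5, 4]
new_host_connections = [1, 0, 2, 1, 0, 3, 1, 0, 2, 1]

p = permutation_p_value(failed_logins, new_host_connections)
print(f"p-value: {p:.3f}")
```

A large p-value means shuffled data produces a pattern this strong quite often, so the "relationship" an analyst thinks they see may be nothing more than chance. A small p-value is only a reason to investigate further, not proof of an incident.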

Of course, these challenges can be overcome, especially if you have a staff of data scientists who trust the statistical analysis first, then fact-check it manually. It is just severely expensive to do so without a user behavior analytics solution that has the machine learning and security expertise built in. Your SIEM or log aggregation solution is the perfect initial source for the relevant data, but because we don't want to leave attackers any place to hide, InsightIDR is designed to complement or completely replace these solutions, with the flexibility to analyze new types of data as they are found to aid detection. We will always build ways to augment the initial sources with data collection that covers your evolving infrastructure, which now includes endpoints, mobile devices, and various cloud services.

To learn more about InsightIDR and Rapid7's other solutions for detecting compromised credentials, check out our compromised credentials resource page and make sure to download our complimentary information toolkit. I think you'll see the problems we address that your existing solutions cannot.