Recently, I wrote about the two largest incident response bottlenecks behind the massive gap between the time it takes to compromise an organization and the time it takes incident response teams to verify a true incident and take appropriate action. I then discussed the second bottleneck, incident analysis, and to close the loop, I want to discuss the first bottleneck: alert triage.

More data often means more alerts, but it shouldn't

In many monitoring solutions, a series of rules is defined, and any relevant new data is compared against them in a rules engine to see whether an alert needs to trigger. Whether these rules are managed internally or periodically updated by professional services, the list of rules grows significantly over time. Having hundreds or thousands of rules means that your organization likely receives thousands of alerts per day, which in turn justifies a team to triage the 99% of them that are false positives. Many organizations even resist spending additional money on monitoring to ingest data from more sources because the investment would be wasted: it would add to the information overload without increasing the number of alerts the team can manage. This "cost for no benefit" calculation is why most businesses never even consider expanding their monitoring implementations after the initial three- to six-month deployment project.
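
To make that mechanism concrete, here is a minimal sketch of a rule-based alert engine - the rules and event fields are hypothetical, not any particular product's - showing how every new event gets checked against every rule, and why a growing rule list turns directly into a growing alert queue:

```python
# Minimal rule-engine sketch: every incoming event is checked against every
# rule, and each match becomes an alert someone has to triage.
# The rules and event fields are hypothetical examples.

RULES = [
    {"name": "Failed logins spike", "field": "failed_logins", "op": "gt", "value": 5},
    {"name": "Login from new country", "field": "new_country", "op": "eq", "value": True},
    # ...in practice this list keeps growing, and so does the alert volume.
]

def matches(rule, event):
    """Return True if the event satisfies the rule's single condition."""
    actual = event.get(rule["field"])
    if actual is None:
        return False
    if rule["op"] == "gt":
        return actual > rule["value"]
    if rule["op"] == "eq":
        return actual == rule["value"]
    return False

def evaluate(event):
    """Compare one event against every rule and emit an alert per match."""
    return [{"rule": rule["name"], "event": event} for rule in RULES if matches(rule, event)]

# One noisy event can trigger multiple alerts; multiply by thousands of events
# per day and a long rule list, and the triage queue gets out of hand.
alerts = evaluate({"user": "jdoe", "failed_logins": 8, "new_country": True})
print(len(alerts))  # -> 2
```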

I have written previously about alert fatigue and UserInsight's approach to never causing it. What I failed to mention is that we are continually adding support for new types of data, whether our solution collects it or receives it from another system. The solution is designed so that any data can be added, and since we immediately translate it into the human "behavior" it describes, we stay true to our motto - "don't be noisy". This way, when more data is added to UserInsight, it frequently brings more context to detection, so that some of our alerts can be enhanced and made more intelligent. Oh, and best of all? We won't charge you for sending more data. The solution is priced by user because we think that your best chance at detecting intruders is to monitor as much normal behavior on your network as possible. Don't give them anywhere to hide.

Context is mandatory for noise reduction

Incident responders who triage alerts all day frequently find themselves asking two big questions:

  • "Why was this rule defined?"
  • "What was someone doing to trigger this?"

Since there is no realistic way for a person to fully understand thousands of alerts in a day, it is easy for a true incident to look just like the five hundred false positives that triggered the same rule. And if sufficient time is dedicated to understanding every triggered alert, the probable incidents are not passed to incident analysts within a day or even a week. This means that you need to make one of two sacrifices in the triage process: either (a) miss incidents by taking shortcuts and ignoring all alerts of a certain type, or (b) get comfortable with analyzing an incident weeks after the initial event occurred.

Understanding an alert during triage does not mean simply having access to the source data. It means the behavior and the concern behind the alert should be readily apparent as soon as you view it. Occasionally, a single event maps directly to a UserInsight alert, such as when someone attempts to authenticate with a honey user account, but most of the time, the behaviors that trigger an alert involve a great deal more correlation across data sources and an individual's numerous accounts. Listing the associated user adds valuable context, but some of the behaviors that trigger our alerts can be obscure or hard to recognize at first glance. This is why we give you an explanation of the malicious actions the alert could signify and the legitimate actions that could cause it to trigger in your organization. More data can be turned into more context, and that helps both UserInsight and our customers understand concerning behavior faster, but you should never have to guess at those two triage questions when an alert hits your inbox.
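
As a rough illustration of the difference between a bare alert and one that answers those two questions up front, here is a hypothetical structure (the field names and example text are assumptions, not UserInsight's actual data model) that attaches the correlated user, the possible malicious explanations, and the likely benign causes to the alert itself:

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ContextualAlert:
    """A hypothetical alert that carries its own triage context."""
    name: str
    user: str                      # the person behind the correlated accounts and assets
    evidence: List[str]            # the observed behavior, described in human terms
    possible_malicious: List[str]  # answers "why was this rule defined?"
    possible_benign: List[str]     # answers "what was someone doing to trigger this?"

alert = ContextualAlert(
    name="Authentication attempt against a honey user account",
    user="jdoe",
    evidence=["Logon attempt to the decoy account 'svc-backup-old' from workstation WKSTN-42"],
    possible_malicious=["An intruder trying harvested credentials against an account no real user should touch"],
    possible_benign=["An administrator auditing or cleaning up stale accounts"],
)

# The analyst reads the behavior and the concern directly off the alert instead
# of reverse-engineering the rule and digging through raw logs.
print(alert.name, "-", alert.user)
```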

Anomaly detection may just change the type of alerts

Some people will tell you that canned analytics will solve all of your alert fatigue problems simply because "value(math) > value(rules)". There are definitely a lot of benefits to machine learning algorithms and the anomalous behavior they can expose in your organization, but the need for incident triage is not magically removed because you alert on every anomaly identified across data sources instead of when pre-defined rules are triggered. Outliers are helpful for identifying behavior for which your IR team didn't know they should write a rule, but human beings are not highly predictable. Legitimate users tend to do a lot of things, both on the network and off, that appear anomalous or irrational from an outside perspective, so outliers can be as rife with false positives as the rules they replace.
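
A simple sketch of why that happens: the hypothetical check below baselines one user's daily outbound data volume with a z-score and flags anything more than three standard deviations from the mean. The numbers are invented, but the flagged day is perfectly legitimate - exactly the kind of outlier an analyst still has to triage:

```python
import statistics

def is_outlier(history, today, threshold=3.0):
    """Flag today's value if it sits more than `threshold` standard deviations
    from the user's historical mean (a simple z-score check)."""
    mean = statistics.mean(history)
    stdev = statistics.stdev(history)
    if stdev == 0:
        return False
    return abs((today - mean) / stdev) > threshold

# Hypothetical daily outbound megabytes for one user over two weeks.
history = [120, 95, 130, 110, 105, 90, 125, 115, 100, 135, 108, 98, 122, 117]

# A legitimate spike - say, uploading a large video for a presentation - trips
# the math just as readily as an exfiltration would.
print(is_outlier(history, 900))  # -> True: an alert, but not an incident
```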

UserInsight uses baselines and anomaly identification in use cases where our team determines they add value. Not every malicious action is a behavioral anomaly - intruders do not always exfiltrate through a rare port or in unusually large payloads - so relying too heavily on mathematics can cause you to miss just as many concerning activities as an incomplete ruleset (a short illustration follows the questions below). Any alerts we produce based on outlier behavior will come complete with context, so as to avoid two slightly different questions in triage:

  • "Do I care about this anomaly?"
  • "What did someone do different?"

UserInsight will give you access to the anomalies we have discovered in your data, but with the goal of using them to complete investigations quickly and with confidence that you know the extent of an incident's impact. More data does not have to mean more alerts.

To learn more about UserInsight and Rapid7's other solutions for detecting compromised credentials, check out our compromised credentials resource page and make sure to download our complimentary information toolkit. I expect you'll quickly see how much we can help with your team's alert triage process.