Lesson 1: There is no perfect SOC
When you think of a Security Operations Center (SOC), what do you think of? 24/7 security monitoring? A fancy war room with a giant threat map?
When I started my career, I worked as a security analyst at an MSSP. We operated as a meta-SOC that performed security monitoring and engineering services for Fortune 500 customers. We had the fancy war room, the dedicated personnel, and 24/7 monitoring.
From one perspective, because we were mandated to provide best-in-class security monitoring for our customers, we were lucky: we had a lot of great tools, plus budget and talent at our disposal. However, because we worked as an external partner, there were limitations both for the organizations we served and for our own team.
For example, because we were monitoring external organizations, we were often limited in our ability to respond beyond escalating notifications and making recommendations. Internal SOCs, on the other hand, can provide deeper incident response capabilities through processes and tools, as well as a greater ability to influence how those processes are executed inside their organizations.
My point is that there is no perfect SOC. All security teams operate under budget, resource, and organizational constraints. Not every organization has the technology and access required to do security monitoring or incident response perfectly. SANS describes the ideal roles in a high-functioning team, but not every company has personnel dedicated to event triage, malware analysis, threat hunting, and incident response; often these roles are blurred. And of course, not everyone has (or needs) a fancy war room.
Lesson 2: A SOC cannot operate effectively without external buy-in from the business
The primary purpose of a security operations center is to detect and respond to threats to the business. But to protect the business, you must have buy-in from the business.
By nature, a security team rarely produces revenue-generating assets; it's a cost center designed to protect the parts of the organization that do. To implement effective protections, you need buy-in from cross-functional leadership to help implement and enforce a strong security posture, along with good communication so that you understand what you are protecting and why. Without collaboration and a strong security culture, the SOC will face an uphill battle and problems will not get fixed. How many security teams have you seen blindly forward Qualys vulnerability scan reports that list the same issues month after month? Chances are, a lot.
Working at an MSSP, I saw this magnified tenfold. If we noticed a series of machines infected with a worm, we could only escalate the issue to the business contacts we had been given. It was then up to the organization to coordinate the response. Some organizations let infected machines languish for months simply because they couldn't get the owners to remediate them, despite the risk this created for themselves. For us, juggling the same incidents over and over again led to incident management headaches.
Lesson 3: Technology choices should augment personnel and processes
How often have you seen organizations buy security products and leave them gathering dust in a box in a data center? Too often, I'd wager.
Does it make sense to purchase a threat intelligence capability if you have no security monitoring or incident handling process? Why buy a fancy sandbox if you have no use case for it or can’t dedicate personnel time to configure it?
Working in a SOC, we frequently saw security monitoring devices that were misconfigured or untuned. Many of these systems generated so much noise that it would be impossible for human beings to analyze the data. Or, worse, the devices yielded no valuable data at all because they were not installed properly. Yet, real budget dollars went into these products.
Technology choices should be made thoughtfully: they should make your team's job easier, not harder. Before buying an expensive commercial solution, I recommend that security teams start with a small proof of concept that ties several open source tools together with automation, to demonstrate the value of the use case first.
Lesson 4: Alert fatigue is a source of error and talent attrition
Pure alert triage is often boring and tedious work. Alert fatigue is part of why I left the SOC I was working in after less than a year to pursue a development role to help improve the SOC’s monitoring capabilities and tools.
Talent attrition is one unfortunate result of alert fatigue, but another, often overlooked, result is human error. It's very easy to skip a step in a triage process you perform tens or hundreds of times per day, and even easier to let your eyes glaze over and stop looking at your event console altogether.
Security teams that claim to hire the best personnel yet make them do alert triage all day are doing it wrong. Where possible, alert triage and other tedious tasks should be offloaded to other parts of the business, or the ‘manual’ work involved in triage should be reduced. Automation makes both possible.
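As a rough illustration of reducing the ‘manual’ part of triage, here is a minimal Python sketch, assuming alerts can be reduced to a signature plus source and destination IPs. It auto-closes alerts that match an allowlist the team has already reviewed and collapses exact duplicates, so an analyst sees each distinct alert once with a count. The `Alert` fields, the `KNOWN_BENIGN` allowlist, and the signature names are all hypothetical. Real alerts would come from your IDS or SIEM with their own schema.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Alert:
    # Hypothetical minimal alert; real IDS/SIEM alerts carry many more fields.
    signature: str
    src_ip: str
    dst_ip: str

# Hypothetical allowlist of (signature, source IP) pairs the team has already
# reviewed and agreed are benign, e.g. an authorized internal scanner.
KNOWN_BENIGN = {("internal-scanner-sweep", "10.0.0.5")}

def triage(alerts, known_benign=KNOWN_BENIGN):
    """Auto-close allowlisted alerts and collapse duplicates for analysts.

    Returns (auto_closed, queue), where queue is a list of
    (alert, occurrence_count) pairs, most frequent first.
    """
    auto_closed = []
    remaining = Counter()
    for alert in alerts:
        if (alert.signature, alert.src_ip) in known_benign:
            auto_closed.append(alert)
        else:
            # Frozen dataclasses are hashable, so identical alerts share a key.
            remaining[alert] += 1
    return auto_closed, remaining.most_common()
```

A production version would also record what it auto-closed, so that the allowlist itself can be audited rather than becoming another place where mistakes hide.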
The most effective teams challenge their security analysts to provide ideas to improve their efficiency and accuracy, and empower them to make changes in their processes and tools.
Lesson 5: Alerts without context are not actionable
Working within a SOC at an MSSP, we were often limited to performing security monitoring with the products our customers had purchased and deployed. As an analyst, it was common to see security alerts from products like Snort that contained only a fragment of a captured packet and a vague signature for context. Alerts like these were, as a rule, shoved into the ‘false positive’ bucket, though I frequently wondered how many of them were actually true positives that we simply couldn't confirm.
This is why a thorough network security monitoring (NSM) approach calls for full packet capture, host forensics, ETDR logs, application server logs, network analytics data, and other context. However, getting all of this data into your SIEM or log management solution is often difficult and hard to scale. With automation, it's much easier to kick off an on-demand query for this data when the analyst needs it.
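To make the on-demand idea concrete, here is a minimal Python sketch that fans out one query per context source in parallel, only when an analyst asks about a specific alert. If one source fails, the error is recorded and the other queries still complete. The fetch functions are hypothetical stubs. In practice, each would call your packet capture store, endpoint (ETDR) console, or log search API, and the alert dictionary's keys are assumptions as well.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for real context sources. Each real version would
# call that product's own API; these stubs just describe what they would fetch.
def fetch_pcap(alert):
    return f"pcap for {alert['src_ip']} -> {alert['dst_ip']} around {alert['time']}"

def fetch_host_events(alert):
    return f"endpoint events on {alert['src_ip']} around {alert['time']}"

def fetch_app_logs(alert):
    return f"app server logs mentioning {alert['src_ip']} around {alert['time']}"

CONTEXT_SOURCES = {
    "pcap": fetch_pcap,
    "host": fetch_host_events,
    "app_logs": fetch_app_logs,
}

def enrich(alert, sources=CONTEXT_SOURCES, timeout=30):
    """Query every context source in parallel for a single alert.

    A source that raises or times out is recorded as an error string
    instead of blocking the results from the other sources.
    """
    results = {}
    with ThreadPoolExecutor(max_workers=max(1, len(sources))) as pool:
        futures = {name: pool.submit(fn, alert) for name, fn in sources.items()}
        for name, future in futures.items():
            try:
                results[name] = future.result(timeout=timeout)
            except Exception as exc:
                results[name] = f"error: {exc}"
    return results
```

Because nothing here is ingested ahead of time, the SIEM only has to hold the alerts. The heavier context is pulled only for the alerts an analyst actually investigates.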
Lesson 6: Tribal knowledge is not scalable
When I was working in the SOC, I learned how to investigate security events by sitting next to a more experienced analyst and watching them work, talking through their thinking and processes. I also remember combing through countless blog posts and websites, trying to improve my skills on my own.
However, this is neither the most efficient way for us as an industry to share knowledge nor a scalable one. And finding a mentor in the security industry is tough: unless you already work in the industry in a formal capacity, how do you get access to this knowledge to start your career in defense?
As a community we need to create better ways of documenting and sharing our processes for incident handling and response.
Onwards: Implementing Better SOC Practices
SOCs can accomplish a lot of great things, many of which I've experienced first-hand, but there is still much to be desired when it comes to operating at peak effectiveness. Companies don't necessarily need a dedicated war room, but they do need the right people, processes, and technologies in place to be effective.
From my time working in a SOC, it's clear to me that wider adoption of automation can help SOCs integrate better with the larger organization and provide more value.