2016 has been a big year for information security: attacks by both cybercriminals and state actors have grown in scale and in public awareness, and the Internet of Things has come into its own as a field of study. But today we'd like to talk about a very old (but no less dangerous) type of attacker tool, the web shell, and about new techniques Rapid7 is developing for identifying web shells quickly and accurately.

What is a Web Shell?

Web shells are web-based applications that give a threat actor the ability to interact with a system. Their capabilities range from file access and upload to executing arbitrary code on the exploited server. They're written in a variety of languages, including PHP, ASP, Java and JavaScript, although PHP is the most common because the majority of systems support it. Once a web shell is on your system, the threat actor can use it to steal data or credentials, to gain access to more important servers in the network, or as a conduit for uploading more dangerous and extensive malware.

Why should I care?

Because web shells can hit pretty much anyone. They most commonly show up on small business websites, particularly WordPress-powered ones. WordPress plugins and themes are a favoured target for web shell authors because vulnerabilities show up in them all the time.

WordPress isn't alone, of course: virtually all web applications are released with vulnerabilities from time to time. So if you have a website that accepts and stores any kind of user input, from forum posts to avatar images, now is a fine time to learn about web shells, because you could very well be vulnerable to them.

How Web Shells Work and How They're Used

The first step with a web shell is uploading it to a server, from which the attacker can then access it. This “installation” can happen in several ways, but the most common techniques involve:

  • exploiting a vulnerability in the server's software,
  • getting access to an administrator portal, or
  • taking advantage of an improperly configured host.

As an example, Rapid7's Incident Response Team has handled several engagements in which the attackers exploited a vulnerability in a third-party plugin used by a customer's CMS. The vulnerability let them upload a simple PHP web shell.

Once a web shell is uploaded, it's used to exploit the system. What this looks like differs from actor to actor and from web shell to web shell, because shells come with a variety of capabilities. Some are very simple. They open a connection to the outside world so that an actor can drop in more targeted or malicious code, and then they execute whatever they receive. Others are more complex and come with database or file browsers, letting the attacker rifle through your code and data from thousands of miles away.

Whatever the design, web shells can be extremely dangerous, and they are common. US-CERT has identified them as a regularly used tool of both cybercriminals and Advanced Persistent Threats (APTs). If they're not detected and eliminated, they can give an attacker a solid, persistent backdoor into your environment. Depending on what they compromise, they may also give the attacker root access.

Web Shell Detection

Web shells aren't new, and people have spent a lot of time working to detect and halt them. Once the breach of a system is discovered, the search is fairly straightforward, although time-consuming. You go through the server and look at the upload and modification dates of files relative to the discovery date. Then you manually check suspicious-looking uploads to see whether they're the source of the problem. But what about detecting web shells before they're used to cause harm?
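Part of the post-breach sweep can be scripted. The following is a minimal Python sketch under a few assumptions: you have filesystem access to a copy of the web root, and you know the discovery date. The function name `recently_changed`, the 30-day window and the list of file extensions are illustrative choices, not part of the original process.

```python
from datetime import datetime, timedelta
from pathlib import Path

def recently_changed(web_root, discovered, window_days=30,
                     suffixes=(".php", ".phtml", ".inc")):
    """List script files modified in the window before `discovered`, newest first.

    `discovered` is a naive local datetime, matching what
    datetime.fromtimestamp() returns for file modification times.
    The window length and suffix list are illustrative assumptions.
    """
    start = discovered - timedelta(days=window_days)
    hits = []
    for path in Path(web_root).rglob("*"):
        if path.is_file() and path.suffix.lower() in suffixes:
            modified = datetime.fromtimestamp(path.stat().st_mtime)
            if start <= modified <= discovered:
                hits.append((modified, path))
    # Most recently modified files first: they are usually the best leads.
    hits.sort(key=lambda item: item[0], reverse=True)
    return [path for _, path in hits]
```

Treat modification times only as a starting point. An attacker who can run code on the server can usually reset them (PHP's `touch()` accepts an explicit timestamp), so anything this function returns still needs a manual check.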

There are a couple of established ways to catch web shells earlier. One approach is to have an automated system look at the contents of newly uploaded or changed files and check whether they match a known web shell, just as antivirus software does with other forms of malware. This works well if an attacker is using a known web shell, but it quickly falls apart when confronted with custom code.
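The first approach can be illustrated with exact-match signature checking in Python. This sketch assumes you already have a set of SHA-256 digests of previously catalogued shells, for example from a threat-intelligence feed. That set and both function names are hypothetical. Real antivirus engines use richer signatures than whole-file hashes, but the weakness shown here is the same.

```python
import hashlib

def sha256_of(path, chunk_size=65536):
    """Hash a file in chunks so large uploads don't have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def is_known_shell(path, known_shell_hashes):
    """True if the file is byte-for-byte identical to a catalogued web shell.

    `known_shell_hashes` is a hypothetical set of hex SHA-256 digests.
    """
    return sha256_of(path) in known_shell_hashes
```

If you append a single space to a catalogued sample, `is_known_shell` returns `False`. Any edit, however trivial, produces an unrelated digest, which is why this approach fails against custom or lightly modified shells.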

Another technique is to use pattern matching to look for code fragments (down to the level of individual function calls) that are commonly malicious, such as calls out to the system to manipulate files or open connections. The problem there is that web shell authors are fully aware of this technique, and deliberately write their code in a very opaque and confusing way that makes pattern matching extraordinarily difficult to do with any real accuracy.
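The pattern-matching approach, and the way obfuscation defeats it, can be sketched with a regular expression over PHP source. The list of function names below is an illustrative assumption, not a complete or vetted ruleset.

```python
import re

# Illustrative (assumed) list of PHP calls that evaluate code or run commands.
SUSPICIOUS_CALLS = re.compile(
    r"\b(eval|assert|system|exec|shell_exec|passthru|popen|proc_open)\s*\(",
    re.IGNORECASE,
)

def suspicious_calls(php_source):
    """Return the suspicious function names called literally in php_source, in order."""
    return [m.group(1).lower() for m in SUSPICIOUS_CALLS.finditer(php_source)]
```

A direct call such as `suspicious_calls('<?php system($_GET["c"]); ?>')` returns `['system']`. The same behaviour written as `$f = "sys" . "tem"; $f($_GET["c"]);` returns `[]`, because the function name is assembled at runtime and never appears as a literal call for the pattern to match.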

A Better Way

If we can detect web shells, we can stop them, and if we can stop them, we can protect our customers. But as you can see, all the existing approaches have some pretty severe drawbacks, which means they miss a lot.

Rapid7 Labs has been working on a system that uses data science to classify web shell threats based on static and dynamic analysis of PHP files. In a static analysis context, our classifier looks for dangerous-looking function calls and file signatures. It also looks for coding techniques that developers simply wouldn't use if they were writing legitimate, production-ready code, techniques that only appear when the developer is trying to hide their purpose. In a dynamic analysis context, the potentially malicious file is executed on a monitored, standalone system so that our classifier can see what it does.
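To make "coding techniques that developers simply wouldn't use" more concrete, the sketch below computes a few static features often associated with obfuscated PHP: character entropy, the longest line, the share of symbol characters, and a count of decode or eval calls. The features, function names and regex are assumptions chosen for illustration. The post does not describe Rapid7's actual feature set, and dynamic-analysis features are not shown.

```python
import math
import re
from collections import Counter

# Hypothetical list of decode/eval calls that often wrap packed payloads.
DECODE_OR_EVAL = re.compile(
    r"\b(eval|assert|base64_decode|gzinflate|str_rot13|create_function)\s*\(",
    re.IGNORECASE,
)

def shannon_entropy(text):
    """Shannon entropy in bits per character (0.0 for empty text)."""
    if not text:
        return 0.0
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in Counter(text).values())

def static_features(php_source):
    """Map one PHP file's source to a small dict of numeric features."""
    lines = php_source.splitlines() or [""]
    symbols = sum(1 for ch in php_source if not ch.isalnum() and not ch.isspace())
    return {
        "entropy": shannon_entropy(php_source),
        "longest_line": max(len(line) for line in lines),
        "symbol_ratio": symbols / max(len(php_source), 1),
        "decode_or_eval_calls": len(DECODE_OR_EVAL.findall(php_source)),
    }
```

A classifier could be trained on feature dicts like this, one per file. A long single line of encoded data wrapped in `eval(base64_decode(...))` tends to stand out on several of these features at once, which makes it harder to hide than a single literal function call.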

The results from both methods are then fed into a machine learning model that predicts whether the file is malicious. The accuracy has been extremely promising. The system detected 99% of the hundreds of web shells we've tested it on, including custom, single-use shells, with a false-positive rate of only 1%. To generalize broadly, if our IR team is faced with 1,000 files, they only need to manually check 10. (For the ML nerds out there: yes, we've checked for overfitting.)
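The "1,000 files, check 10" figure depends on how many of those files are actually shells, which the post does not state. The sketch below estimates the expected review workload from the reported rates. The shell count is an explicit assumption, and the function name is hypothetical.

```python
def expected_review_load(n_files, n_shells, detection_rate=0.99,
                         false_positive_rate=0.01):
    """Expected files flagged for manual review, split by outcome.

    detection_rate is the fraction of shells flagged; false_positive_rate is
    the fraction of benign files flagged. n_shells is an assumed input.
    """
    benign = n_files - n_shells
    true_pos = detection_rate * n_shells
    false_pos = false_positive_rate * benign
    missed = (1 - detection_rate) * n_shells
    return {
        "flagged": true_pos + false_pos,
        "true_positives": true_pos,
        "false_positives": false_pos,
        "missed": missed,
    }
```

Suppose 5 of the 1,000 files are shells (an assumption). Then about 14.9 files are flagged: roughly 4.95 real shells plus 9.95 false alarms. When shells are rare, the workload is dominated by the 1% false-positive rate, which is consistent with the rough figure of 10.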

In the future, we hope to use the system to detect web shells pre-emptively, identifying and isolating them before they can be used to exploit the system. Until then, our managed detection and response team is using it. It lets them identify the source of customer breaches far more quickly than teams relying solely on traditional, arduous and error-prone manual methods.

Oliver Keyes, Senior Data Scientist
Tim Stiller, Senior Software Engineer