[A version of this blog was originally posted on November 5, 2012]

Few people fully appreciate the difficulty in creating a web application security scanner that can actually work well against most sites. In addition, there is much debate about how much application security testing can be automated and how much needs to be done by human hands. Let's look at a recent conversation among some industry experts that took place on Twitter (abbreviated for easier reading).

Jeremiah Grossman @jeremiahg:

RT @kdinerman: WebApp Scanners Challenged By Modern WebTech http://bit.ly/Tz5IX5 < true, but no way the biggest issue

Neil MacDonald @nmacdona

not the biggest issue < what would you say is the biggest issue?

Jeremiah Grossman @jeremiahg

login & maintaining authed-state, 404 detection, infinite website problem & production safety.

zulla @zulladan

@jeremiahg login & maintaining authed-state, 404 detection – i always believed that whitehatsec was one of the few who solved that

Jeremiah Grossman @jeremiahg

@zulladan ahh. technologically the issues are not “solved.” theyre compensated for w/ [human] config. true for everyone to varying degrees.

Dan Kuykendall @dan_kuykendall

Auth, 404, infinite links, etc are all stuff we solved 5 years ago. Mobile and API are the new challenges

Jeremiah Grossman @jeremiahg

please defined “solved.” As in, the tech does everything automatically w/o human assistance?

Dan Kuykendall @dan_kuykendall

Yes, in 90% of the cases we just need creds, Then automation does the rest. Was hard, but did it.

Neil MacDonald @nmacdona

@dan_kuykendall hard but did it < more & more automation, “good enough” bar higher, humans at top of pyramid

I agree, humans will continue to be at the top of the pyramid when it comes to web application security, but the practical reality is that organizations don't have the time and money to hire enough humans to effectively find and remediate all of their application security vulnerabilities. So, while it is true that we may never be able to automate 100% of the possibilities, it is our job to push forward the art and science of automation. It's not easy, but somebody's gotta do it.

Why Automated Scanning Is Critical

This does not exclude manual training options, but depending solely on manual training is a failed option for most organizations.

1. Auditors rarely know the application very well. When you have three guys on a security team responsible for hundreds or thousands of applications, it's unlikely that they know each application.

2. Auditors have a limited amount of time to spend training the scanner for each application. Often there is almost no time at all. It is unlikely that the security team:

  1. Has ever used the applications, or
  2. Has had time to learn the ins and outs of each one, or
  3. Has had time to manually configure a scanner with full manual training

3. Auditors' time is better spent on attacks only humans can do, such as business logic and privilege escalation attacks that automation may never be able to adequately discover.

4. Even SaaS offerings that are aided by manual effort end up being limited by the quality of their automation. Do you really believe that a highly trained security professional is going to review & train the web application security scanner for every nook & cranny of every application? How long would you expect a highly trained security professional to perform a job like this before they wanted to poke their eyes out with a fork? Not long, I'm sure.

5. Quality of the manual training will vary. Manual effort is going to focus on a few areas here and there and train for those high profile areas (the ones that probably have the best secure development applied to them). You may also end up with a less competent person doing the training, and you get less than ideal training data into the scanner. In the end, much falls back to the automation.

Bottom line, an effective web application security scanner must do everything possible to accomplish the best possible scan in a fully automated fashion. The less you leave for the human effort, the more effective the human effort will be. It's taken us a decade of pure focus with a highly talented team of developers to solve each challenge and to overcome one niche case after another. We continue to innovate with automation, but we are also looking forward to the next generation of challenges, and the battle ahead.

The Classic Challenges

1. Form Based Logins
There are several challenges here which are important to solve if you ever intend to schedule scans or simply be able to run a point & shoot scan.

  • You must automate detection of the login form. There are many possible formats, and they must be distinguished from other forms.
  • Deal with forms that include onsubmit events that do crazy stuff such as client-side encryption of the password to “protect” it over the wire, or calculate some predetermined key based on some other token.
  • Automate the determination of a successful login vs. failed login (different flavors of failures). This is one of the more challenging tasks that give web application security scanning vendors all sorts of headaches.

2. Single Sign-On
It can be a challenge to be able to log in while avoiding crawling and attacking sites not intended to be part of the attack surface. You must prevent sending credentials to the wrong place, and deal with the various cookies & tokens that get passed back and forth between the various domains/hosts involved in the SSO process.
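
Two of the login challenges above, telling a login form apart from other forms and never sending credentials to an out-of-scope host, can be illustrated together. The following is only a minimal sketch, not how any particular scanner works: the names `FormCollector`, `looks_like_login_form`, and `credential_targets` are made up for illustration, and the "exactly one password field" rule is an assumed heuristic that real login pages will break in many ways.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class FormCollector(HTMLParser):
    """Collects the action and input types of each <form> on a page."""

    def __init__(self):
        super().__init__()
        self.forms = []       # list of (action, [input types])
        self._current = None  # the form currently being parsed, if any

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = (attrs.get("action") or "", [])
        elif tag == "input" and self._current is not None:
            # Inputs without a type attribute behave as text inputs.
            self._current[1].append((attrs.get("type") or "text").lower())

    def handle_endtag(self, tag):
        if tag == "form" and self._current is not None:
            self.forms.append(self._current)
            self._current = None


def looks_like_login_form(input_types):
    """Assumed heuristic: one password field plus at least one text-like field.

    Two password fields usually mean registration or change-password.
    """
    passwords = input_types.count("password")
    text_like = sum(t in ("text", "email") for t in input_types)
    return passwords == 1 and text_like >= 1


def credential_targets(page_url, html, allowed_hosts):
    """Return absolute action URLs of login forms that post to an in-scope host."""
    collector = FormCollector()
    collector.feed(html)
    targets = []
    for action, input_types in collector.forms:
        target = urljoin(page_url, action)  # an empty action posts back to the page
        in_scope = urlsplit(target).hostname in allowed_hosts
        if looks_like_login_form(input_types) and in_scope:
            targets.append(target)
    return targets
```

Calling `credential_targets(page_url, html, {"app.test"})` would return only the login forms whose action resolves to `app.test`. Forms that rely on onsubmit handlers or JavaScript-built fields are outside what this static parse can see, which is part of why the problem is hard.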

3. Auto-Populating Forms with Valid Data
To accomplish the best possible code coverage, it is critical to populate form fields with valid data in order to get deep into applications that perform data validation.

Example Scenario:
A billing address form where all the input names/ids are textbox1, textbox2, etc. Additionally the developer added code to require a valid state & zip code.

Weak Solution:
Because the scanner doesn't know what would be valid inputs for textbox1, textbox2, etc, the scanner might enter a bunch of aaaaaaaa's into the fields.

Problem Remains:
The web application security scanner will basically be dead in the water without user training:

  • It will not pass this step, which could be step one in a multi-step process.
  • It will miss out on the SQL vuln possible in the street address field because the SQL INSERT happens several lines of code after the state & zip code validation.
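
One way to do better than a string of a's in the billing address scenario is to ignore the uninformative input names and guess from the visible label next to each field. The sketch below assumes the crawler has already paired each input with its label text; the hint table, the sample values, and the function names are all hypothetical and only meant to show the idea.

```python
import re

# Hypothetical hint table: label keyword pattern -> a value likely to validate.
# Order matters: the first matching pattern wins.
FIELD_HINTS = [
    (re.compile(r"zip|postal", re.I), "90210"),
    (re.compile(r"state|province", re.I), "CA"),
    (re.compile(r"e-?mail", re.I), "scanner@example.com"),
    (re.compile(r"phone|tel", re.I), "555-555-0100"),
    (re.compile(r"city|town", re.I), "Beverly Hills"),
    (re.compile(r"street|address", re.I), "123 Main St"),
]

DEFAULT_VALUE = "test"  # fallback when no hint matches


def guess_value(label_text):
    """Pick a plausible value based on the label shown next to a field."""
    for pattern, value in FIELD_HINTS:
        if pattern.search(label_text):
            return value
    return DEFAULT_VALUE


def populate(form_fields):
    """form_fields maps an input name to the label text displayed beside it."""
    return {name: guess_value(label) for name, label in form_fields.items()}
```

With valid-looking values in the state and zip code fields, the form can pass validation, and the street address field becomes reachable for attack payloads.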

4. Dynamic Changes Based on User Events
Often we see changes based on user action. An example is an onchange event for an option list. The JavaScript that gets executed might change the available form fields, or may populate hidden fields with data. If you do not perfectly emulate what would have happened in a browser, you can often fail the basic validation that takes place and never get to deliver your attack payloads.

5. Session Management
It is a constant challenge to stay logged into an application. The scanner must avoid logout buttons/links/events, must properly pass along session tokens wherever they happen to be at the moment (sometimes in cookies, sometimes on the URL, sometimes in a hidden form field), and must adjust to multiple possibilities taking place on a single app. The scanner must also properly identify when it has lost its session, and then be able to re-login (which requires the automated login process mentioned above) to continue its scan.
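
Two pieces of the session problem, avoiding logout links and noticing a lost session, can be sketched with simple heuristics. Everything here is an assumption for illustration: the regular expression, the `/login` path, and the function names are invented, and a real scanner would need to learn these signals per application.

```python
import re

# Assumed markers for logout actions in URLs or link text.
LOGOUT_PATTERN = re.compile(r"log[\s_-]?(out|off)|sign[\s_-]?(out|off)", re.I)


def is_logout_link(url, link_text=""):
    """True if the URL or its visible text looks like a logout action."""
    return bool(LOGOUT_PATTERN.search(url) or LOGOUT_PATTERN.search(link_text))


def session_lost(status, location, body, login_path="/login"):
    """Guess whether a response indicates that the session has expired.

    status is the HTTP status code, location the redirect target (or None),
    body the response text. login_path is an assumed login URL fragment.
    """
    if status == 401:
        return True
    if status in (301, 302, 303, 307) and login_path in (location or ""):
        return True
    # A password field reappearing in a normal page is another common signal.
    return status == 200 and 'type="password"' in body.lower()
```

When `session_lost` returns true, the scanner would replay its automated login and retry the request. Note that a logout triggered by a JavaScript event with an opaque URL would slip past `is_logout_link`.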

6. 404 Detection
Some sites will use the standard 404 handler, but most have started to customize them to offer a better user experience. The scanner must employ a collection of tricks & techniques to solve the possible scenarios, or you end up with endless new links on many sites.

  • Custom 404 that responds with a 200. This is the simple one, but many scanners will get caught by this.
  • SEO friendly sites – In most of these applications there are no real files, and instead all 404 responses are trapped and processed through the framework to look up the intended content from a database. This can leave scanners unable to distinguish real content from a 404-equivalent response.
  • Different 404 handlers based on directory. We see many sites that might have a different 404 handler for one application. A simple example is when your site includes a blog that may be installed as www.site.com/blog/. The blogging software may use SEO friendly URLs, thereby making your scanner think that EVERY page under /blog/ exists.
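
A common way to approach the custom 404 and per-directory handler cases is to request a page that almost certainly does not exist in each directory, keep the response as a fingerprint, and compare later responses against it. The sketch below assumes a caller-supplied `fetch` function that takes a URL and returns a `(status, body)` tuple; the function names and the 0.9 similarity threshold are arbitrary choices for illustration, not a description of any product.

```python
import difflib
import uuid
from urllib.parse import urljoin


def learn_404_fingerprint(fetch, directory_url):
    """Fetch a random, almost certainly nonexistent page under directory_url.

    directory_url should end with "/" so the probe lands inside that directory.
    """
    probe = urljoin(directory_url, uuid.uuid4().hex + ".html")
    return fetch(probe)


def is_not_found(response, fingerprint, threshold=0.9):
    """Treat a response as "not found" if it is a 404 or resembles the fingerprint."""
    status, body = response
    if status == 404:
        return True
    fp_status, fp_body = fingerprint
    if status != fp_status:
        return False
    similarity = difflib.SequenceMatcher(None, body, fp_body).ratio()
    return similarity >= threshold
```

Learning one fingerprint per directory (for example, one for `/` and another for `/blog/`) covers the case where a blog uses its own handler. Pages whose error text echoes the requested URL or includes changing content would need a fuzzier comparison than this.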

7. Limiting Repetitive Functionality
Let's say you're scanning an online store with 100,000 items. The product pages might look like:

  • viewproduct.aspx?productid=5
  • viewproduct.aspx?productid=6

or maybe it looks like:

  • /product/5/view
  • /product/6/view

You must auto-detect these situations and properly limit the amount of testing, or your scan will run for a very long time, and when it does eventually complete it might end up reporting the same vulnerability (or root cause) thousands of times.
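
One simple way to detect these repeats is to reduce each URL to a signature that ignores record identifiers, then cap how many URLs per signature get tested. This is a rough sketch under stated assumptions: `url_signature` and `RepetitionLimiter` are hypothetical names, numeric path segments are assumed to be IDs, and parameter values are ignored entirely.

```python
import re
from collections import Counter
from urllib.parse import parse_qsl, urlsplit


def url_signature(url):
    """Collapse values that look like record identifiers into a placeholder."""
    parts = urlsplit(url)
    # Replace purely numeric path segments, e.g. /product/5/view.
    path = re.sub(r"/\d+(?=/|$)", "/{id}", parts.path)
    # Keep only the parameter names, sorted, and drop their values.
    names = sorted(name for name, _ in parse_qsl(parts.query, keep_blank_values=True))
    return path.lower() + "?" + "&".join(names)


class RepetitionLimiter:
    """Allows only the first few URLs that share a signature to be tested."""

    def __init__(self, max_per_signature=3):
        self.max_per_signature = max_per_signature
        self.seen = Counter()

    def should_test(self, url):
        signature = url_signature(url)
        self.seen[signature] += 1
        return self.seen[signature] <= self.max_per_signature
```

Dropping parameter values is deliberately crude: it would also merge URLs such as `page.aspx?action=view` and `page.aspx?action=delete`, which are different functionality, so a real implementation would also need to compare the responses.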

8. Memory Management
As mentioned earlier, building a web scanner is a very complex software engineering task. You can ask around to find that even companies such as HP & IBM are known for having their scanners crash, in large part due to memory management issues. The reason is that each web application is different, but all responses must be parsed & analyzed. This parsing and analysis of unpredictable response data ends up requiring very solid engineering to handle properly.

I will save this for another blog post.

Those examples are just the start of the crawling problems that come to the top of my mind. I haven't even started to mix in attacking and how that can cause session loss, and then how to find new application security vulnerabilities (known vulns don't exist in this world of custom apps) while avoiding false positives, and eventually delivering a usable/useful report that a tester and a developer can both make use of to hopefully fix the problems the application security scanner finds. Trust me, solving each problem and its many flavors is a hard-fought battle.

Time after time, as product after product attempts to face these challenges, we see them give up and move toward manual training. Enticing manual training interfaces move front and center. Point and shoot falls by the wayside.

At Rapid7, we have confronted these challenges and have invented automation techniques to solve them. We have won those battles. Now we are setting our sights on the future problems. To read more about the battles we are fighting now, download our new whitepaper on The Demands of the Modern Web Scanner: Closing the Coverage Gap. Or, if you are skeptical that we can effectively address these problems for your custom application, go ahead, request a free trial. I dare you! (Free Trial - AppSpider)

[Note: This blog has been transferred from Dan Kuykendall's blog, manvswebapp.com, as part of Rapid7's acquisition of NT OBJECTives. For more information on the acquisition, click here.]