Using Internet-scale Research Data to Quantify and Reduce Exposure

It’s been a busy 2017 at Rapid7 Labs. Internet calamity struck swiftly and often, keeping us all on our toes and giving us a chance to fully test out the capabilities of our internet-scale research platform. Let’s take a look at how two key components of Rapid7 Labs’ research platform, Project Sonar and Heisenberg Cloud, came together to enumerate and reduce exposure over the past two quarters. (If reading isn't your thing, we'll cover this in person at today's UNITED talk.)

Project Sonar Refresher

Back in “the day” the internet really didn’t need an internet telemetry tool like Rapid7's Project Sonar. This:


was the extent of what would eventually become the internet and it literally had a printed directory that held all the info about all the hosts and users:

Fast-forward to Q1 2017 where Project Sonar helped identify a few hundred million hosts exposing one or more of 30 common TCP & UDP ports:

National Exposure 2017 Hilbert IPv4 Heatmap

Project Sonar is an internet reconnaissance platform. We scan the entire public IPv4 address range (except for those in our opt-out list) looking for targets, then do protocol-level decomposition scans to try to get an overall idea of “exposure” of many different protocols, including:

Sonar Study Protocols
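
To make the opt-out step concrete, here is a minimal sketch of how a scanner might drop opted-out ranges before probing. It is only an illustration, not how Sonar itself is implemented: the function name `scan_targets`, the idea of passing the opt-out list as CIDR strings, and the documentation-range addresses are all assumptions made for this example.

```python
import ipaddress
from typing import Iterable, Iterator


def scan_targets(candidate_cidr: str,
                 opt_out_cidrs: Iterable[str]) -> Iterator[ipaddress.IPv4Address]:
    """Yield every address in candidate_cidr that is not covered by an opt-out range.

    Hypothetical helper: a real internet-wide scanner would also skip
    reserved and private space and would randomize the probe order.
    """
    opt_out = [ipaddress.ip_network(cidr) for cidr in opt_out_cidrs]
    for addr in ipaddress.ip_network(candidate_cidr):
        # Skip any address that falls inside an opted-out network.
        if any(addr in net for net in opt_out):
            continue
        yield addr


if __name__ == "__main__":
    # 198.51.100.0/24 is a documentation range, used here purely as a stand-in.
    for target in scan_targets("198.51.100.0/30", ["198.51.100.2/31"]):
        print(target)
```

A linear check against every opt-out network is fine for a sketch; at internet scale, a prefix tree or sorted interval lookup would be the more likely choice.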

In 2016, we began re-evaluating and re-engineering Project Sonar, which greatly increased the speed and capabilities of our core research gathering engine. In fact, we now perform nearly 200 “studies” per month, collecting detailed information about the current state of IPv4 hosts on the internet. (Our efforts are not random, and there’s more to a scan than just a quick port hit; there’s often quite a bit of post-processing engineering for new scans, which is why we don’t just call them “scans.”)

Sonar has been featured in over 20 academic papers (see for yourself!) and is a core part of the foundation for many popular talks at security conferences (including 3 at BH/DC in 2017).

We share all our scan data through a research partnership with the University of Michigan. Keep reading to see how you can use this data on your own to help improve the security posture of your organization.

Cloudy With A Chance Of Honeypots

Project Sonar enables us to actively probe the internet for data, but this provides only half the data needed to understand what’s going on. Heisenberg Cloud is a sensor network of honeypots developed by Rapid7 that are hosted in every region of every major cloud provider (the following figure is an example of Heisenberg global coverage from three of the providers).

Heisenberg Cloud Coverage

Heisenberg agents can run multiple types and flavors of honeypots. From simple tripwires that enable us to enumerate activity:

Street Car Sensor

to more stealthy ones that are designed to blend in by mimicking real protocols and servers:

Stealth Mode

All of these honeypot agents are managed through traditional, open source cloud management tools.

We collect all agent-level log data using Rapid7's InsightOps tool and collect all honeypot data, including raw PCAPs, centrally on Amazon S3. We have Heisenberg nodes appearing to be everything from internet cameras to MongoDB servers and everything in between.

But, we’re not just looking for malicious activity. Heisenberg also enables us to see cloud and internet service “misconfigurations”—i.e., legit, benign traffic that is being sent to a node that is no longer under the control of the sending organization but likely was at some point. We see database queries, API calls, authenticated sessions and more and this provides insight into how well organizations are (or aren’t) configuring and maintaining their internet presence.

Putting It All Together

We convert all our data into a column-storage format called “parquet” that enables us to use a wide array of large-scale data analysis platforms to mine the traffic. With it, we can cross-reference Sonar and Heisenberg data—along with data from feeds of malicious activity or even, say, current lists of digital coin mining bots—to get a pretty decent picture of what’s going on.
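
As a rough illustration of that cross-referencing, the sketch below joins scan results and honeypot events on IP address. Plain Python dictionaries stand in for rows loaded from parquet, and every field name (`ip`, `port`, `src_ip`, `dst_port`) as well as the function `cross_reference` is hypothetical; the actual schemas are not described here.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping


def cross_reference(sonar_rows: Iterable[Mapping],
                    honeypot_rows: Iterable[Mapping]) -> Dict[str, Dict[str, List[int]]]:
    """Find hosts that both expose services (scan data) and contact honeypots.

    Returns, per source IP, the ports it exposes and the honeypot ports it hit.
    """
    # Index the scan results: which ports does each host expose?
    exposed = defaultdict(set)
    for row in sonar_rows:
        exposed[row["ip"]].add(row["port"])

    # Keep only honeypot events whose source also appears in the scan data.
    contacted = defaultdict(set)
    for row in honeypot_rows:
        if row["src_ip"] in exposed:
            contacted[row["src_ip"]].add(row["dst_port"])

    return {
        ip: {"exposed_ports": sorted(exposed[ip]),
             "contacted_ports": sorted(ports)}
        for ip, ports in contacted.items()
    }
```

A host that shows up on both sides, for example a device exposing telnet that is also probing honeypots on telnet ports, is the kind of overlap this join is meant to surface.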

This past year (to date), we’ve publicly used our platform to do everything from monitoring Mirai (et al.) botnet activity to identifying and quantifying (many) vulnerable services to tracking general protocol activity and exposure before and after the Shadow Brokers releases. Privately, we’ve used the platform to develop custom feeds for our Insight platform that help users identify, quantify and reduce exposure. Let’s look into a few especially fun and helpful cases we’ve studied:

Sending Out An S.O.S.

Long-time readers of the Rapid7 blog may remember a post we did on protestors hijacking internet-enabled devices that broadcasters use to get signals to radio towers. We found quite a few open and unprotected devices:

Map of Open Broadcast Nodes

What we didn’t tell you is that Rapid7’s Rebekah Brown worked with the National Association of Broadcasters to get the word out to vulnerable stations. Within 24 hours the scope of the issue was reduced by 50% and now only a handful (~15%) remain open and unprotected. This is an incredible “win” for the internet as exposure reduction like this is rarely seen.

We used our Sonar HTTP study to look for candidate systems and then performed a targeted scan to see if each device was, in fact, vulnerable. Thanks to the aforementioned re-engineering efforts, these subsequent scans take between 30 minutes and three hours (depending on the number of targets and complexity of the protocol decomposition). That means that when we are made aware of a potential internet-wide issue, we can get active, current telemetry to help quantify the exposure and begin working with CERTs and other organizations to help reduce risk.

Internet of Exposure

It’d be too easy to talk about the Mirai botnet or stunt-hacking images from open cameras. Let’s revisit the exposure of a core component of our nation’s commercial backbone: petroleum. Specifically, the gas we all use to get around.

Gas Pumps

We’ve talked about it before and it’s hard to believe (or perhaps not, in this day and age) such a clunky device...

ATG 350 Control Pad

...can be so exposed. We’ve shown you we can count these IoThings but we’ve taken the ATG monitoring a step further to show how careless configurations could possibly lead to exposure of important commercial information.

Want to know the median number of gas tanks at any given petrol station? We’ve got an app for that:

ATG Obtained Gas Tank Distribution

Most stations have 3-4 tanks, but some have many more. This can be sliced-and-diced by street, town, county and even country since the vast majority of devices provide this information with the tank counts.

How about how much inventory currently exists across the stations?

ATG Obtained Fuel Volume Distribution

We won’t go into the economic or malicious uses of this particular data, but you can likely ponder that on your own. Despite previous attempts by researchers to identify this exposure—with the hopeful intent of raising enough awareness to get it resolved—we continue to poke at this and engage when we can to help reduce this type of exposure. Think back on this whenever your organization decides to deploy an IoT sensor network and doesn’t properly risk-assess the exposure depending on the deployment model and what information is being presented through the interface.

But, these aren’t the only exposed things. We did an analysis of our Port 80 HTTP GET scans to try to identify IoT-ish devices sitting on that port and it’s a mess:

IoT Heatmap 1

You can explore all the items we found here but one worth calling out is:

IoT Heatmap 2

These are 251 buildings (yes, buildings) with their entire building management interface directly exposed to the internet, many without authentication and not even trying to be “sneaky” by using a port other than port 80. It’s vital that you scan your own perimeter for this type of exposure (not just building management systems, of course), since it’s far easier for something to slip onto the internet than one would expect.

Wiping Away The Tears

Rapid7 was quick to bring hype-free information and help for the WannaCry “digital hurricane” this past year. We’ve migrated our WannaCry efforts over to focused reconnaissance of related internet activity post-Shadow Brokers releases.

Post-Shadow Brokers SMB Chart

Since WannaCry, we’ve seen a major uptick in researchers and malicious users looking for SMB hosts (we’ve seen more than that, but you can read our 2017 Q2 Threat Report for more details). As we work to understand what attackers are doing, we are developing different types of honeypots to enable us to analyze, and perhaps even predict, their intentions.

We’ve done even more than this, but hopefully you get an idea of the depth and breadth of analyses that our research platform enables.

Take Our Data...Please!

We provide some great views of our data via our blog and in many reports:

Rapid7 Research Reports

But, YOU can make use of our data to help your organization today. Sure, Sonar data is available via Metasploit (Pro) via the Sonar C, but you can do something as simple as:

$ curl -o smb.csv.gz\

$ gzcat smb.csv.gz | cut -d, -f4 | grep MY_COMPANY_IP_ADDRESSES

to see if you’re in one of the study results. Studies you really don’t want to show up in include SMB, RDP, Docker, MySQL, MS SQL, and MongoDB. If you’re there, it’s time to triage your perimeter and work on improving deployment practices.
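
If your organization owns whole network blocks rather than a handful of addresses, a plain `grep` gets unwieldy. The following is a minimal sketch of the same check in Python, matching by CIDR range instead. It assumes, mirroring the `cut` call above, that the IP address is the fourth comma-separated field; the function names and the example ranges are hypothetical, and the real column layout should be confirmed against the study file you download.

```python
import csv
import gzip
import ipaddress
from typing import Iterable, Iterator, List


def find_exposed(rows: Iterable[List[str]],
                 cidrs: Iterable[str],
                 ip_field: int = 3) -> Iterator[List[str]]:
    """Yield rows whose IP (at index ip_field) falls inside any of the CIDR blocks."""
    networks = [ipaddress.ip_network(cidr) for cidr in cidrs]
    for row in rows:
        if len(row) <= ip_field:
            continue  # short or malformed row
        try:
            addr = ipaddress.ip_address(row[ip_field].strip())
        except ValueError:
            continue  # header line or a value that is not an IP address
        if any(addr in net for net in networks):
            yield row


def scan_study_file(path: str,
                    cidrs: Iterable[str],
                    ip_field: int = 3) -> List[List[str]]:
    """Read a gzipped CSV study file and return the rows that match the CIDR blocks."""
    with gzip.open(path, "rt", newline="") as handle:
        return list(find_exposed(csv.reader(handle), cidrs, ip_field))


if __name__ == "__main__":
    # Replace these documentation ranges with your organization's own blocks.
    for hit in scan_study_file("smb.csv.gz", ["192.0.2.0/24", "203.0.113.0/24"]):
        print(",".join(hit))
```

Parsing each address rather than matching text avoids the false positives a substring `grep` can produce, such as `10.1.1.1` matching `110.1.1.10`.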

You can also use other Rapid7 open source tools (like dap) and tools we contribute to (such as the ZMap ecosystem) to enrich the data and get a better picture of exposure, focusing specifically on your organization and threats to you.


We’ve got more in store for the rest of the year, so keep an eye (or RSS feed slurper) on the Rapid7 blog as we provide more information on exposure.

You can get more information on our studies and suggest new ones via