Rapid7 is publishing a report about the passwords attackers use when they scan the internet indiscriminately. You can pick up a copy at booth #4215 at the RSA Conference this week, or online right here. The following post describes some of what is investigated in the report.

Announcing the Attacker's Dictionary

Rapid7's Project Sonar periodically scans the internet across a variety of ports and protocols, allowing us to study the global exposure to common vulnerabilities as well as trends in software deployment (this analysis of binary executables stems from Project Sonar).

As a complement to Project Sonar, we run another project called Heisenberg which listens for scanning activity. Whereas Project Sonar sends out lots of packets to discover what is running on devices connected to the Internet, Project Heisenberg listens for and records the packets being sent by Project Sonar and other Internet-wide scanning projects.

The datasets collected by Project Heisenberg let us study what other people are trying to examine or exploit. Of particular interest are scanning projects which attempt to use credentials to log into services that we do not provide. We cannot say for sure what a device intends when it attempts to log into a nonexistent RDP server on an IP address which has never advertised its presence, but we believe that behavior is suspect and worth analyzing.

How Project Heisenberg Works

Project Heisenberg is a collection of low interaction honeypots deployed around the world. The honeypots run on IP addresses which we have not published, and we expect that the only traffic directed to the honeypots would come from projects or services scanning a wide range of IP addresses. When an unsolicited connection attempt is made to one of our honeypots, we store all the data sent to the honeypot in a central location for further analysis.
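
As a rough illustration of the low interaction idea, here is a minimal sketch in Python. It is not Heisenberg's actual code: the function names, the record fields, and the in-memory `sink` list standing in for the central store are all assumptions made for this example. It only records raw bytes and never replies, whereas a real RDP honeypot would have to speak enough of the protocol to recover the credentials.

```python
import json
import socket
import time


def handle_connection(conn, peer, sink, max_bytes=65536, timeout=5.0):
    """Read whatever the peer sends, never reply, and append one record to sink."""
    conn.settimeout(timeout)
    chunks = []
    received = 0
    try:
        while received < max_bytes:
            data = conn.recv(min(4096, max_bytes - received))
            if not data:  # peer closed the connection
                break
            chunks.append(data)
            received += len(data)
    except OSError:
        pass  # timeout or reset: keep whatever arrived before it
    finally:
        conn.close()
    record = {
        "ts": time.time(),
        "peer": peer,
        "payload_hex": b"".join(chunks).hex(),
    }
    sink.append(json.dumps(record))  # hypothetical stand-in for the central store
    return record


def listen_forever(port, sink, host="0.0.0.0"):
    """Accept connections on one port and record each one (blocks forever)."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as server:
        server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        server.bind((host, port))
        server.listen()
        while True:
            conn, addr = server.accept()
            handle_connection(conn, addr[0], sink)
```

Calling `listen_forever(3389, records)` with an empty list `records` would collect one JSON line per connection. Connections are handled one at a time here to keep the sketch short.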

In this post we will explore some of the data we have collected related to Remote Desktop Protocol (RDP) login attempts.

RDP Summary Data

We have collected RDP passwords over a 334 day period, from 2015-03-12 to 2016-02-09.

During that time we have recorded 221203 different attempts to log in, coming from 5076 distinct IP addresses across 119 different countries, using 1806 different usernames and 3969 different passwords.

Because it wouldn't be a discussion of passwords without a top 10 list, the top 10 passwords that we collected are:

password        count   percent
x               11865   5.36%
Zz              10591   4.79%
St@rt123        8014    3.62%
1               5679    2.57%
P@ssw0rd        5630    2.55%
bl4ck4ndwhite   5128    2.32%
admin           4810    2.17%
alex            4032    1.82%
.......         2672    1.21%
administrator   2243    1.01%

And because we have information not only about passwords, but also about the usernames that are being used, here are the top 10 that were collected:

username        count   percent
administrator   77125   34.87%
Administrator   53427   24.15%
user1           8575    3.88%
admin           4935    2.23%
alex            4051    1.83%
pos             2321    1.05%
demo            1920    0.87%
db2admin        1654    0.75%
Admin           1378    0.62%
sql             1354    0.61%
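
Tables like the two above can be produced with a simple tally. The following is a minimal sketch, assuming the captured usernames (or passwords) are available as a plain list of strings; the function name is made up for this example. Note that the comparison is case sensitive, which is why "administrator" and "Administrator" appear as separate rows.

```python
from collections import Counter


def top_n_with_share(values, n=10):
    """Return (value, count, percent_of_all_attempts) for the n most common values."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (value, count, round(100.0 * count / total, 2))
        for value, count in counts.most_common(n)
    ]
```

The percentage is taken against all login attempts, not just the attempts covered by the top n rows.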

We see on average 662.28 login attempts every day, but the actual daily number varies quite a bit. The chart below shows the number of events per day since we started collecting data. Notice the heavy activity in the first four months, which skews the average high.
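
The daily average follows directly from the collection dates and the total number of attempts given earlier. A quick check, using only figures from this post (the variable names are ours):

```python
from datetime import date

first_day = date(2015, 3, 12)
last_day = date(2016, 2, 9)
total_attempts = 221203

days = (last_day - first_day).days
average_per_day = total_attempts / days
print(days, f"{average_per_day:.2f}")  # prints: 334 662.28
```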

In addition to the username and password being used in the login attempts that we captured, we also collected the IP address of the device making the login attempt. To the best of the ability of the GeoIP database we used, here are the top 10 countries from which the collected login attempts originate:

country          country code   count   percent
China            CN             88227   39.89%
United States    US             54977   24.85%
South Korea      KR             13182   5.96%
Netherlands      NL             10808   4.89%
Vietnam          VN             6565    2.97%
United Kingdom   GB             3983    1.80%
Taiwan           TW             3808    1.72%
France           FR             3709    1.68%
Germany          DE             2488    1.12%
Canada           CA             2349    1.06%

With the data broken down by country, we can recreate the chart above to show activity by country for the top 5 countries:

RDP Highlights

There is even more information to be found in this data beyond counting passwords, usernames and countries.

We guess that these passwords are selected because whoever is conducting these scans believes that there is a chance they will work. Maybe the scanners have inside knowledge about actual usernames and passwords in use, or maybe they're just using passwords that have been made available from previous security breaches in which account credentials were leaked.

In order to look into this, we compared all the passwords collected by Project Heisenberg to passwords listed in two different collections of leaked passwords. The first is a list of passwords collected from leaked password databases by Crackstation. The second list comes from Mark Burnett.

In the table below we list how many of the top N passwords are found in these password lists:

top N passwords   number in any list   percent
1                 1                    100.00%
2                 2                    100.00%
3                 2                    66.67%
4                 3                    75.00%
5                 4                    80.00%
10                8                    80.00%
50                28                   56.00%
100               55                   55.00%
1000              430                  43.00%
3969              1782                 44.90%

This means that 8 of the 10 most frequently used passwords were also found in published lists of leaked passwords. But looking back at the top 10 passwords above, they are not very complex and so it is not surprising that they appear in a list of leaked passwords.
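
The top N comparison can be sketched as a set membership test. This is an illustration only: it assumes the collected passwords are already ranked by frequency and that each leaked list fits in memory as a Python set, and the function name is hypothetical.

```python
def top_n_coverage(ranked_passwords, leaked_lists, cutoffs):
    """For each cutoff N, count how many of the top N passwords are in any leaked list."""
    leaked_any = set().union(*leaked_lists)
    rows = []
    for n in cutoffs:
        top = ranked_passwords[:n]
        if not top:
            continue  # nothing to measure for this cutoff
        found = sum(1 for password in top if password in leaked_any)
        rows.append((len(top), found, round(100.0 * found / len(top), 2)))
    return rows
```

For example, with the toy ranking `["aaa", "bbb", "ccc", "ddd"]` and the toy lists `{"aaa", "ddd"}` and `{"bbb"}`, the cutoffs 1 through 4 give 1, 2, 2 and 3 matches respectively.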

This observation prompted us to look at the complexity of the passwords we collected. Just about any time you sign up for a service on the internet – be it a social networking site, an online bank, or a music streaming service – you will be asked to provide a username and password. Many times your chosen password will be evaluated during the signup process and you will be given feedback about how suitable or secure it is.

Password evaluation is a tricky and inexact art that consists of various components. Some of the many aspects that a password evaluator may take into consideration include:

  • length
  • presence of dictionary words
  • runs of characters (aaabbbcddddd)
  • presence of non alphanumeric characters (!@#$%^&*)
  • common substitutions (1 for l [lowercase L], 0 for O [uppercase o])

Different password evaluators will place different values on each of these (and other) characteristics to decide whether a password is "good" or "strong" or "secure". We looked at a few of these password evaluators, and found zxcvbn to be well documented and maintained, so we ran all the passwords through it to compute a complexity score for each one. We then looked at how password complexity is related to finding a password in a list of leaked passwords.
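
To make the characteristics listed above concrete, here is a toy feature extractor. It is emphatically not zxcvbn and does not reproduce its scores; the tiny word list and the substitution map are assumptions chosen for this example, and a real evaluator uses far larger dictionaries and more careful matching.

```python
import re

# Hypothetical, tiny word list; a real evaluator would use large dictionaries.
WORDS = {"password", "admin", "start"}
# A few common substitutions mapped back to the letters they stand in for.
SUBSTITUTIONS = str.maketrans({"@": "a", "0": "o", "1": "l", "3": "e", "4": "a", "$": "s"})


def password_features(password):
    """Extract the kinds of signals a password evaluator might weigh."""
    lowered = password.lower()
    normalized = lowered.translate(SUBSTITUTIONS)
    runs = (len(match.group(0)) for match in re.finditer(r"(.)\1*", password))
    return {
        "length": len(password),
        "has_dictionary_word": any(word in normalized for word in WORDS),
        "longest_run": max(runs, default=0),
        "has_symbol": any(not ch.isalnum() for ch in password),
        # True when the password contains a character from the substitution map,
        # whether or not it was actually meant as a substitution.
        "has_substitution_chars": normalized != lowered,
    }
```

Under these assumptions, `password_features("P@ssw0rd")` reports a dictionary word once the substitutions are undone, which is one reason such passwords tend to rate as weak despite the mixed characters.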

complexity   # passwords   % of total   Crackstation   Crackstation %   Burnett   Burnett %   any   any %   all   all %
0            803           20.23        726            90.41            564       70.24       728   90.66   562   69.99
1            1512          38.10        898            59.39            634       41.93       939   62.10   593   39.22
2            735           18.52        87             11.84            37        5.03        94    12.79   30    4.08
3            567           14.29        13             2.29             5         0.88        13    2.29    5     0.88
4            352           8.87         7              1.99             4         1.14        8     2.27    3     0.85

The above table shows the complexity of the collected passwords, as well as how many were found in different password lists.

For instance, 352 passwords were classified at complexity level 4. Of those, 7 were found in the Crackstation list and 4 were found in the Burnett list. Furthermore, 8 of the passwords were found in at least one of the lists, meaning that if you had both password lists, you would find 2.27% of the passwords classified as having a complexity value of 4. Similarly, 3 passwords (0.85%) were present in both lists.
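
The columns of the complexity table are tied together by simple arithmetic. With two lists, the number of passwords found in at least one list equals the Crackstation count plus the Burnett count minus the count found in both. The sketch below checks that using the counts copied from the table (the variable and function names are ours):

```python
# (complexity, total, in_crackstation, in_burnett, in_any, in_all), from the table
ROWS = [
    (0, 803, 726, 564, 728, 562),
    (1, 1512, 898, 634, 939, 593),
    (2, 735, 87, 37, 94, 30),
    (3, 567, 13, 5, 13, 5),
    (4, 352, 7, 4, 8, 3),
]

grand_total = sum(row[1] for row in ROWS)


def percent(part, whole):
    """Format part/whole as a percentage with two decimals."""
    return f"{100.0 * part / whole:.2f}"


for complexity, total, crack, burnett, in_any, in_all in ROWS:
    # Inclusion-exclusion for two lists: |A or B| = |A| + |B| - |A and B|.
    assert in_any == crack + burnett - in_all
    print(complexity, percent(total, grand_total), percent(in_any, total), percent(in_all, total))
```

The row totals add up to the 3969 distinct passwords collected, and the "any" column adds up to the 1782 passwords reported in the last row of the top N table.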

From this we conclude that as passwords get more complex, fewer and fewer are found in the lists of leaked passwords. Since we see that attackers try both passwords that are stupendously simple, like single character passwords, and much more complex passwords that are typically not found in the usual password lists, we can surmise that these attackers are not tied to these lists in any practical way -- they clearly have other sources for likely credentials to try.

Finally, we wanted to know what the population of possible targets looks like. How many endpoints on the internet have an RDP server running, waiting for connections? Since we have experience from Project Sonar, on 2016-02-02 the Rapid7 Labs team ran a Sonar scan to see how many IPs have port 3389 open and listening for TCP traffic. We found that 10822679 different IP addresses meet that criterion, spread out all over the world.
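
For a single host, "port 3389 open" can be checked with a plain TCP connection attempt. The sketch below is only an illustration of that check, not how Sonar works: an internet-wide scan relies on specialized tooling, and the function name and default timeout here are assumptions. An open port also does not prove that the service behind it is RDP.

```python
import socket


def tcp_port_open(host, port=3389, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, timed out, or name resolution failed.
        return False
```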

So What?

With this dataset we can learn about how people looking to log into RDP servers operate. We have much more detail in the report, but some of our findings include:

  • We see that many times a day, every day, our honeypots are contacted by a variety of entities.
  • We see that many of these entities try to log into an RDP service which is not there, using a variety of credentials.
  • We see that a majority of the login attempts use simple passwords, most of which are present in collections of leaked passwords.
  • We see that as passwords get more complex, they are less and less likely to be present in collections of leaked passwords.
  • We see that there is a significant population of RDP enabled endpoints connected to the internet.

But wait, there's more!

If this interests you and you would like to learn more, come talk to us at booth #4215 at the RSA Conference.