Staged payloads are used by Metasploit to help reduce the size of the initial payload blob that needs to be transmitted as part of an exploitation attempt.  These stagers typically connect to a Metasploit client and bootstrap (read in) a second stage payload blob which is subsequently executed.  There's a problem with this approach, however, and it has to do with partial reads.

It just so happens that Metasploit has a stage (the DLL injection stage) that is nearly 3000 bytes in size.  On all modern operating systems, TCP tries to avoid IP fragmentation through the use of the TCP MSS (maximum segment size).  The MSS keeps each packet at or below the MTU of the outgoing interface by limiting the amount of data carried in any individual segment to the MTU minus the overhead added by the IP and TCP headers.  While this definitely improves overall network performance, it also means that a target machine can receive only part of a "message" that the sender wrote as a single buffer.  This is typically called a partial read.  Since TCP is a streaming protocol, a partial read is perfectly fine; the application is responsible for any internal buffering of its own messages.  For a payload stager, however, this can be disastrous.

It's pretty easy to illustrate this point.  Write a server in C that binds to a port, accepts a connection, and then receives a fixed amount of data (say, 3000 bytes) from the wire.  The server might look something like this:


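#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>

int main(int argc, char **argv)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0), c;
    struct sockaddr_in s;
    char buf[3000];
    ssize_t len;

    /* Listen on TCP port 4444 on all interfaces */
    memset(&s, 0, sizeof(s));
    s.sin_family      = AF_INET;
    s.sin_port        = htons(4444);
    s.sin_addr.s_addr = htonl(INADDR_ANY);

    bind(fd, (struct sockaddr *)&s, sizeof(s));
    listen(fd, 1);
    while (1)
    {
        c = accept(fd, NULL, NULL);
        /* A single recv() returns whatever has arrived so far, up to 3000 bytes */
        len = recv(c, buf, sizeof(buf), 0);
        printf("Got %zd bytes\n", len);
        close(c);
    }
}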

Connecting to this server and sending some data piped through nc produces the following output:


On the client:

$ echo -ne "abcd" | nc server 4444

On the server:

Got 4 bytes

This looks just like you'd expect.  We sent four bytes and the server read four bytes.  But what happens if we send 3000 bytes?


On the client:

$ perl -e 'print "A" x 3000;' | nc server 4444

On the server:

Got 1448 bytes

This might seem a bit strange.  We obviously sent 3000 bytes at once from our perspective as the client, but the server only read 1448 bytes.  What's going on here?  To answer this, we need to look at the packet capture:


03:47:20.324613 IP x.x.x.x.38720 > y.y.y.y.4444:
   S 982288742:982288742(0) win 5840
03:47:20.324787 IP y.y.y.y.4444 > x.x.x.x.38720:
   S 3678693369:3678693369(0) ack 982288743 win 5792
03:47:20.391740 IP x.x.x.x.38720 > y.y.y.y.4444:
   . ack 1 win 92
03:47:20.398728 IP x.x.x.x.38720 > y.y.y.y.4444:
   . 1:1449(1448) ack 1 win 92
03:47:20.398805 IP y.y.y.y.4444 > x.x.x.x.38720:
   . ack 1449 win 543
03:47:20.398838 IP y.y.y.y.4444 > x.x.x.x.38720:
   F 1:1(0) ack 1449 win 543
03:47:20.405973 IP x.x.x.x.38720 > y.y.y.y.4444:
   . 1449:2897(1448) ack 1 win 92

In the above, we can see two TCP data segments sent by the client.  The first is 1448 bytes in size, as is the second (a third follows which is not shown).  Notice that the remote server, y.y.y.y, ACKs the first 1448 bytes before receiving the second 1448.  It also sends a FIN indicating that it has closed its half of the connection.  If the 3000 bytes we were sending had been a payload blob, only the first 1448 bytes, slightly less than half of the blob, would have arrived and been executed.  This would certainly lead to an unexpected crash.
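
The 1448 byte figure is consistent with ordinary MSS arithmetic.  As a rough sketch, assume a 1500 byte Ethernet MTU, a 20 byte IPv4 header, and a 20 byte TCP header carrying 12 bytes of timestamp options.  The capture above does not show the options, so these header sizes are assumptions, and the helper names are purely illustrative.

```c
#include <stddef.h>

/* Payload bytes that fit in one TCP segment for a given MTU.
 * ip_hdr and tcp_hdr are header sizes in bytes (TCP options included).
 * The values passed in are assumptions, not read from the capture. */
static size_t tcp_payload_per_segment(size_t mtu, size_t ip_hdr, size_t tcp_hdr)
{
    return mtu - ip_hdr - tcp_hdr;
}

/* Number of segments needed to carry len bytes (rounded up). */
static size_t segments_needed(size_t len, size_t per_segment)
{
    return (len + per_segment - 1) / per_segment;
}
```

Under these assumptions, tcp_payload_per_segment(1500, 20, 32) gives 1448 and segments_needed(3000, 1448) gives 3, which matches the two full segments in the capture plus the third that is not shown.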

This problem has been known for some time.  The most common solutions typically involve implementing a receive loop that reads an expected number of bytes before executing the stage.  For instance, an attacking machine might transmit a four byte value describing the length of the stage.  The target machine's stager can then loop calling receive until the expected number of bytes has been read.  While this works perfectly fine, it adds to the size of the stager (which is meant to be small) and also adds a network behavior that could potentially be signatured.
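
Written in C rather than as shellcode, the receive loop described above might look roughly like the following.  This is a minimal sketch and not Metasploit's stager: the function names are hypothetical, the length prefix is assumed to be in network byte order, and an interrupted recv() is simply treated as a failure.

```c
#include <stdint.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

/* Keep calling recv() until exactly len bytes have arrived.
 * Returns 0 on success, -1 on error or if the peer closes early. */
static int recv_all(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;

    while (len > 0) {
        ssize_t n = recv(fd, p, len, 0);
        if (n <= 0)
            return -1;
        p   += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read a four byte length prefix (assumed network byte order), then
 * that many bytes of stage data into stage[].
 * Returns the stage length, or -1 on failure or if it exceeds cap. */
static long recv_stage(int fd, void *stage, size_t cap)
{
    uint32_t netlen;
    uint32_t len;

    if (recv_all(fd, &netlen, sizeof(netlen)) != 0)
        return -1;
    len = ntohl(netlen);
    if (len > cap)
        return -1;
    if (recv_all(fd, stage, len) != 0)
        return -1;
    return (long)len;
}
```

Replacing the single recv() in the example server with a call like recv_stage(c, buf, sizeof(buf)) would make it wait for all 3000 bytes, at the cost of the extra code and the extra four bytes on the wire.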

The obvious question at this point is whether or not we can do something better.  It's important to think about the behavior of TCP segments and the way that receive queues are managed in modern operating systems.  TCP is designed to be a reliable transport that is capable of tolerating intermittent packet loss.  It supports this by describing communication in terms of a receive window that may consist of multiple segments.  Due to the way that packets are routed on the internet, it may be possible for TCP segments to arrive out of order.  Since TCP is a streaming protocol, the order of transmitted data must be preserved.  In the case of mainstream operating systems, it seems to be common practice that out of order segments are queued rather than discarded in an effort to reduce the number of retransmissions.  The implications of this are the key to solving our problem.

To solve the partial read issue, we must have a mechanism to ensure that the entire stage shows up in the target socket's receive buffer at the same time.  If we assume that the target's operating system will retain out of order segments, then we may be able to make our example stager reliable, even across the internet.  We could even detect whether the target behaves this way through normal communication with the target host during exploitation, assuming TCP is involved.
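
To make the assumption concrete, here is a toy model of a receive-side queue that keeps out of order segments and only exposes data to the application once it is contiguous from the start of the stream.  It is a simplified illustration and not how any particular kernel implements reassembly; the structure, the names, and the 4096 byte limit are all made up for the example.

```c
#include <stddef.h>
#include <string.h>

#define STREAM_MAX 4096

/* Toy model of a receive-side reassembly queue for one TCP stream. */
struct reasm {
    unsigned char data[STREAM_MAX]; /* bytes stored at their stream offset */
    unsigned char have[STREAM_MAX]; /* 1 if the byte at this offset arrived */
    size_t        readable;         /* contiguous bytes available from offset 0 */
};

static void reasm_init(struct reasm *r)
{
    memset(r, 0, sizeof(*r));
}

/* Store a segment at stream offset off.  Returns the number of
 * contiguous bytes, counted from offset 0, that the application
 * could read so far. */
static size_t reasm_insert(struct reasm *r, size_t off,
                           const void *seg, size_t len)
{
    if (off > STREAM_MAX || len > STREAM_MAX - off)
        return r->readable;         /* outside the modeled window: drop */

    memcpy(r->data + off, seg, len);
    memset(r->have + off, 1, len);

    /* Advance past every byte that is now contiguous with the front. */
    while (r->readable < STREAM_MAX && r->have[r->readable])
        r->readable++;
    return r->readable;
}
```

In this model, as long as the segment containing the first byte is withheld, reasm_insert() keeps returning 0 no matter how much later data is queued.  Once the first byte is inserted, the whole stage becomes readable in one step, so a single receive call would see all of it.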

There are two simple ways that this could be done.  The first way would involve reversing the order of TCP segments sent by the attacker, with a moderate delay added between each transmit in order to reduce the chances of a transient routing condition reordering them.  The nice thing about this is that it would be challenging to distinguish this behavior from potentially real internet traffic.  An alternative approach that is arguably easier to implement, though perhaps easier to identify, would involve transmitting all TCP segments except for the one accounting for the first byte of the data being sent.  After all of the other segments have been transmitted, the first byte can be sent, causing the target machine to completely reassemble the data and place the result in the receive buffer for the associated connection.  A quick proof of concept test seems to show that this is feasible when sending 861 bytes of data (at least against a Linux target).  It's thought that other platforms will share this behavior, though this has not been verified.


On the server:

Got 861 bytes

With the associated packet capture:

03:29:05.914211 IP x.x.x.x.43807 > y.y.y.y.4444:
   S 3870094441:3870094441(0) win 5840
03:29:05.914270 IP y.y.y.y.4444 > x.x.x.x.43807:
   S 130543367:130543367(0) ack 3870094442 win 5792
03:29:06.094311 IP x.x.x.x.43807 > y.y.y.y.4444:
   . ack 1 win 1460
03:29:09.353861 IP x.x.x.x.43807 > y.y.y.y.4444:
   P 2:862(860) ack 1 win 65535
03:29:09.354071 IP y.y.y.y.4444 > x.x.x.x.43807:
   . ack 1 win 362
03:29:09.357354 IP x.x.x.x.43807 > y.y.y.y.4444:
   P 1:2(1) ack 1 win 1460
03:29:09.357499 IP y.y.y.y.4444 > x.x.x.x.43807:
   . ack 862 win 362

Note that the segment describing bytes 2-862 arrives before the segment describing the first byte.  This example should be applicable to larger packet sizes though the tool that was used to test is currently not capable of trying this out.  Thanks to anonymous for helping to test this :-).
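
The transmit order seen in the capture can be described with a small planning routine.  The sketch below only computes which byte ranges to send and in what order; actually emitting segments this way requires crafting TCP segments beneath the normal sockets API, which is not shown.  The struct and function names are hypothetical, and the chunk size of 1448 bytes used in the example is an assumption carried over from the earlier capture.

```c
#include <stddef.h>

struct seg {
    size_t off; /* offset of the segment within the stage */
    size_t len; /* number of bytes in the segment */
};

/* Fill out[] with the transmit order for the "first byte last" scheme:
 * every byte after the first is sent in chunks of at most mss bytes,
 * then the single leading byte is sent to close the gap.
 * Returns the number of segments written, or 0 if the inputs are
 * empty or out[] (max entries) is too small. */
static size_t plan_first_byte_last(size_t total, size_t mss,
                                   struct seg *out, size_t max)
{
    size_t n = 0;
    size_t off;

    if (total == 0 || mss == 0)
        return 0;

    for (off = 1; off < total; off += mss) {
        if (n == max)
            return 0;
        out[n].off = off;
        out[n].len = (total - off < mss) ? total - off : mss;
        n++;
    }
    if (n == max)
        return 0;
    out[n].off = 0;   /* the withheld first byte goes last */
    out[n].len = 1;
    return n + 1;
}
```

For an 861 byte stage and a 1448 byte chunk size this yields two segments: offset 1 with 860 bytes, then offset 0 with 1 byte.  Since tcpdump's relative sequence numbers start at 1, these correspond to the 2:862(860) and 1:2(1) segments in the capture.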

Even though this appears to work in testing, there are definitely some real problems with it.  There's a chance that a stateful firewall or IPS device may be in between the attacker and the target machine.  If this is the case then it cannot be safely assumed that the target machine will receive the packets out of order.  This is due to the fact that the transparent device may perform stream reassembly and then not preserve the out of order characteristic when transmitting data out the other side.  While this may be the case, it could be argued that such a device would often be in close proximity to the target machine, thus increasing the likelihood of the entire stage being present in the receive buffer even if sent in order, due to the decreased latency.  Another problem has to do with the maximum size of the stage that can be sent.  The size is constrained by both the window size and the size of the receive buffer associated with the socket on the server doing the receiving.  There might be some other scenarios and/or platforms that make this approach impossible to use (please post a comment if you're aware of any).
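
The size constraint can be expressed as a simple check.  The sketch below is illustrative only: rcvbuf_size() uses the standard SO_RCVBUF socket option and would have to run on the receiving side, stage_fits() is a hypothetical helper, and the comparison ignores window scaling and kernel bookkeeping overhead (on Linux the value reported for SO_RCVBUF includes that overhead, so it should be treated as an upper bound).

```c
#include <stddef.h>
#include <sys/socket.h>

/* Returns the socket's receive buffer size in bytes, or -1 on error. */
static int rcvbuf_size(int fd)
{
    int size = 0;
    socklen_t optlen = sizeof(size);

    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &size, &optlen) != 0)
        return -1;
    return size;
}

/* A stage sent out of order has to fit within both the advertised
 * receive window and the receive buffer.  Returns 1 if it fits. */
static int stage_fits(size_t stage_len, size_t window, size_t rcvbuf)
{
    size_t limit = window < rcvbuf ? window : rcvbuf;

    return stage_len <= limit;
}
```

As an example, the target in the capture above advertised a window of 5792 in its SYN-ACK, so an 861 byte stage fits comfortably, while a stage larger than the window would not be fully queued before the first byte is released.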

The idea of intentionally sending out of order segments is definitely not new.  Fragroute has supported this for quite some time.  Still, the application of out of order segments to payload staging may not have been as obvious.  It's unlikely that Metasploit will implement this in the immediate future.  We currently use an 89 byte intermediate stager when necessary, which solves the partial read problem without having to alter our existing stagers.  Still, it seems like a fun possibility if payload size restrictions happen to be exceedingly tight.