Synopsis:

Bro, a powerful network security monitor, which by default churns out ASCII logs in a easily parseable whitespace separated (column) format from network traffic, live or PCAP. Because this logs are in the aforementioned format it makes them very hackable with the standard unix toolset. If you’re an experienced unix user with ample networking knowledge you probably have all the know-how to immediately pull useful data from Bro logs. If you’re not familiar with the standard unix toolset e.g. cat, grep, awk, uniq, sort, head, tailetc., digging through Bro logs is a great way to learn these tools and also to gain a better understanding of network traffic.

Log Data:

For these examples, a combination of file globbing, regular expressions, counting, sorting, basic arithmetic, and pattern matching will be used to examine log data.

The current working directory from which all the query examples are performed is $BROPATH/logs/, default location is /usr/local/bro/logs/.

Logs that have been rotated are in folders named with the date and are compressed in the gzip format. You will need to either decompress the logs first or use tools that do on the fly decompression e.g. zcat, zgrep etc. We will use the latter approach in this article because it’s the most common. If you’re on OSX or a BSD system, zcat is not included in the base install, you should use gzcat instead.

Log Format Header Blocks:

#separator \x09 
#set_separator , 
#empty_field (empty) 
#unset_field - 
#path conn 
#open 2013-01-01-00-00-01 
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents 
#types time string addr port addr port enum string interval count count string bool count string count count count count table[string]

Bro logs begin with a 8 line header that describes the log format. For parsing fields we should be concerned with line 7. Let’s print the 7th line from the connection log file as it contains the list of fields presented in the conn.log file. The first field, ts, is the timestamp in seconds since the unix epoch and the last field is tunnel_parents. Each field is described in the Bro documentation [2][3].

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | sed -n '7p' 
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents

Log Data:

Below is a sample of conn.log which contains the log header and the log data. Notice how the field names listed in the header correspond to the values listed in the log data. Begin with the third field, id_orig_h, which is the originater (or sender) of a connection, and see how each entry for that field is an IP address. Now look at the responder field which is field 6, it contains an IP address too. This IP address responded to the originator’s packet(s). In a client/server connection, in most cases, the originator is the client and the responder is the server. Familiarize yourself with some of the common fields: ts, id.orig_h, id.orig_p, id.resp_h, id_resp_p, service, and duration.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | head 
#separator \x09 
#set_separator , 
#empty_field (empty) 
#unset_field - 
#path conn 
#open 2013-01-01-00-00-01 
#fields ts uid id.orig_h id.orig_p id.resp_h id.resp_p proto service duration orig_bytes resp_bytes conn_state local_orig missed_bytes history orig_pkts orig_ip_bytes resp_pkts resp_ip_bytes tunnel_parents 
#types time string addr port addr port enum string interval count count string bool count string count count count count table[string] 
1357016390.095574 qfuVcja4nb9 43.45.3.9 46137 93.191.121.39 53 udp dns 0.154661 55 96 SF F 0 Dd 1 83 124 (empty) 
1357016390.255919 mW8KwF7YXSb 43.45.3.9 22314 204.212.170.189 53 udp dns 0.083968 75 80 SF F 0 Dd 1 103 108 (empty)

To avoid printing the header each time the first 8 lines can be skipped. I show three ways of doing this below.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | awk 'NR < 9 { next } { print }'
$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | grep -v ^#
$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut

Each results in this:

1357016390.095574 qfuVcja4nb9 43.45.3.9 46137 93.191.121.39 53 udp dns 0.154661 55 96 SF F 0 Dd 1 83 1 124 (empty) 
1357016390.255919 mW8KwF7YXSb 43.45.3.9 22314 204.212.170.189 53 udp dns 0.083968 75 80 SF F 0 Dd 1 103 1 108 (empty) 
...cut for brevity...

Field Separator:

The fields in Bro logs are separated by whitespace, to be precise, each field is separated by a tab ( \t ) character. This is specified in the log format header block as a hex value.

$ gzcat 2013-03-11/conn.01\:00\:00-00\:00\:00.log.gz | head -1 
#separator \x09

By converting the tab character to hex. we can see that it matches the separator value of 09:

$ echo -ne '\t' | hexdump -C 
00000000 09 |.| 
00000001

And thus we can see that the fields are in fact separated by the tab character (\t):

$ gzcat 2013-06-10/conn.00\:00\:00-00\:00\:00.log.gz | head | tail -1 | hexdump -c 
0000000 1 3 7 0 8 3 6 7 9 2 . 5 1 4 1 3 
0000010 8 \t H I j u 1 0 k z J a a \t 4 3 
0000020 8 . 2 1 5 . 2 0 1 . 1 3 4 \t 7 5 
0000030 7 1 \t 2 0 4 . 2 4 6 . 1 6 0 . 5 
0000040 \t 5 3 \t u d p \t d n s \t 0 . 0 7 
0000050 0 4 8 9 \t 9 9 \t 1 0 4 \t S F \t F 
0000060 \t 0 \t D d \t 1 \t 1 2 7 \t 1 \t 1 3 
0000070 2 \t ( e m p t y ) \n 
000007a

Note: It’s best to tell your tools to use a tab character as a separator because while the whitespace separator will work when parsing most of Bro’s logs there are some fields where values are separated by spaces such as in Notice messages where there’s often English sentences. Continuing with the Notice message example, without specifying the tab separator you would only print the first word in the message rather than the entire message.

A few examples of specifying the tab character as a field separator follow:

$ awk -F '\t' '{ print $3 }' current/conn.log
$ awk 'BEGIN { FS="\t" } { print $3 }' current/conn.log
$ sort -t $'\t' -k 6 current/conn.log
$ cut -d $'\t' -f 3 current/conn.log
$ grep '\tgoogle.com\t' current/conn.log

Bro-cut:

Bro-cut is a C program which allows one to avoid counting fields and instead print fields by their name. It can perform timestamp conversion from unix epoch time to the human readable local time format. Bro-cut also strips off the header by default. It should be noted, as we shall see later, that bro-cut needs to see the log header to operate on the log data.

Note: Bro-cut used to be a shell script wrapper for a large gawk program and as a consequence was very slow. With the rewrite, it’s exceptionally fast, it’s even faster than GNU cut.

Fields by Name:

In the following examples we will print the id.resp_p field which is the destination port field and happens to be field 6. Notice how awk prints the 6th field from the header too; it reads line by line. Where there isn’t a value in the 6th field awk prints the new line character which results in empty lines. e.g.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | awk '{ print $6 }' | head 
id.resp_h 
addr 
53 
53

Because bro-cut strips off the header, the output with and without, will differ by a few lines depending on which field is printed with awk.

Printing the id.resp_p field without bro-cut:

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | awk '{ print $6 }' | tail 
443 
443 
3283 
9997 
500 
25 
587 
3283 
3283

Printing the id.resp_p field with bro-cut (notice the difference in the first line: 53):

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut id.resp_p | tail 
53 
443 
443 
3283 
9997 
500 
25 
587 
3283 
3283

Also, note that field counting becomes more inconvenient as the field number moves farther from 1. Thus, the use of bro-cut.

Print Format Header Block:

Passing the (-c) option to bro-cut will cause it to print, rather than omit, the format header block.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut -c id.resp_p | head 
#separator \x09 
#set_separator , 
#empty_field (empty) 
#unset_field - 
#path conn 
#open 2013-01-01-00-00-01 
#fields id.resp_p 
#types port 
53 
53

Timestamps:

Examine the first field, ts, for the next two examples.

Print one connection record in the default unix epoch time format.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut | head -1 
1357016390.095574 qfuVcja4nb9 43.45.3.9 46137 93.191.121.39 53 udp dns 0.154661 55 96 SF F 0 Dd 1 83 124 (empty)

Print and convert the time to human readable format with the (-d) option.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut -d | head -1 
2012-12-31T22:59:50-0600 qfuVcja4nb9 43.45.3.9 46137 93.191.121.39 53 udp dns 0.154661 55 96 SF F 0 Dd 1 83 1 124 (empty)

Query Examples:

Basic Queries:

Print and convert the timestamp field to local time from the first and last connection in the connection log file.This gives us the date and time range for the log as all other entries occur between these two points.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut -d ts | head -1 
2012-12-31T22:59:50-0600 
$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut -d ts | tail -1 
2013-01-01T22:59:54-0600

A few examples of matching a single IP address will follow using the id.orig_h or id.resp_h fields.

In this example we use BSD zgrep, the \t represents whitespace, a tab to be precise, since we want to match on columns only (they are surrounded by tabs). We also escape the period with backslashes so they loose there special regular expression meaning (by default, they match any single character).

$ zgrep '\t10\.1\.1\.200\t' 2013-03-*/conn*

Using GNU zgrep, like you would find on Linux, we could use character classes instead to represent any type of whitespace, not just tabs. This will not offer any advantage over matching tabs when working with Bro logs but it’s useful to know.

$ zgrep '[:space:]10\.1\.1\.200[:space:]' 2013-03-*/conn*

The next example will get what you want but possibly with things that you don’t want.e.g. it could match 12121212 or 1a1b1c1d or 121.121 because the periods match any single character unless they are escaped by placing a backslash before it. Also, since we’re not matching whitespace at the beginning or end other characters can be there. e.g. 51.17.1.1 or 1.121.142.9 or 51.16.17.100. Similar to searching for pattern ‘break’ and matching ‘This is a breakthrough!’

$ zgrep '1.1.1.1'2013-03-*/conn*

Print a list of all the unique (removal of duplicates) services detected by Bro. Sorting is required before uniq will remove duplicates. The ‘-‘ in the output below is the default value Bro places in a column when it doesn’t have additional data. Since we are looking at the service field Bro prints the ‘*-‘*when it cannot determine the service. This value is defined as the unset_field which is displayed in the log header.

$ gzcat 2013-01-01/conn.00\:00\:00-00\:00\:00.log.gz | bro-cut service | sort | uniq 
- 
dns 
ftp 
ftp-data 
http 
smtp 
ssh 
ssl 
teredo

Find all the connections originating from a single host and print the unique destination ports. This is a great query to run for audits and to find ports that machines should not be connecting to, e.g. Why is user X making a telnet connection (port 23)?

$ gzcat 2013-01-08/conn*.gz | awk -F '\t' '$3 == "43.45.3.65" { print $6 }' | sort -n | uniq 
53 
80 
88 
123 
137 
443 
514 
548 
1900 
5000 
5001 
5002 
5353 
5900 
50004 
51259

Print all TCP connections that have a duration greater than 1 hour (3600 seconds).

$ gzcat 2013-01-01/conn* | bro-cut -c | awk -F '\t' '$7 == "tcp" && $9 > 3600' 
1357001043.300133 HtmnK6zFIQj 10.1.1.4 49283 17.172.34.29 993 tcp - 15543.793548 5235 6314 OTH T 0 DadA 324 1819324 19274 (empty) 
1357050569.479969 meG0ovRaqJb 10.1.1.53 49164 43.45.3.9 548 tcp - 22377.502507 66205567 39368009 SF T 0 ShADadfrFr 323245 83014319 318047 55906417 (empty) 
1357063808.108982 j2MDcPoV0ja 10.1.1.53 49294 1.1.1.56 22 tcp ssh 8840.091311 151413 785021 SF T 0 ShAdDaFf 7891467077 4815 977625 (empty)

Sort machines by the number of failed SSH authentication attempts by digging through ssh.log. This is an easy way to compile data on password cracking attacks.

$ gzcat 2013-*/ssh* | awk -F '\t' '$7 == "failure" { print $3 }' | sort -t . -n -k 1,1 -k 2,2 -k 3,3 -k 4,4 | uniq -c | sort -rn | head -5 
1239 90.124.70.62 
598 10.1.1.16 
367 10.1.1.161 
277 10.1.1.48 
277 10.1.1.172

Look for suspicious or outdated SSH clients. We found machines using the insecure SSH-1 protocol and machines that are scanning via Nmap’s SSH LUA scripts.

$ gzcat 2013-*/ssh* | bro-cut client | sort | uniq 
SSH-1.33-OpenSSH_5.0 
SSH-1.5-Nmap-SSH1-Hostkey 
SSH-1.5-NmapNSE_1.0 
SSH-1.5-OpenSSH_5.0 
SSH-1.99-OpenSSH_5.0 
SSH-2.0-Nmap-SSH2-Hostkey ...

Find HTTP servers running on non-standard ports for all logs in the month of January 2013.

$ gzcat 2013-01-*/http*.gz | bro-cut id.resp_p | sort -n | uniq 
80 
443 
1947 
2869 
3689 
8088 
8770 
16000 
30000 
49152

Let’s investigate the URL’s for one of the odd ports from the example above.

$ gzcat 2013-01-*/http*.gz | awk -F '\t' '$6 == "16000" { print $9$10 }' 
srv1.mycompany.com/admin-client-lib/admin-console-client.jar 
srv1.mycompany.com/admin-client-lib/ulc-base-trusted.jar 
srv1.mycompany.com/admin-client-lib/ulc-jnlp-client.jar 
srv1.mycompany.com/admin-client-lib/ulc-servlet-client.jar

Print and tally the state field from all connections originating from host 10.1.1.89. This machine has a high count of S0 flags which indicates no response has been seen to the originator. This machine is port scanning.

$ zcat conn* | awk -F '\t' '$3 == "10.1.1.89" && $6 == "80" { print $12 }' | sort -S 1G | uniq -c 
2629 OTH 
780252 REJ 
109367 RSTR 
33394 RSTRH 
36400262 S0 
711808 S1 
117 S3 
51 SHR

Print query name for DNS requests that resulted in a non-existent domain. These results are from SPAM lists who’s domain name changes for each new list revision.

$ gzcat 2013-02-0[1-3]/dns*.gz | awk -F '\t' '/^#/ || $15 == "NXDOMAIN"' | bro-cut query | head 
34.11.110.8.plus.bondedsender.org 
34.11.110.8.sa-accredit.habeas.com 
34.11.110.8.sa-trusted.bondedsender.org 
80.62.96.76.sa-other.bondedsender.org 
51.62.96.76.sa-other.bondedsender.org 
70.39.41.66.sa-other.bondedsender.org 
34.11.110.8.plus.bondedsender.org 
34.11.110.8.sa-accredit.habeas.com 
34.11.110.8.sa-trusted.bondedsender.org 
104.251.240.72.sa-other.bondedsender.org

Match originator address and responder port and print the number of packets sent and received.

$ zcat 2013-08-2[3-8]/conn.* | awk -F '\t' '/^#/ || $3 == "192.168.1.57" && $6 == "514"' | bro-cut -d orig_pkts resp_pkts 
2 0

You could also do this to make it more readable:

$ zcat 2013-08-2[7-8]/conn.* | awk -F '\t' '$3 == "192.168.1.57" && $6 == "514" { print "orig_pkts:", $16, "resp_pkts:", $18 }' 
orig_pkts: 2 resp_pkts: 0

Complex Queries:

The next awk script calculates the total number of payload bytes seen by originators across all connections in the log file. The result gives us the ability to find the top talker by bytes.

$ gzcat 2013-05-01/conn.14\:24\:38-00\:00\:00.log.gz | awk -F '\t' '{ bytes[$3] += $10 } END { for(x in bytes) print x":", bytes[x] }'

Now, let’s sort by bytes and print the top 10.

$ gzcat 2013-05-01/conn.14\:24\:38-00\:00\:00.log.gz | awk -F '\t' '{ bytes[$3] += $10 } END { for(x in bytes) print x":", bytes[x] }' | sort -t : -nr -k 2 | head 
190.113.134.90: 1944745071859 
192.168.1.4: 161965756204 161.69.46.142: 69821291682 
69.41.183.67: 51735012733 
204.12.219.143: 41313370591 
151.237.176.72: 37506500381 
208.177.76.90: 32860218655 184.22.1.232: 28925220963 
94.228.209.158: 26808968271 
151.237.176.73: 25723974940

To do the same but for the number of packets replace $10 with $16.

The following query appends each unique destination port seen from a single source IP address on a single line.

awk-F '\t' '{ if ( NR < 9 ) next ; if ( a[$3] && a[$3] !~ $6 ) a[$3] = a[$3]","$6; else a[$3] = $6 } END {for(x in a)print x":"a[x]}' conn.log

Tips and Tricks:

View Logs in Realtime:

To view the current spooling logs in real time use tail with ( -f ) option which will keep the file descriptor open.

$ tail -f current/dns.log

You can pipe the output to other tools just as before to narrow in on patterns.

$ tail -f current/dns.log | grep '\tyoutube\.com\t'

Convert Timestamp with Gawk:

Manual time conversion with gawk. Using the old version of bro-cut to convert timestamps can take a very long time when working with hundreds of thousands or millions of records. Sometimes omitting bro-cut can decrease the time of a completing a log query by 6 hours. Instead, you can make your query and write out its contents to a text file and then run the following script on the final results which contains fewer records than the original log file. We do this because after you run an awk or other tool you can’t pipe the data into bro-cut again unless you tell the tool to keep the log header because bro-cut requires the log header.

$ gawk -F '\t' '{ if ($1 ~ /^1375/) { time=strftime("%Y-%m-%dT%H:%M:%S%z", $1); $1=time; } print $0 }' input.log > ts.output.log

Faster Queries with Parallelization:

If you have large log files to sift through it can take a lot of time. This amount of time can be reduced by parallelizing the work over multiple CPU cores. The number of cores can be specified ( -j ) otherwise GNU Parallels will use as many as are available.

Gnu Parallels arranges output in order of input so you don’t have to sort afterwords. In other words,the order in the examples that follow is determined by ls, the default timestamped Bro logs will not need to be sorted again.

Note: I’ve noticed different outcomes between using all the cores and using a lesser number such as 4 ( -j 4 ). This is a bug in GNU Parellels, so be sure to check your results ( wc -l ) to find the accurate spot i.e. Do a sweeping run zcat’ing all your logs and pipe them to wc. Do the same thing except with GNUParallels to see if the results are the same. If not, try lowering the number of cores GNU Parallels uses ( -j ) and repeat until the correct number matches what is returned when using Parallels.

The next example pipes the filenames to Parallels so each will be decompressed and queried via Awk on a different CPU.

$ ls conn* | parallel "zcat {} | awk '$3 || $6 == "192.168.1.10"'"

Files can be specified and globbed without the pipe as well after the separator ( ::: ).

$ parallel -j8 'zcat {} | wc -l' ::: /usr/local/bro/logs/2013/05/2013-05-14/conn.*.gz 
1263528 
1252168 
1250748 
1345403 
1333533 
1389453 
1342875 
1288211

Be wary of proper shell escapes when using queries with multiple single and double-quotes.

$ ls conn* | parallel 'zcat {} | awk '$3 == "192.168.1.10" || $6 == "192.168.1.10"'
$ ls */*/*/conn*.gz | parallel -j8 'zcat {} | grep '[[:space:]]10\.1\.1\.2[[:space:]]'' > 10.1.1.2-all.txt

Complementary Tools

More Reading & Other Resources