In my last post I introduced the concept of using white listing in order to review firewall logs. I discussed how this process can both simplify as well as expedite the log review process, by automating much of the up front work. In this post we will look at some actual examples, as well as start creating a firewall log parsing script.
The basics of grep
In order to show you the process of white listing your firewall logs, I am going to use grep. Grep is a standard Linux/UNIX tool, with free versions available for Windows (grab both the Binaries as well as the Dependencies). Grep is certainly not the most efficient tool for the job, but it is by far the simplest to learn. If you are a Perl, PHP, AWK&SED, SQL, etc. guru, by all means stick with your tool of choice. Simply mimic the process I’ve defined here using your appropriate command set.
Grep is a pattern-matching tool. It allows you to search one or more files looking for a specific pattern. When the pattern is found, the entire line is printed out. So for example the command:
grep 192.168.1.10 firewall.log
would produce all lines in the file “firewall.log” that contains the IP address “192.168.1.10”. Grep has a number of supported switches, but the only one we need for firewall log review is the “-v” switch. This switch tells grep to match all lines that DO NOT contain the specified pattern. So for example:
grep –v 192.168.1.10 firewall.log
would only print out lines that do not contain the specified IP address.
With grep, a period is actually a wild card character. So while I said the first grep command would match on the IP address 192.168.1.10, it could actually match on more than that. Grep interprets the string to read:
Match on: 192 <any single character> 168 <any single character> 1 <any single character> 10
If we want grep to match periods as periods, we have to preced them with a backslash character. So the proper syntax would actually be:
grep 192\.168\.1\.10 firewall.log
Finally, sometimes we want to match on multiple patterns that are strung together. For example what if we are only interested in 192.168.1.10 traffic when that’s the destination IP address? Depending on your firewall log format, the command may look something like this on a Linux or UNIX system:
grep ‘dst 192\.168\.1\.10’ firewall.log
On Windows, the command would look like this:
grep “dst 192\.168\.1\.10” firewall.log
Note the only difference is that Linux and UNIX uses single quotes, while Windows uses double quotes.
Logical AND’s and OR’s
Sometimes we need to match on multiple patterns within the same line. For example what if we only wish to see TCP/80 traffic to our Web server? In this case there are actually two patterns we wish to match on the same line. The problem is there may be other stuff in the middle we don’t care about.
To perform a logical AND, simply use the grep command twice on the same line. For example:
grep “dst 192\.168\.1\.10” firewall.log | grep “dst_port 80 ”
The pipe symbol simply lets you run the second grep command before execution ends. So the first grep command will grab all traffic going to 192.168.1.10 and then pass it to the second grep command. The second grep command then searches this output for all traffic headed to port 80. Look closely after the port number and you will see I included a space character. Without a space character, we could potentially match on port 800, 8080, etc.
Sometimes we may wish to match on either of two values. For example what if we wanted to see both HTTP and HTTPS traffic to our Web server? In this case we would need to do a logical AND combined with a logical OR. Here’s how to do that with grep:
grep ‘dst 192\.168\.1\.10’ firewall.log | grep ‘dst_port \(80 \|443 \)’
The first half of the command should look familiar, but the second half needs some explaining. We need to tell grep that the parenthesis characters are actual commands and not part of the string we wish to match. We do this by preceding them with a backslash character. The pipe character is what tells grep to process this command as a logical OR. Note the pipe also needs to be preceded by a backslash.
Sorting logs with grep
OK, so we have the basics, now let’s start applying them to reviewing a firewall log file. The first thing you need to do is get the log file into ASCII format. This is the native format for many firewalls, so no conversion may be required. If the log uses a proprietary format, the vendor usually supplies a tool to do the conversion. Personally I just send the logs to a SIM (http://www.chrisbrenton.org/2009/08/setting-up-a-security-information-management-system-sim-%E2%80%93-part-1/). From there you can simply copy them off to a working directory.
Next we need to open a text editor. While we can run our grep commands on the command line to test their accuracy, we want to place the commands in a shell script or batch file so they can be easily run later. For the rest of this post I will use single quotes, which is the syntax for both Linux and UNIX. Remember that your Windows version of grep may want to see double quotes instead.
The next step is to review the log file looking for traffic patterns you recognize. Let’s take an easy one like HTTP traffic to your Web server. Look closely at a log entry and identify what unique characteristics it has which tells you its HTTP traffic to your Web server. Most likely this will be the target IP address of your Web server, as well as a target port of TCP/80. Now simply create a grep command to copy these entries to a new file:
grep ‘dst 192\.168\.1\.10’ firewall.log | grep ‘dst_port 80 ’ > web_server_http.txt
Note that rather than print the output to the screen, we redirected it to a file with a descriptive name. That way the log entries are available for later review.
If our Web server is only offering HTTP, we should only see traffic headed to port TCP/80. Any other port connect attempts can be considered suspect traffic, and may be part of a scan or probe. With port TCP/80 traffic in it’s own file, we now simply redirect everything else to another file:
grep ‘dst 192\.168\.1\.10’ firewall.log | grep –v ‘dst_port 80 ’ > web_server_scan.txt
This should account for all traffic headed to our Web server. Now we need to get all of this saved traffic out of the way so it will be easier to spot other entries. While we could try to delete them, it would be prudent to keep an unmodified version of the firewall log just in case we need to refer back to it. The easiest way to handle this conflict is to simply create a new temporary file.
grep –v ‘dst 192\.168\.1\.10’ firewall.log > temp1.txt
Now we simply open “temp1.txt” and look for the next pattern we recognize. Let’s say that’s inbound and outbound SMTP. That section of our script may look something like this:
grep ‘dst 192\.168\.1\.12’ temp1.txt | grep ‘dst_port 25 ’ > smtp_inbound.txt
grep ‘dst 192\.168\.1\.12’ temp1.txt | grep –v ‘dst_port 25 ’ > smtp_server_scan.txt
grep –v ‘dst 192\.168\.1\.12’ temp1.txt > temp2.txt
grep ‘src 192\.168\.1\.12’ temp2.txt | grep ‘dst_port 25 ’ > smtp_outbound.txt
grep ‘src 192\.168\.1\.12’ temp2.txt | grep –v ‘dst_port 25 ’ > smtp_server_compromise.txt
grep –v ‘src 192\.168\.1\.12’ temp2.txt > temp3.txt
SMTP is a bidirectional service, so the first three lines take care of inbound traffic, while the last three look at outbound. Note that in line five we expect to see the server only communicating out on TCP/25. Any other port attempts may indicate the system has been compromised and is now calling home. Obviously it would be a good idea to do the same for the Web server:
grep ‘src 192\.168\.1\.10’ temp3.txt > web_server_compromise.txt
grep –v ‘src 192\.168\.1\.10’ temp3.txt > temp4.txt
Closing out your script
Now simply repeat this process until you are left with a temp file that has log entries you don’t expect to see. It is now time to start closing out our script. First, rename your last temporary file to something that will catch your attention. On Windows the command would be:
ren temp23.txt interesting_stuff.txt
On Linux or UNIX the command would be:
mv temp23.txt interesting_stuff.txt
This interesting file will probably be the first file you are going to want to review, as it will contain all of the unexpected patterns. Now that all the normal traffic flow is out of the way, it should take substantially less time to spot anything you truly need to worry about.
One nice thing about the temp files is that they can aid in troubleshooting. For example if grep moves 1.5 MB of log entries into a new file, I should expect to see the next temp file shrink by 1.5 MB as well. If not, something is wrong in my script. Also, if you notice that all of your temp files after “temp12.txt” have a zero file length, chances are you have a syntax error just after you created “temp12.txt”.
Once your script is vetted and working however, you may not want to have the temp files in your working directory. That way it is easier to focus in on the sorted files during a review. When you reach this point, simply have the last line of your script delete the temp files. On Windows the syntax would be:
del /q temp*.txt
and on UNIX or Linux the command would be:
rm –f temp*.txt
Automating the process
Once you have a working script, it is now time to automate the process. If you are running Linux or UNIX, simply setup the script to run via cron. If you need help with configuring a cron job there are some excellent help pages. The equivalent on Windows is called a Scheduled Task, and Microsoft has some excellent help in the knowledgebase.
I mentioned in my last post that I like to look for error packets. I will typically do this right at the beginning of my script. Also, it is not uncommon for systems to call home in order to check for patches. I usually put these exceptions at the beginning of my script as well. Something like:
grep ‘dst 1\.2\.3\.[60-61] ‘ firewall.log > server_patching.txt
grep ‘dst 10\.20\.30\.[1-8] ‘ firewall.log >> server_patching.txt
The first command grabs all traffic headed to 188.8.131.52 or 184.108.40.206. The second looks for traffic headed to 10.20.30.1 – 10.20.30.8. Be very careful in specifying multiple systems! This is a great way to miss something interesting in your log files. You should only use this syntax to describe known patch server. Never specify all the subnets for a particular company. For example if you filter out all Microsoft IPs this way, than a single owned system on their network could control bots on your network and you would never know it.
This syntax is also handy when writing filters for outbound log entries. For example assume that we use 192.168.1.0 – 192.168.1.24 for internal addressing. To grab all outbound HTTP traffic we could say:
grep ‘src 192\.168\.[1-24]\.[0-255] ‘ temp15.txt | grep ‘dst_port 80 ’ > outbound_http.txt
As mentioned in the last post, we may wish to check outbound HTTP during non-business hours to see if we have any Malware calling home.
Don’t forget you can use grep as part of your review process as well. For example let’s say we are reviewing interesting_stuff.txt and spot a source IP (220.127.116.11) we know is hostile. We want to check the file to see if there are any other IP addresses we need to be concerned with. While we could keep paging through the file, a simpler solution would be to use grep:
grep –v 1\.2\.3\.4 interesting_stuff.txt
You review script has the potential to act as a mini-IDS system. By this I mean if you find a suspect pattern but figure out what’s going on, don’t be afraid to leverage your script to categorize this pattern in the future.
For example, let’s say we notice a number of systems on the Internet attempting to access port 5060, but we do not have this port open through our firewall. A quick Google search will tell us that this is SIP (http://en.wikipedia.org/wiki/Session_Initiation_Protocol) traffic. Why bother to look it up every time we see it however if our script can identify it for us? Now that we know 5060 is SIP, simply add the following lines to our script:
grep ‘dst_port 5060 ‘ temp.txt > sip_traffic.txt
We can do the same for any ports that are commonly being probed.
What makes firewall log review so time consuming is that you need to sift through all of the normal traffic patterns in order to find the log entries that identify a true security issue. By white listing all expected traffic patterns, it becomes far simpler to find and review any unexpected entries within your logs. By further automating the process of sorting the logs, log review can be reduced to a quick activity that can easily be performed on a daily basis.