Archive for the ‘3-err’ category

How To Review A Firewall Log In 15 Min Or Less – Part 1

September 25th, 2009

One of the most difficult and time-consuming parts of maintaining a perimeter is reviewing firewall logs. It’s not uncommon for an organization to generate 50, 100, 500 MB or more worth of firewall log entries on a daily basis. The task is so daunting, in fact, that many administrators choose to ignore their logs. In this series I’ll show you how to expedite the firewall log review process so that you can complete it faster than that morning cup of coffee.

Why firewall log review is important

I once took part in a panel discussion where one of my fellow SANS instructors announced to the crowd “the perimeter is dead and just short of useless”. I remember thinking I was glad I was not one of his students. I occasionally take on new clients and find that 7 times out of 10 I can identify at least one compromised system they did not know about. In every case it has been the client’s own firewall logs that pointed me to the infected system.

In the old days firewall log review was all about checking your inbound drop entries to look for port scans. Today the focus is on outbound traffic. Specifically, you should be checking permitted patterns. With the plethora of non-signature Malware today it has become far too easy for an attacker to get malicious code onto a system. A properly configured perimeter will show you when a compromised system tries to call home. This is typically your best chance to identify when a system has become compromised.

What needs to be logged?

Dropped traffic does not have to be logged provided you are not blind to DoS flood attacks. For example, if you are running a tool such as NTOP on your perimeter, or collecting RMON or Netflow data, then it is OK not to log dropped packets as you can collect this information through other means.

When traffic is permitted across the perimeter however, you need to log it. This includes all permitted traffic, regardless of direction (egress as well as ingress). At a minimum we want to see header information for the first packet in a session. Anything beyond that can be considered a bonus.

Some kernel level rootkits do an excellent job of hiding themselves within the infected system. In fact many are so stealthy they cannot be detected by checking the system directly. One possible option is to pull the hard drive and check it from a known to be clean system. Obviously this is highly impractical whenever you have more than just a couple of systems.

A better option is to check the network for tell tale signs of the Malware calling home. Malware typically creates outbound sessions either to transfer a toolkit or check in for marching orders. The firewall is in an optimal position to potentially block, or at the very least log, both of these activity patterns. So by reviewing our firewall logs, we can quickly check every system on our network for indications of a compromise.

Malware can leverage any socket to call home, but most use TCP/80 (HTTP) or TCP/443 (HTTPS). This is because Malware authors know most firewall administrators do not log these outbound sessions because they are responsible for the greatest portion of perimeter traffic. So again, if we are going to permit the traffic to pass our perimeter, we must ensure we are logging it.

Log review as a process

The mistake I see most administrators make is they perform a time linear analysis of their log entries looking for “the interesting stuff”. The problem is suspect traffic can be extremely difficult to detect this way as it will be mixed in with normal traffic flow. So the first thing we need to do is get the normal traffic out of the way.

Think of the rectangle in Figure #1 as representing your firewall log. Assume it contains a mixture of normal as well as suspect traffic patterns. Rather than immediately looking for the suspect patterns, let’s first get the normal patterns out of the way. For example HTTP headed to our Web server from the Internet is an expected pattern. If we pull all of these entries out of the log file, the log file becomes a little bit smaller. Inbound and outbound SMTP to our mail server is another expected pattern. Again, if we can remove these entries as well the firewall log file becomes even smaller.

[Figure #1: firewall log review process]

Now we simply continue this process for every traffic pattern we expect to see crossing our perimeter. The more traffic patterns we recognize and move out of the way, the smaller the final log file becomes. What’s left is just the unexpected traffic patterns that require review time from a firewall administrator. I’ve seen sites that typically generate 250-300 MB worth of logs daily end up with a final file less than 100 KB in size. Needless to say 100 KB takes far less time to review than 300 MB.

Automate, automate, automate

If this seems like a lot of work, it only will be initially. What I do is create a batch file, shell script, or set of database queries to automate the process of parsing the firewall log. We can then run this process as a CRON job or scheduled task. This means that all of the hard work (breaking up the main log file into smaller files) can be done off hours. When you walk in the door in the morning, the log file will already be segregated. You can then immediately focus in on the suspect patterns.
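
As a rough illustration of the idea, here is a minimal Python sketch of the "pull the expected patterns out of the way" step. The log format (key=value fields such as src=, dst= and dport=), the server addresses and the bucket names are all assumptions made up for this example; your firewall's log layout will differ and the patterns will need to be adapted.

```python
import re
from collections import defaultdict

# Hypothetical "expected traffic" rules: (bucket name, compiled pattern).
# The field layout (src=, dst=, dport=) is an assumption for illustration only.
EXPECTED = [
    ("inbound_http", re.compile(r"dst=192\.0\.2\.10 .*dport=80\b")),
    ("smtp", re.compile(r"(src|dst)=192\.0\.2\.25 .*dport=25\b")),
]

def segregate(lines, rules=EXPECTED):
    """Sort log lines into named buckets; anything unmatched lands in 'unexpected'."""
    buckets = defaultdict(list)
    for line in lines:
        matched = False
        for name, pattern in rules:
            if pattern.search(line):
                buckets[name].append(line)
                matched = True  # keep going: a line may belong in several files
        if not matched:
            buckets["unexpected"].append(line)
    return dict(buckets)

sample = [
    "action=permit src=203.0.113.7 dst=192.0.2.10 proto=tcp dport=80",
    "action=permit src=192.0.2.25 dst=198.51.100.4 proto=tcp dport=25",
    "action=permit src=192.0.2.77 dst=198.51.100.99 proto=tcp dport=6667",
]
result = segregate(sample)
for name, entries in result.items():
    print(name, len(entries))
```

Only the third sample entry ends up in the "unexpected" bucket, which is the file you would actually read in the morning. A script along these lines could be run from CRON against the previous day's log.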

Helpful tips

Here are some tips I’ve developed over the years:

  • There is no “single right way” to segregate log entries. It is all about how you personally spot unexpected patterns. You can sort by IP address, port number, or whatever info you have to work with in your logs.
  • This is not about obsessively putting one log entry into every sort file. This process is about creating easier to spot patterns. For example a TCP reset in an HTTP stream could go in both an “error” file and an “HTTP” file. Each would make it easier to spot different types of patterns.
  • Start by pulling out error packets (TCP resets, ICMP type 3’s & 11’s). They always indicate something is broken or someone did something unexpected.
  • A smart attacker will never make your “top 5 communicators” list. I’ve seen infected systems make as few as four outbound connections in a day.
  • Make a note of the average size of each of your sort files. A sharp spike in traffic may warrant further investigation.
  • Sometimes it is helpful to parse the same pattern into two different files. For example I create an “outbound HTTP” file, and then parse out all of the traffic generated during non-business hours. This makes it much easier to find infected systems calling home.
  • Whitelist known patch sites. For example systems may call home all night long to Microsoft and Adobe to check for updated patches. If you can parse out these entries, you’ll end up with far less noise in your final file.
  • Some sites find it helpful to parse out users checking their personal email. This can be helpful information if data leakage occurs.
  • I like to segregate traffic based on security zone. For example I would be far less concerned about SSH from the internal network to the DMZ, than I would about SSH headed to the Internet. If you are not sure why, read this.
  • In an ideal world, every traffic pattern you find will be described in your organization’s network usage policy. If it’s not, then further investigation may be required.
  • Expect to tweak your script over time, as networks are an evolving entity.
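
Two of the tips above (splitting out non-business hours traffic and whitelisting known patch sites) combine naturally. The following is only a sketch: the business hours, the treatment of weekends as off hours, the patch server addresses and the tuple layout of each entry are assumptions chosen for the example, not values from any real site.

```python
from datetime import datetime

# Assumed values for illustration; adjust to your own site.
BUSINESS_HOURS = range(8, 18)                      # 08:00-17:59 local time
PATCH_SITES = {"198.51.100.20", "198.51.100.21"}   # hypothetical update servers

def is_off_hours(ts):
    """True when the timestamp falls on a weekend or outside business hours."""
    return ts.weekday() >= 5 or ts.hour not in BUSINESS_HOURS

def off_hours_suspects(entries):
    """entries: iterable of (datetime, src_ip, dst_ip) for outbound HTTP sessions."""
    return [e for e in entries if is_off_hours(e[0]) and e[2] not in PATCH_SITES]

entries = [
    (datetime(2009, 9, 24, 10, 15), "10.0.0.5", "203.0.113.9"),   # mid-morning
    (datetime(2009, 9, 25, 2, 30), "10.0.0.5", "198.51.100.20"),  # overnight patch check
    (datetime(2009, 9, 25, 3, 5), "10.0.0.8", "203.0.113.66"),    # overnight, unknown site
]
suspects = off_hours_suspects(entries)
print(suspects)
```

Only the last entry survives both filters, so 10.0.0.8 is the system worth a closer look.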

Exec Summary

White listing expected traffic patterns in your firewall log can help to expedite the log review process. Similar traffic becomes grouped together, and can be more easily checked for suspect patterns. In part 2 of this series I’ll walk you through the process of creating your own script using a number of different firewall products.

Filtering Packet Output With Tshark

September 1st, 2009

In my last post we were looking at some packet decodes. One of the biggest pains in working with decodes is following a specific field over multiple packets. Some fields are not too bad as they tend to get printed by default. Good examples are port numbers and TCP flag settings. But in the last post we were working with the IP ID. That usually only gets printed if you view verbose information, so now you end up with a lot of data on the screen that you do not care about.

Luckily, the new version of tshark fixes this problem. Let’s work through a few examples so you can see what I mean.

What is tshark?

Tshark is the command line packet utility that is included with Wireshark. If you are not familiar with Wireshark, it is arguably one of the best graphical packet decoding tools available today, and it is free for use. There are versions that run on Linux, BSD and Windows (YES! Even Windows!).

So why work at the command line if there is a GUI version? In the case of tshark, it is because you can get a better presentation layout of the data you actually want to see.

The problem

So let’s say we want to monitor the IP ID increment for multiple packets leaving a system. We may try a command similar to the following:

C:\>tshark -n -i 3 src host 192.168.100.5

Capturing on Intel(R) PRO/100 VE Network Connection (Microsoft’s Packet Scheduler)

0.000000 192.168.100.5 -> 192.168.100.3 TCP 0 > 1832 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

1.003331 192.168.100.5 -> 192.168.100.3 TCP 0 > 1833 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

2.007332 192.168.100.5 -> 192.168.100.3 TCP 0 > 1834 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

3.011347 192.168.100.5 -> 192.168.100.3 TCP 0 > 1835 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

4.015336 192.168.100.5 -> 192.168.100.3 TCP 0 > 1836 [RST, ACK] Seq=1 Ack=1 Win=0 Len=0

The above command tells tshark not to perform name resolution, listen on the system’s third interface, and only capture packets originating from the system at 192.168.100.5. There is a problem however, where’s the IP ID? Unfortunately, tshark, like tcpdump and windump, does not print the IP ID value by default. Even Wireshark buries it in the middle pane so it can be difficult to spot and follow over multiple packets.

So what to do? Normally we would simply add in the “verbose” switch. This is what I used in the last post to print the IP ID with tcpdump. With tcpdump and windump you use the switch “-v”, but with tshark the switch is capitalized “-V”. Here’s an example:

C:\>tshark -n -i 3 -V src host 192.168.100.5

Capturing on Intel(R) PRO/100 VE Network Connection (Microsoft’s Packet Scheduler)

Frame 1 (54 bytes on wire, 54 bytes captured)
Arrival Time: Aug 31, 2009 13:18:42.532218000
[Time delta from previous captured frame: 0.000000000 seconds]
[Time delta from previous displayed frame: 0.000000000 seconds]
[Time since reference or first frame: 0.000000000 seconds]
Frame Number: 1
Frame Length: 54 bytes
Capture Length: 54 bytes
[Frame is marked: False]
[Protocols in frame: eth:ip:tcp]
Ethernet II, Src: 00:07:e9:46:2d:55 (00:07:e9:46:2d:55), Dst: 00:17:08:54:9a:00 (00:17:08:54:9a:00)
Destination: 00:17:08:54:9a:00 (00:17:08:54:9a:00)
Address: 00:17:08:54:9a:00 (00:17:08:54:9a:00)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Source: 00:07:e9:46:2d:55 (00:07:e9:46:2d:55)
Address: 00:07:e9:46:2d:55 (00:07:e9:46:2d:55)
.... ...0 .... .... .... .... = IG bit: Individual address (unicast)
.... ..0. .... .... .... .... = LG bit: Globally unique address (factory default)
Type: IP (0x0800)
Internet Protocol, Src: 192.168.100.5 (192.168.100.5), Dst: 192.168.100.3 (192.168.100.3)
Version: 4
Header length: 20 bytes
Differentiated Services Field: 0x00 (DSCP 0x00: Default; ECN: 0x00)
0000 00.. = Differentiated Services Codepoint: Default (0x00)
.... ..0. = ECN-Capable Transport (ECT): 0
.... ...0 = ECN-CE: 0
Total Length: 40
Identification: 0x0461 (1121)
Flags: 0x00
0... = Reserved bit: Not set
.0.. = Don't fragment: Not set
..0. = More fragments: Not set
Fragment offset: 0
Time to live: 128
Protocol: TCP (0x06)
Header checksum: 0xed15 [correct]
[Good: True]
[Bad : False]
Source: 192.168.100.5 (192.168.100.5)
Destination: 192.168.100.3 (192.168.100.3)
Transmission Control Protocol, Src Port: 0 (0), Dst Port: 2535 (2535), Seq: 1, Ack: 1, Len: 0
Source port: 0 (0)
Destination port: 2535 (2535)
[Stream index: 0]
Sequence number: 1    (relative sequence number)
Acknowledgement number: 1    (relative ack number)
Header length: 20 bytes
Flags: 0x14 (RST, ACK)
0... .... = Congestion Window Reduced (CWR): Not set
.0.. .... = ECN-Echo: Not set
..0. .... = Urgent: Not set
...1 .... = Acknowledgement: Set
.... 0... = Push: Not set
.... .1.. = Reset: Set
[Expert Info (Chat/Sequence): Connection reset (RST)]
[Message: Connection reset (RST)]
[Severity level: Chat]
[Group: Sequence]
.... ..0. = Syn: Not set
.... ...0 = Fin: Not set
Window size: 0
Checksum: 0x9078 [validation disabled]
[Good Checksum: False]
[Bad Checksum: False]

OK, if we sift through the output we can find the IP ID information. If we want to check the value over multiple packets however, it’s going to be time-consuming.

Printing only specific fields

One of the features in the latest version of tshark is the ability to print only specific packet fields. This is performed using the “-T fields” switch. You then use the “-e” switch to specify the fields you wish to see, in the order you want them to be printed. Here’s an example:

C:\>tshark -n -i 3 -T fields -e ip.src -e ip.dst -e ip.proto -e ip.id src host 192.168.100.5

Capturing on Intel(R) PRO/100 VE Network Connection (Microsoft’s Packet Scheduler)

192.168.100.5   192.168.100.3   0x06    0x0491
192.168.100.5   192.168.100.3   0x06    0x0492
192.168.100.5   192.168.100.3   0x06    0x0495
192.168.100.5   192.168.100.3   0x06    0x0496
192.168.100.5   192.168.100.3   0x06    0x0497

The “-e” switch takes the same field names that Wireshark uses in display filters (not the pcap capture filter syntax used at the end of the command).  Packetlife has an excellent cheat sheet showing commonly used values. From left to right, I specified source IP, destination IP, transport protocol, and the IP ID value. Note that with this format, it is now trivial to follow the IP ID increment in the packet stream.
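
Once the output is reduced to columns like this, it is also easy to post-process. Here is a small Python sketch that computes the increment between consecutive IP ID values. It assumes the IP ID is the last column and is printed in hex, as in the output above; the function name is my own and other tshark versions may format the field differently.

```python
def ip_id_deltas(lines):
    """Return the IP ID increment between consecutive lines of field output."""
    ids = [int(line.split()[-1], 16) for line in lines if line.strip()]
    # The IP ID is a 16-bit field, so take differences modulo 65536.
    return [(b - a) % 65536 for a, b in zip(ids, ids[1:])]

captured = """\
192.168.100.5\t192.168.100.3\t0x06\t0x0491
192.168.100.5\t192.168.100.3\t0x06\t0x0492
192.168.100.5\t192.168.100.3\t0x06\t0x0495
192.168.100.5\t192.168.100.3\t0x06\t0x0496
192.168.100.5\t192.168.100.3\t0x06\t0x0497
""".splitlines()

deltas = ip_id_deltas(captured)
print(deltas)  # [1, 3, 1, 1]
```

The jump of 3 between the second and third lines shows that two packets left the system that this capture filter did not display.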

Exec Summary

When performing packet decodes, it is not uncommon to end up with far more information on the screen than you actually need. In fact this can make it difficult to find the values you are trying to focus in on. The addition of tshark’s new display options can make following information over multiple packets far easier than it is with similar tools.

Spoofing Your IP Address During A Port Scan – Part 1

August 28th, 2009

I love debunking myths, and one of my favorites is “a port scanner must reveal his true source IP address”. In this series I’ll show you how to perform a port scan while hiding your source IP address from the host being scanned. I’ll also tell you how you can detect the technique when it is used against you.

Nmap’s decoy mode

An alternative to the technique I will describe is nmap’s decoy mode. With decoy mode you identify a number of bogus source IP addresses. From the target host, it looks like all of the bogus IP addresses, as well as the true source IP address, are all performing a port scan at the same time. The concept is the administrator under attack will have no way of knowing which IP address is in fact the true IP performing the scan.

This technique really does not mask the true source, as the source IP address is one of the IPs performing the scan. If you know what to look for, you can easily figure out which source IP is actually scanning you. So while this technique will work, it is not completely effective at hiding the source IP address.

What is an idle scan?

When we perform an idle scan, we do not actually directly detect open ports. Rather, we detect the effect an open port would have on a third party system. The technique is similar to how many viruses are detected in the human body. Rather than detecting the actual virus, we look for antibodies that get produced when the virus is present in the system. An idle scan detects open ports in much the same fashion.

Before we can dig too deeply into an idle scan, we need to look at some of the intricacies of IP.

Predictable header values

While the RFCs are designed to be specific enough that dissimilar operating systems will still be able to communicate via IP, they still leave quite a bit open to interpretation. For example, the RFCs specify that the maximum Time To Live (TTL) value that can be used is 255. They do not however specify what initial TTL value must be used, so different operating systems use different starting TTLs. The RFCs describe how Ping should work, but do not specify what should be in the payload of Echo-Request packets. Again, different vendors use different values. These nuances can permit you to identify the source operating system based on variations in the packet contents. The technique is referred to as passive fingerprinting.
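
To make the TTL example concrete, here is a minimal Python sketch that guesses the starting TTL from the value seen on the wire. The list of common starting values (64, 128 and 255) is an assumption that covers many, but not all, stacks, and administrators can change the default, so treat the result as a hint rather than proof.

```python
COMMON_INITIAL_TTLS = (64, 128, 255)  # assumed defaults; real stacks vary and can be tuned

def guess_initial_ttl(observed):
    """Return the smallest common starting TTL that is >= the observed TTL."""
    for initial in COMMON_INITIAL_TTLS:
        if observed <= initial:
            return initial
    raise ValueError("TTL cannot exceed 255")

def hops_travelled(observed):
    """Estimate how many routers decremented the TTL along the way."""
    return guess_initial_ttl(observed) - observed

print(guess_initial_ttl(125), hops_travelled(125))  # 128 3
```

A packet arriving with a TTL of 125 most likely started at 128 and crossed three routers.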

The IP identifier (IP ID) field in the IP header (bytes 4 and 5) is a similar situation.  RFC 791 specifies that the number must be unique on a per host, per session basis. For example let’s say I connect to a remote SSH server. Each IP ID in that session must be unique. If I close the session and then connect back later, it is RFC compliant if one or more IP ID values get used again. They don’t have to be, but if it does happen it is not a problem.

So the RFCs say the IP ID needs to be unique, but they do not really tie down how to go about generating the value. This has led to different operating systems deploying different methodologies. For example Windows starts at an IP ID value of 1 and simply increments the value by +1 for every packet leaving the system. When the maximum value of 65,535 is reached, it starts back over at 1. BSD puts a random value into the IP ID field of each packet leaving the system. Linux is random for TCP packets (except initial responses which are always zero), +1 incremental for ICMP, and time based for UDP. Whew!

The one that is interesting for our purposes is Windows. The fact that each packet leaving the system gets a +1 IP ID makes the value extremely predictable. For example, consider the following output:

[root@fubar ~]# hping -r 192.168.100.2

HPING 192.168.100.2 (eth0 192.168.100.2): NO FLAGS are set, 40 headers + 0 data bytes

len=46 ip=192.168.100.2 ttl=128 id=108 sport=0 flags=RA seq=0 win=0 rtt=0.4 ms

len=46 ip=192.168.100.2 ttl=128 id=+1 sport=0 flags=RA seq=1 win=0 rtt=0.4 ms

len=46 ip=192.168.100.2 ttl=128 id=+1 sport=0 flags=RA seq=2 win=0 rtt=0.4 ms

len=46 ip=192.168.100.2 ttl=128 id=+2 sport=0 flags=RA seq=3 win=0 rtt=0.4 ms

len=46 ip=192.168.100.2 ttl=128 id=+1 sport=0 flags=RA seq=4 win=0 rtt=0.4 ms

len=46 ip=192.168.100.2 ttl=128 id=+1 sport=0 flags=RA seq=5 win=0 rtt=0.4 ms

hping is a packet crafting tool which allows you to create your own IP packets. In the above output we are using the “-r” switch to have hping monitor the IP ID increment of a remote system. We know it is a Windows system, because Windows always uses a starting TTL of 128. Now look at the “id=” values. In the first line of output hping prints the absolute IP ID value used by the system, in this case 108. Each subsequent line then prints the delta change from the previous packet. So in the second line the actual IP ID was 109, which is “+1” from the previous value of 108. The next packet had an IP ID of 110, which is “+1” from the previous IP ID value of 109.

Look closely at the fourth line of output. Note the delta change was “+2”. Since Windows uses sequential IP IDs, this tells us a packet we didn’t get to see just left the Windows system. We don’t know where it was going, but that’s OK. What’s important is that we can identify when the Windows system transmits and how many packets it sends out. For example had that line read “+5”, we would know that the Windows system transmitted four other packets since responding to our last probe.
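
The same reasoning can be applied mechanically. The sketch below parses hping-style output and reports, for each reply after the first, how many packets the remote system sent that we did not see. The regular expression and function name are my own; it assumes the relative “id=+N” format shown above.

```python
import re

ID_FIELD = re.compile(r"\bid=(\+?\d+)")

def unseen_packets(hping_lines):
    """For each reply after the first, count packets the host sent that we did not see."""
    extras = []
    for line in hping_lines:
        match = ID_FIELD.search(line)
        if not match:
            continue  # banner or blank line
        value = match.group(1)
        if value.startswith("+"):
            # A delta of +1 is the reply to our own probe; anything above is other traffic.
            extras.append(int(value) - 1)
    return extras

output = [
    "HPING 192.168.100.2 (eth0 192.168.100.2): NO FLAGS are set, 40 headers + 0 data bytes",
    "len=46 ip=192.168.100.2 ttl=128 id=108 sport=0 flags=RA seq=0 win=0 rtt=0.4 ms",
    "len=46 ip=192.168.100.2 ttl=128 id=+1 sport=0 flags=RA seq=1 win=0 rtt=0.4 ms",
    "len=46 ip=192.168.100.2 ttl=128 id=+1 sport=0 flags=RA seq=2 win=0 rtt=0.4 ms",
    "len=46 ip=192.168.100.2 ttl=128 id=+2 sport=0 flags=RA seq=3 win=0 rtt=0.4 ms",
    "len=46 ip=192.168.100.2 ttl=128 id=+1 sport=0 flags=RA seq=4 win=0 rtt=0.4 ms",
]
extras = unseen_packets(output)
print(extras)  # [0, 0, 1, 0]
```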

Detecting open ports

So how can we leverage the predictable IP ID value of Windows for evil? One possibility is to turn the Windows system into an open port sensor. Here’s how we do it:

  1. Monitor the current IP ID being used by a Windows system. We should check the value at regular intervals over a relatively short period of time. Say once per second.
  2. Find a target system we wish to port scan.
  3. While spoofing the source IP address of the Windows system, send a SYN packet to the TCP port we wish to probe on the target.

The target system will send a response packet back to the Windows system.  This response will either be:

  • A TCP reset, if the port is closed
  • A SYN/ACK, if the port is open

The RFCs state you should never respond to error packets, regardless of whether you consider them to be legitimate or not. So when the Windows box receives the TCP reset error packet from the target host, it quietly ignores and discards the packet.

Things get a bit more interesting when a SYN/ACK is received however. From the Windows system’s perspective, it is just hanging out minding its own business when some unknown system sends it a SYN/ACK packet (remember we spoofed the Windows system’s IP address in the probe packet). A SYN/ACK effectively means “Sure, you can connect to me on that TCP port, no problem”. Of course since the Windows system didn’t actually send the SYN packet, it has no idea what the remote target is talking about.

With this in mind the Windows system sends a TCP reset error packet back to the target host. When the reset packet is transmitted, the next available IP ID is used within the IP header. This missing IP ID would be detected if we are still monitoring the IP ID increment once per second. So to review:

  • Closed port on target = No packets leaving Windows system
  • Open port on target = Windows sends a TCP reset using up an IP ID

So by monitoring the IP ID increment, we can identify when an open port is discovered as only probes to open ports will cause the IP ID increment to change.

Caveats

You can’t use just any Windows system for this attack. The box must meet certain criteria:

  • Relatively quiet system generating little traffic (like a home system)
  • No stateful filtering of TCP traffic

Of course go to any cable or DSL network at 2:00 AM local time and you can find hundreds of thousands of systems that meet these criteria. Remember that Windows systems love to arbitrarily broadcast, so you may wish to perform multiple checks of each open port just to ensure the IP ID increment change was in fact due to an open port being probed.
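
The “multiple checks” idea amounts to a simple decision rule, which is also handy if you are writing detection logic and want to reason about what the scanner sees. This Python sketch only interprets numbers you have already observed; the function name and the default of three repeat checks are arbitrary choices for illustration.

```python
def port_looks_open(extra_packets_per_probe, required=3):
    """Treat a port as open only if every repeat probe coincided with at least
    one extra packet leaving the idle host. `required` is the minimum number of
    repeat probes we insist on before deciding."""
    if len(extra_packets_per_probe) < required:
        raise ValueError("not enough repeat probes to decide")
    return all(extra >= 1 for extra in extra_packets_per_probe)

print(port_looks_open([1, 1, 1]))  # True: the increment jumped every time
print(port_looks_open([1, 0, 0]))  # False: a one-off jump, probably background traffic
```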

Exec Summary

An idle scan lets you probe open ports on a remote target, while fooling the target into believing that some third party system is performing the scan. Open ports are detected by monitoring for irregularities in the IP ID increment of the Windows box.

In the next installment we’ll actually see what these packets look like on the wire as well as discuss how to detect an idle attack when it is used against you.

Network Mapping Through A Firewall – Part 2

August 25th, 2009

In my last post I discussed how to use ICMP time exceeded in transit errors to map a network perimeter. I also discussed how to prevent attackers from using this technique against your network. In this post I’ll discuss another network mapping technique using the record route IP header options.

IPv4 header options

The IP header is normally 20 bytes in size but can grow larger if one or more options are enabled. IP options get added to the end of the IP header, as shown in Figure #1. There are a number of registered IP options. The ones most frequently implemented however are the ones defined in RFC 791. Most operating systems and hardware devices have implemented the IP option record route (option 7), which is a part of the RFC 791 specification.

[Figure #1: IP header with options]

Record Route

The record route option can produce similar data to traceroute, but has a completely different methodology for identifying intermediary hops. As I discussed in my last post, traceroute uses the receipt of ICMP time exceeded in transit errors to map all of the network hops between two points. This requires multiple packets to be transmitted, as the tool needs to increment the TTL value.

Record route does not vary the TTL, and only requires a single packet to record hops along a link. Since the option exists within the IP header, it can be leveraged with any IP transport or application.

Here is example output of a record route session using Ping under Linux:

[root@fubar ~]# ping -c 1 -R 192.168.204.10

PING 192.168.204.10 (192.168.204.10) 56(124) bytes of data.

From 192.168.201.1: icmp_seq=1 Redirect Host(New nexthop: 192.168.202.2)

64 bytes from 192.168.204.10: icmp_seq=1 ttl=125 time=6.56 ms

NOP

RR:       192.168.201.10

192.168.201.1

192.168.202.1

192.168.203.1

192.168.204.10

192.168.204.1

192.168.203.2

192.168.202.2

192.168.201.10

--- 192.168.204.10 ping statistics ---

1 packets transmitted, 1 received, 0% packet loss, time 6ms

rtt min/avg/max/mdev = 6.564/6.564/6.564/0.000 ms

Note that by setting the record route option in Ping (the “-R” switch) we’ve recorded all the router hops out to the target system at 192.168.204.10, and back again. So we’ve effectively generated a map of the network between the two points.

Record route decode

Here is an example decode of a record route packet:

07:04:32.934999 IP (tos 0x0, ttl 64, id 0, offset 0, flags [DF], proto ICMP (1), length 124, options (NOP,RR 192.168.201.10, 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0 0.0.0.0)) 192.168.201.10 > 192.168.204.10: ICMP echo request, id 43604, seq 1, length 64
0x0000:  4f00 007c 0000 4000 4001 6858 c0a8 c90a  O..|..@.@.hX....
0x0010:  c0a8 cc0a 0107 2708 c0a8 c90a 0000 0000  ......'.........
0x0020:  0000 0000 0000 0000 0000 0000 0000 0000  ................
0x0030:  0000 0000 0000 0000 0000 0000 0800 5df6  ..............].
0x0040:  aa54 0001 4022 914a 2544 0e00 0809 0a0b  .T..@".J%D......
0x0050:  0c0d 0e0f 1011 1213 1415 1617 1819 1a1b  ................
0x0060:  1c1d 1e1f 2021 2223 2425 2627 2829 2a2b  .....!"#$%&'()*+
0x0070:  2c2d 2e2f 3031 3233 3435 3637            ,-./01234567

A couple of points in the above decode are worth noting. Normally the beginning of the IP header starts with a Hex value of 4500. This means:

  • 4 = IP version
  • 5 = 5 32-bit words, or (32/8) x 5 = 20 bytes, the size of the IP header
  • 00 = Type Of Service (TOS) field, no values set

The decode above starts with the Hex value “4f00”, which means the IP header is larger than a regular IP header. This is our first clue that at least one IP option is set. How big is the IP header? If we convert “f” in Hex to decimal we get 15. 15 32-bit words converts to 60 bytes, which is the largest possible size for an IP header.
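
If you would rather not do the nibble math by hand, the first byte of the header can be split in a couple of lines of Python. This is just a sketch of the arithmetic described above; the function name is my own.

```python
def header_length(first_byte):
    """Split the first IP header byte into (version, header length in bytes)."""
    version = first_byte >> 4          # high nibble
    ihl_words = first_byte & 0x0F      # low nibble, counted in 32-bit words
    return version, ihl_words * 4

print(header_length(0x45))  # (4, 20) typical header, no options
print(header_length(0x4F))  # (4, 60) the record route packet above
```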

Also, note the series of zeros at the end of the header. When a record route packet is transmitted, the sending system needs to reserve space for all of the IP addresses that must be included. Windows will ask you to identify this value up front. Linux and UNIX simply go for the maximum. It does not cause a problem if reserved space goes unused. The rest of the packet carries a normal Echo-Request payload.

Record route limitations

You may have noticed that the above decode only reserved space for 8 IP addresses beyond the sender’s own. Since most systems on the Internet are about 15 hops away from each other, what happens when 8 is not enough? Remember we said 60 bytes is the maximum size for an IP header. If we remove the rest of the IP header fields, that leaves us enough room to store 9 IP addresses. The transmitting system always stores its IP address in the option field, since technically it is the first IP address to forward the packet. This leaves us enough room for 8 more IP addresses maximum. If the packet travels over more than 8 hops, the remaining routers will simply ignore the record route option.
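
The arithmetic behind the 9 and 8 figures can be written out as follows. The constant names are my own; the 3 bytes of overhead are the record route option's type, length and pointer fields from RFC 791, and the single padding byte is the NOP visible in the decode.

```python
MAX_IP_HEADER = 15 * 4     # IHL is a 4-bit count of 32-bit words
BASE_IP_HEADER = 20        # fixed fields, no options
RR_OVERHEAD = 3            # option type, length and pointer bytes
PADDING = 1                # the NOP seen in the decode, aligning the option to 32 bits

option_space = MAX_IP_HEADER - BASE_IP_HEADER                 # bytes left for options
address_slots = (option_space - RR_OVERHEAD - PADDING) // 4   # 4 bytes per IPv4 address
slots_for_routers = address_slots - 1   # the sender's own address takes the first slot

print(option_space, address_slots, slots_for_routers)  # 40 9 8
```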

Here’s an example of what I mean. This output was generated with the Ping utility under Windows. The “-r” switch identifies that the record route option should be set. The numeric value identifies how many hops to record.

C:\test>ping -r 8 -n 1 www.wikipedia.org

Pinging rr.pmtpa.wikimedia.org [208.80.152.2] with 32 bytes of data:

Reply from 208.80.152.2: bytes=32 time=702ms TTL=50

Route: 98.232.117.112 ->

68.88.131.63 ->

68.87.145.246 ->

68.87.145.245 ->

68.85.162.70 ->

68.86.90.65 ->

4.68.185.30 ->

4.69.132.90

Ping statistics for 208.80.152.2:

Packets: Sent = 1, Received = 1, Lost = 0 (0% loss),

Approximate round trip times in milli-seconds:

Minimum = 702ms, Maximum = 702ms, Average = 702ms

The Wikipedia Web site is actually 19 hops away from my current location. Record route is only capable of recording the first 8 hops along the way.

Do I need to be concerned with record route?

Since record route is only capable of recording 8 hops, and most of us are 15 hops away from each other, is it truly a valid security concern? The 15 hop rule is only true a majority of the time. If I attempt to record route to a network that uses the same ISP that I do, I’ll probably generate a full network map. Further, if an internal system becomes compromised, record route can easily be leveraged to map the network from the compromised host’s location.

So record route is not a common attack vector, but it’s certainly going to be one of the tools a smart attacker will leverage when possible.

Protecting against record route

Record route is one of those communication parameters that gets ignored by most commercial firewall vendors. By that I mean they include support for record route in their RFC compliant IP stack, but give you little ability to control it via policy enforcement. Open source firewalls tend to do a better job controlling record route, but I’ll get into that in part 3 of this series.

If your firewall, HIPS or HIDS gives you access to the signature language, you can usually write a signature to flag all packets with an IP header size larger than 20 bytes. This does not guarantee the packet is using record route, as it could also mean that some other IP option is being used. To be frank however, all of the IP options can be leveraged for evil. Every one of them should be blocked, or at the very least detected, at the perimeter. I’ll cover more about IP options in a later post.
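
Signature languages differ from product to product, so here is the check expressed in Python instead. It is a sketch of the logic only: flag any header longer than 20 bytes, then walk the options to see which ones are present. The function name and the sample packets are made up for the example (the second sample reuses the header bytes from the record route decode in this post); the option type numbers come from RFC 791.

```python
# Option type numbers defined in RFC 791.
OPTION_NAMES = {7: "record route", 68: "timestamp",
                131: "loose source route", 137: "strict source route"}

def ip_options(header):
    """Return the option types found in a raw IPv4 header ([] when there are none)."""
    header_len = (header[0] & 0x0F) * 4
    found = []
    i = 20                                   # options start after the fixed 20 bytes
    while i < min(header_len, len(header)):
        kind = header[i]
        if kind == 0:                        # end of option list
            break
        if kind == 1:                        # NOP padding, one byte
            i += 1
            continue
        if i + 1 >= len(header) or header[i + 1] < 2:
            break                            # malformed length, stop rather than loop
        found.append(kind)
        i += header[i + 1]
    return found

plain = bytes.fromhex("4500002800004000400600000a0000010a000002")
record_route = bytes.fromhex(
    "4f00007c0000400040016858c0a8c90ac0a8cc0a"   # fixed 20 header bytes
    "01072708c0a8c90a" + "00" * 32                # NOP, record route, empty slots
)
for name, pkt in (("plain", plain), ("record_route", record_route)):
    print(name, [OPTION_NAMES.get(k, k) for k in ip_options(pkt)])
```

The plain header yields an empty list, while the second header is reported as carrying record route.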

Exec Summary

Record route can produce a network map similar to the traceroute tool, but is limited to only recording 8 hops. While this limits its usefulness to an attacker, it’s entirely possible to run a record route session close enough to the target network to enumerate valuable data. Most firewalls do not give you the ability to control record route traffic, but you may be able to control/detect it with a signature based device.

Network Mapping Through A Firewall – Part 1

August 24th, 2009

When we create a set of firewall rules, one of our objectives is usually to stop attackers on the Internet from being able to map the internal network sitting behind the firewall. In this write up I’ll discuss two different techniques which will let an attacker punch right through most firewall setups, and what additional steps must be taken to prevent them.

The two techniques we will cover are:

  • Eliciting time exceeded in transit errors
  • IP header record route options

Understanding Time exceeded in transit errors

When a router receives a packet traveling from one network to another, it is required to decrement the TTL value by one. So if the packet currently has a TTL of 120, the router would change the value to 119 as it passes the packet along the network. The TTL field is byte 8 within the IP header and is shown in Figure #1.

[Figure #1: IP header]

If a router receives a packet with a TTL value of 1, it is not allowed to decrement the value to 0. Rather, the router generates an ICMP type 11, code 0 packet, referred to as an ICMP time exceeded in transit (TimeX) error. The TimeX error is then sent to the source IP address listed in the packet that had a TTL value of 1. Here’s an example TimeX packet. Note that 28 bytes of the original packet that caused the TimeX to be generated are embedded in the payload. The TTL value of this embedded header is 1.

10:14:19.947925 IP (tos 0xc0, ttl 63, id 26344, offset 0, flags [none], proto ICMP (1), length 88) 192.168.202.2 > 192.168.201.10: ICMP time exceeded in-transit, length 68
IP (tos 0x0, ttl 1, id 34730, offset 0, flags [none], proto ICMP (1), length 60) 192.168.201.10 > 192.168.204.10: ICMP echo request, id 18212, seq 1, length 40
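To make the embedded header concrete, here is a rough Python sketch that pulls the original source, destination and TTL back out of a TimeX message. It assumes the input is the raw ICMP message (starting at the ICMP type byte) and that the embedded IP header carries no options. The function name parse_timex is my own:

```python
import socket
import struct


def parse_timex(icmp):
    """Return (orig_src, orig_dst, orig_ttl) for an ICMP type 11, code 0
    message, or None if the bytes are not a time exceeded in transit error."""
    if len(icmp) < 8 + 20:
        return None
    icmp_type, icmp_code = icmp[0], icmp[1]
    if icmp_type != 11 or icmp_code != 0:
        return None
    inner = icmp[8:]                     # skip the 8 byte ICMP header
    ttl = inner[8]                       # TTL is byte 8 of the IP header
    src = socket.inet_ntoa(inner[12:16])
    dst = socket.inet_ntoa(inner[16:20])
    return src, dst, ttl


# Rebuild the embedded header from the example above as test input.
inner_header = struct.pack(
    "!BBHHHBBH4s4s",
    0x45, 0, 60, 34730, 0, 1, 1, 0,
    socket.inet_aton("192.168.201.10"),
    socket.inet_aton("192.168.204.10"),
)
sample = bytes([11, 0, 0, 0, 0, 0, 0, 0]) + inner_header + bytes(8)
print(parse_timex(sample))
```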

One interesting point here is that RFC 792 defines that packets should be dropped when the TTL reaches 0, not 1. I’m unaware of any router or system that actually follows the RFCs. Every device I’ve seen drops the packet when the TTL is 1. You will however find many incorrect documents that describe this process quoting the RFCs rather than reality.

Network mapping with TimeX

Most network administrators are familiar with the traceroute and LFT tools under Linux and UNIX, and tracert and pathping under Windows. Each tool will identify all of the router hops from a source system to a specified target. This is accomplished by transmitting multiple packets and incrementing the TTL value.

Each of the above mentioned tools uses TimeX errors to map all of the routers between two hosts. An example is shown in Figure #2. The tool would start by transmitting packets with an initial TTL value of 1. This causes the first router to return a TimeX error. The tool then looks at the source IP address of the TimeX error, and records this as the first hop along the link.

tracing

Packets with a TTL of 2 are then transmitted. When they pass through the first router, the TTL is decremented to 1. This causes the second router to generate a TimeX error. Again, we simply record the source IP address of the TimeX error as the second hop along the link. When an initial TTL value of 3 is transmitted, the third router generates the TimeX error. This continues until we eventually reach the target system. We’ve now efficiently mapped the IP addresses of all of the routers between the source and target system.

Here’s an example of what the output might look like:

[root@fubar ~]# traceroute -I -q 1 -N 1 10.1.4.10
traceroute to 10.1.4.10 (10.1.4.10), 30 hops max, 60 byte packets
1 10.1.1.1 (10.1.1.1) 0.270 ms
2 10.1.2.1 (10.1.2.1) 0.395 ms
3 10.1.3.1 (10.1.3.1) 0.589 ms
4 10.1.4.10 (10.1.4.10) 0.707 ms
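The increasing TTL logic is easy to see in a small simulation. The Python below does not touch the network. It models the path as a plain list of hop addresses (the last entry being the target) and follows the "drop at TTL 1" behavior described earlier. The function names send_probe and trace are purely illustrative:

```python
def send_probe(path, ttl):
    """Simulate one probe along `path`. Every entry except the last is a
    router; the last is the target. Returns (responder_ip, kind)."""
    for router_ip in path[:-1]:
        if ttl <= 1:
            # The router will not forward a packet with a TTL of 1,
            # so it answers with a TimeX error from its own address.
            return router_ip, "timex"
        ttl -= 1
    return path[-1], "reply"


def trace(path, max_hops=30):
    """Send probes with TTL 1, 2, 3... and record who answers each one."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, kind = send_probe(path, ttl)
        hops.append(responder)
        if kind == "reply":
            break
    return hops


path = ["10.1.1.1", "10.1.2.1", "10.1.3.1", "10.1.4.10"]
for number, ip in enumerate(trace(path), start=1):
    print(number, ip)
```

Run against the four hop path used in the example, the simulation lists the same addresses in the same order as the traceroute output.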

Mapping through a firewall with time exceeded packets

The tools tracert and traceroute are easily defeated by a firewall. This is because tracert transmits Echo-Request packets which most environments block at the border. traceroute will also transmit Echo-Requests if the “-I” switch is used, but by default it targets UDP ports above 33,000. Again, most firewalls block this by default so the tool is easily defeated.

But what if an attacker targets an open port on the firewall? In other words, what if they transmit TCP/80 packets to your Web server, but vary the TTL values in a similar fashion to traceroute? This is exactly how the tool tcptraceroute operates. There is even a version available for Windows. Usually, tools like this can map right through a firewall.

For example, we have a Web server at 10.1.4.10 with a firewall sitting in front of it. The firewall has the standard “only let in TCP/80 to the Web server” policy set. Here is what traceroute reports:

[root@fubar ~]# traceroute -q 1 -N 1 -m 5 10.1.4.10
traceroute to 10.1.4.10 (10.1.4.10), 5 hops max, 60 byte packets
1 10.1.1.1 (10.1.2.2) 0.279 ms
2 10.1.2.1 (10.1.4.1) 0.521 ms
3 *
4 *
5 *

And here is the same network mapped with tcptraceroute:

[root@fubar ~]# tcptraceroute -n -f 1 -m 5 -q 1 -S 10.1.4.10 80
Selected device eth0, address 10.1.1.10, port 39142 for outgoing packets
Tracing the path to 10.1.4.10 on TCP port 80 (http), 5 hops max
1 10.1.1.1 0.353 ms
2 10.1.2.1 0.450 ms
3 10.1.3.1 0.586 ms
4 10.1.4.10 [open] 0.701 ms

Because traceroute is sending UDP packets, our firewall policy drops them at the border. tcptraceroute however is sending TCP/80 packets to the Web server’s IP address. Since this is permitted by the policy, the packets make it through. We now know 10.1.3.1 is acting as a firewall. We also know that it is sitting directly in front of the Web server.

Here’s a copy of one of the packets generated by tcptraceroute. To the untrained eye, it looks like a perfectly normal TCP/80 SYN packet, except the TTL value is very low (there are other clues that this packet is not normal, but I’ll save that for another post):

18:33:21.531117 IP (tos 0x0, ttl 3, id 41587, offset 0, flags [none], proto TCP (6), length 40) 10.1.1.10.37496 > 10.1.4.10.80: S, cksum 0x7eaa (correct), 1793661553:1793661553(0) win 0

Protection against TimeX mapping

Most stateful inspection based firewalls are horrible at stopping TimeX mapping. In part 3 of this post, I’ll get into the proper way to control TimeX if you are running an open source firewall. For now however, I want to limit the advice I give to solutions that will work for every product.

There are two parts to every conversation, the stimulus and the response. When it comes to network mapping we can effectively nullify a scan if we can control either portion of the conversation. In this case here we have:

  • Stimulus = IP packet with an abnormally low TTL value
  • Response = TimeX from routers, port response from target

Since most commercial firewalls do not permit you to filter traffic based on TTL, we can’t control the stimulus in this situation. Nor can we control the port response, because it will be identical to a normal conversation. This leaves us with the outbound TimeX packets.

As close as possible to the edge of your perimeter, install a filter preventing ICMP type 11, code 0 (Time Exceeded in transit) packets from being sent to the Internet. For example if you have a border router outside of your firewall, install the filter on the router. Note that if you are running Cisco IOS, the router will partially ignore the filter and still transmit TimeX packets generated by its own interface. Running the “no ip unreachables” command can prevent this, but this command disables all ICMP error reporting and can cause communication problems. Make sure you understand the full impact of this command before using it.

By filtering outbound TimeX packets, we will prevent the attacker from seeing the IP address of all routers and firewalls between the filter installation point and the target host. The attacker will still be able to enumerate how many hops are on the link; they just will not be able to determine the IP address of each.

Exec Summary

Tools that perform traceroute type activity through open ports on a firewall are effective at mapping the links along a target network. Further, these tools are usually effective at enumerating network address translation (NAT) settings. Since most firewalls cannot filter traffic based on TTL, we are usually left with trying to control the transmission of TimeX packets headed out towards the Internet.

Setting Up A Security Information Management System-Part 6

August 20th, 2009

So far in this series we have covered:

  • Defining a scope and focus for your SIM
  • Importance of building instead of buying your first system
  • Architecture and capacity planning
  • Recommended phases of deployment
  • Selecting a centralized logging server platform
  • How to accept remote log entries
  • Facility, severity and priority
  • How to sort log messages
  • Configuring appliances and operating systems to submit log entries

Cool. So we have log entries for a number of systems being collected on a centralized server. Now comes the most important task, leveraging that information. Log entries will be grouped into two categories; critical messages we want to know about right away, and log entries that will get caught as part of a regular review process.

Blacklisting Vs. Whitelisting

When reviewing log messages, we have two possible postures we can use. The first is referred to as blacklisting. With the blacklisting method we define what makes an event interesting enough to warrant reporting. This is similar to how anti-virus software detects Malware or the process we use to filter out spam.

Like most things in life, blacklisting has some good and bad aspects. On the plus side, it is usually pretty easy to write a signature if we know what we want to look for. Signatures can be tightly defined to help minimize the number of false positives we encounter. The problem with blacklisting is that we have to know what we are looking for. If a new attack generates a unique signature we have never encountered in the past, a blacklisting system will probably miss the event because no signature has been defined.

With whitelisting we define the events we understand, and then focus our attention on the new and unique log messages that are encountered. On the plus side we are far more likely to catch cutting edge attacks. Whitelisting tends to be relatively noisy however since we are bound to encounter unique log messages that are not indicative of a security event.

So which should we use? Good defense in-depth practices tell us to use both. ;)

Real time alerting

We can leverage blacklisting to perform real time alerting of events we want to be made aware of as soon as they occur. Blacklisting should only be used for low noise types of events. In other words, we want to stick with writing signatures for events that have a high probability of being a true security issue. Good examples are:

  • Different logon name failures all from the same IP address in a short amount of time
  • Multiple HTTP 403 errors being generated by a single IP in a short amount of time
  • Internal systems receiving many ICMP errors or TCP resets in a short amount of time

In order to perform real time alerting, we need software that will monitor the logs in real time. The log entries should be checked against defined signatures, which also indicate what to do when the event occurs.
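As a rough idea of what such a signature does under the hood, here is a Python sketch of the first example from the list above: several different logon names failing from the same IP address within a short window. The class name, the threshold of three names and the 60 second window are all assumptions I picked for illustration:

```python
from collections import defaultdict, deque


class FailedLogonDetector:
    """Alert when one IP fails logons for several different user names
    inside a sliding time window."""

    def __init__(self, max_users=3, window_seconds=60):
        self.max_users = max_users
        self.window = window_seconds
        self.events = defaultdict(deque)   # ip -> deque of (timestamp, user)

    def record(self, timestamp, ip, username):
        """Record one failed logon. Returns True if an alert should fire."""
        queue = self.events[ip]
        queue.append((timestamp, username))
        # Throw away failures that have aged out of the window.
        while queue and timestamp - queue[0][0] > self.window:
            queue.popleft()
        distinct_users = {user for _, user in queue}
        return len(distinct_users) >= self.max_users


detector = FailedLogonDetector()
for ts, user in [(0, "msmith"), (10, "jsmith"), (20, "psmith")]:
    if detector.record(ts, "1.3.247.11", user):
        print("ALERT: multiple logon names failing from 1.3.247.11")
```

Repeated failures for a single name do not trip this rule, which helps keep it in the low noise category.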

Swatch

One of the easiest tools you can use for monitoring log entries is Swatch. Swatch is based on Perl. This means that while it is designed for UNIX and Linux systems, you can get it running on Windows if you have Perl installed. Simplicity is both Swatch’s biggest strength and weakness. While Swatch is relatively easy to deploy, it is also somewhat limited in its functionality. Still, if you are new to logging, Swatch makes an excellent first tool for real time alerting.

To deploy Swatch, you will need to create a unique configuration file for every log file you wish to monitor. In the configuration file we will tell Swatch what to look for in that particular log file, and what to do when the event is detected.

For example, let’s say we are going to have Swatch monitor the Web server’s error log. We may wish to create an entry similar to the following in Swatch’s configuration file for the error log:

# Look out for buffer overflows

watchfor  /client denied by server configuration|File name too long/

mail=noc@fubar.org:webmaster@fubar.org,subject=Web server overflow attempt

The line beginning with a “#” is simply commentary on the signature. The watchfor line identifies which character string(s) we want to define as being interesting. In this particular rule we have defined two different strings, “client denied by server configuration” and “File name too long” as interesting. The pipe character between the strings acts as a logical “or”. If either string is encountered, the mail parameter defines two different e-mail addresses we should contact. The subject line of the e-mail will be “Web server overflow attempt”, while the body of the e-mail will be the actual log entry.

If there are other patterns we wish to detect, we could add additional watchfor and mail statements. If we want to do more than send an e-mail, the exec parameter can be used to execute any application located on the local system. The threshold parameter can also be used to rate limit the reporting of events.
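If it helps to see the matching logic outside of Swatch, the same rule can be approximated in Python. This is not how Swatch itself is implemented. It is only a sketch of the watchfor idea, and the RULES structure and check_line function are names I made up. No mail is actually sent; the function just returns what would be mailed:

```python
import re

# One entry per watchfor statement; the pipe acts as a logical "or".
RULES = [
    {
        "pattern": re.compile(
            r"client denied by server configuration|File name too long"
        ),
        "subject": "Web server overflow attempt",
        "recipients": ["noc@fubar.org", "webmaster@fubar.org"],
    },
]


def check_line(line, rules=RULES):
    """Return a list of (recipients, subject, body) tuples for one log line.
    The body is the log entry itself, as with the Swatch mail action."""
    alerts = []
    for rule in rules:
        if rule["pattern"].search(line):
            alerts.append((rule["recipients"], rule["subject"], line.rstrip("\n")))
    return alerts


print(check_line("[error] [client 1.2.3.4] File name too long: /AAAA\n"))
```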

Simple Event Correlator (SEC)

SEC is an amazing alerting tool you can download from the main Web site. It supports BSD and Linux, and ships with a number of popular Linux flavors. SEC fully supports regular expressions and allows you to create extremely granular signatures.

The rule format is as follows:

type= Method of detection

ptype= Pattern type (regular expression, string match)

pattern= What to search for

desc= Description (can be a variable)

action= What to do when detected

There is an excellent archive of pre-written rules you can use that is well worth looking at. You can match on multiple patterns, define multiple thresholds, all while processing hundreds of log messages per second. About the only drawback of SEC is that you need a good understanding of regular expressions to use the tool effectively. Still, the tool can be far more powerful and flexible than Swatch.

Where can I get more alerting ideas?

I was involved with the creation of the original SANS Top 5 Log Reports. For the April, 2009 Log Summit I updated my presentation to break up report examples into low noise and high noise categories. Anything on the low noise list would make a good candidate for alerting. Anything in the high noise section is better monitored through daily reports.

Daily Reports

So we leveraged blacklisting to generate our real time alerts. We’ll now leverage whitelisting to help highlight unknown but interesting traffic patterns within our daily reports.

When it comes to daily reports, we tend to gravitate towards the big numbers. What are the top 5 IPs transferring data? Which e-mail address sent the most messages? While the big numbers are certainly important, it has been my experience that the security events you need to worry about the most generate the fewest log entries. The smart attackers try very hard to remain hidden within the noise. So the only way to find them is to improve the signal to noise ratio by filtering out the noise.

I author the Perimeter Security track for SANS. One of the labs I have my students perform is to parse a 200,000 line log file. The goal is to spot the interesting patterns as well as formulate the review into an automated process. Most folks find the port scanner as it is pretty noisy. Some even spot the IP address performing application layer attacks against the Web server. What most people miss however, are the six lines that are a pretty clear indication that an internal system is already compromised and calling home for marching orders. How do you find those 6 lines? By whitelisting everything you understand and focusing on whatever is left.

So it is OK for our daily reports to give us pretty charts with big numbers. One of the reports however has to be able to move all the crud to the side so we can better spot the interesting patterns.

Logwatch

One of the best tools for doing a daily log review is Logwatch. Logwatch will summarize all of the log patterns it understands, while highlighting anything without a predefined signature. The best way to understand this feature is to look at an example.

SSHD Killed: 2 Time(s)

SSHD Started: 1 Time(s)

Connections:

Failed logins from these:

msmith/password from 1.3.247.11: 6 time(s)

jsmith/password from 1.3.247.11: 5 time(s)

psmith/password from 1.3.247.11: 4 time(s)

Users logging in through sshd:

jjones logged in from sundown (1.3.247.9) using publickey: 146 Times(s)

jsmith logged in from dialup5533.wnskvtao.sover.net (216.114.181.200) using password: 1 Times(s)

jsmith logged in from dialup984.wnskvtao.sover.net (216.114.163.223) using password: 1 Times(s)

bjones logged in from charlie (1.3.247.11) using publickey: 444 Times(s)

jsmith logged in from 192.168.1.173 using password: 2 Times(s)

djones logged in from charlie (1.3.247.11) using password: 47 Times(s)

**Unmatched Entries**

Received disconnect from 148.64.147.168: 3: Key exchange failed.

Received disconnect from 216.114.160.132: 11: All open channels closed

scanned from 146.87.114.150 with SSH-1.0-SSH_Version_Mapper.  Don’t panic.

scanned from 211.184.226.99 with SSH-1.0-SSH_Version_Mapper.  Don’t panic.

In the above example Logwatch is being used to summarize SSH activity. It understands the service being stopped and started, failed logon attempts as well as successful logons. All this information is displayed in summary format so it is easier to digest. For example we do not know exactly when msmith incorrectly entered their password, but we see it happened six times, all from IP address 1.3.247.11. So instead of having six lines to digest, we only need to look at one. If we want to see each specific log entry, we can always refer back to the original logs.

Now look at the “Unmatched Entries” section. Each of these is an event that Logwatch does not have a signature for. Rather than ignore them, which would happen with a blacklist based system, they are summarized here for us to review. We then have the option to generate a signature for a specific entry so it will get categorized in a similar fashion to the process and logon sections.

Clearly this gives us the best of both worlds. The above report represents a bit over 650 lines worth of log entries, summarized down into an easy to read report. Most importantly, none of the log entries had to be ignored in order to produce this summary report.
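The whitelisting idea behind this style of report fits in a short Python sketch. This is not Logwatch’s own code. The two patterns assume OpenSSH style “Failed password for” and “Accepted ... for” messages, and the names KNOWN and summarize are mine:

```python
import re
from collections import Counter

# Patterns we understand. Anything else falls through to "unmatched".
KNOWN = [
    ("failed login",
     re.compile(r"Failed password for (?P<user>\S+) from (?P<ip>\S+)")),
    ("login",
     re.compile(r"Accepted \S+ for (?P<user>\S+) from (?P<ip>\S+)")),
]


def summarize(lines):
    """Count the entries we have signatures for and keep the rest for review."""
    summary = Counter()
    unmatched = []
    for line in lines:
        for label, pattern in KNOWN:
            match = pattern.search(line)
            if match:
                summary[(label, match.group("user"), match.group("ip"))] += 1
                break
        else:
            # No signature matched, so report the line instead of dropping it.
            unmatched.append(line)
    return summary, unmatched


logs = ["Failed password for msmith from 1.3.247.11 port 22 ssh2"] * 6 + [
    "Accepted publickey for jjones from 1.3.247.9 port 22 ssh2",
    "Received disconnect from 148.64.147.168: 3: Key exchange failed.",
]
summary, unmatched = summarize(logs)
for (label, user, ip), count in summary.items():
    print(f"{label}: {user} from {ip}: {count} time(s)")
print("**Unmatched Entries**")
for line in unmatched:
    print(line)
```

The key design point is the else branch: lines without a signature are never discarded, they are simply moved to their own section.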

Beyond daily reporting

You may also find it useful to perform long-term trend analysis and data mining on your log data. This may help to reveal patterns that normally go undetected when logs for a small snapshot in time (like 24 hours) are reviewed. Arguably one of the best tools available for dealing with lots of data is Splunk.

Splunk

Splunk is available as a free version that is limited to processing 500 MB per day, or you can invest in the commercial version that supports unlimited data processing. Splunk is extremely flexible at accepting data. It can act as a centralized logging server, or you can transfer files via a number of methods including FTP and HTTP. Once the data is received, Splunk indexes every field in each log file. This gives you unparalleled sorting and searching capability.

The full features of Splunk are too numerous to get into in this post. Check their site for a full list of supported features. What Splunk is extremely good at is manipulating and reporting on a huge number of log entries. It can index, search and report on billions of log entries per second. This makes it extremely useful for generating long-term trend reports or running saved searches for data mining purposes.

Exec Summary

Well, we’ve reached the end of the trail. Hopefully you feel like you have a better handle on how to deploy a centralized logging solution, as well as how to leverage it to better secure your environment. If you have any questions, please feel free to drop a comment. :)

Setting Up A Security Information Management System-Part5

August 14th, 2009

In my last post I discussed how a logging server uses a message’s priority value to sort incoming log messages. In this installment I’ll talk about testing connectivity, as well as how to get various gear on the wire to submit their log entries to a centralized server.

Requirements

In order for a system to submit log entries, it has to have support for Syslog. Log entries need to be transmitted in clear text to port UDP/514 on the logging server. If you are using rsyslog on the server, TCP/514 is acceptable as well.

Submitted log entries should include the following info:

  • A priority value (in <PRI> format) at the beginning of the payload
  • The name of the application submitting the log entry
  • The process ID used by the application
  • The body of the log messages

But to be honest, Syslog is not very fussy. It will attempt to record anything sent to its listening port as a log entry. If a priority value is not present, it will record the entry to whatever file is used for facility 1. Typically this is /var/log/messages.

Testing Connectivity

When deploying a new setup, I like to verify connectivity between the first few clients and the logging server. If logging is not working, you will want to be able to isolate the problem area. Typical problems include:

  • Client incorrectly configured
  • Firewall in the way (on the client, the wire, or the logging server)
  • Server incorrectly configured

You will want to monitor the messages file on the logging server to ensure the test log entry is received. On the logging server, run the following command:

tail -f /var/log/messages

Tail will open the log file read only and print the last ten lines. As new log entries are received, they will get printed to the screen as well.

To generate a test log entry, I like to use Netcat. Netcat can be used from any Windows, Linux or UNIX system. From the test system, run the command:

echo 'this is a UDP test log entry' | nc -u -w 1 <IP address of logging server> 514

You should see the echoed portion of the command show up in the /var/log/messages file on the logging server. If not, launch a packet sniffer and see if you can determine where the failure is occurring. If you wish to test TCP connectivity as well, simply run the command again leaving out the Netcat “-u” switch:

echo 'this is a TCP test log entry' | nc -w 1 <IP address of logging server> 514

If both entries are received, we are ready to start pointing devices at our logging server.
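If Netcat is not available on the test system, a few lines of Python can send a similar UDP test entry, this time with a priority value and application name included. Treat this as a sketch: the function names, the “logtest” application name and the default of facility 1, severity 5 are all my own choices, and you would supply your own logging server address:

```python
import socket


def build_message(text, facility=1, severity=5, app="logtest", pid=0):
    """Build a Syslog payload: <PRI>, then application[pid], then the body."""
    pri = facility * 8 + severity
    return f"<{pri}>{app}[{pid}]: {text}".encode("ascii", "replace")


def send_udp(server, text, port=514, **kwargs):
    """Send one test entry to the logging server over UDP."""
    payload = build_message(text, **kwargs)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (server, port))
    return payload


# Shows the payload only. Call send_udp("<IP address of logging server>",
# "this is a UDP test log entry") to actually transmit it.
print(build_message("this is a UDP test log entry"))
```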

Network hardware

Most network hardware supports Syslog via UDP/514. It is just a matter of going through the documentation and determining the proper command set for sending log entries to a remote server.

If you are using Cisco IOS, run the following from global configuration mode:

logging <IP address of logging server>

If you wish to change the logging facility from “local use 7” to something else, the command is:

logging facility <facility short name>

So to change logging facility to “local use 3”, the command would be:

logging facility local3

Linux and UNIX systems

There are a number of Syslog alternatives available for Linux and UNIX. In this section I’ll cover how to get Syslog and rsyslog to forward their log messages to a remote server.

Syslog

If the client is running Syslog, you will need to edit the /etc/syslog.conf file. Add the following line to the bottom of the file:

*.*                   @<IP address of logging server>

So an example would be:

*.*                   @192.168.1.150

Note the white space between the wild card match and the remote IP address MUST BE TAB CHARACTERS. If you use spaces, Syslog will not be able to parse the file. Save and exit the file, then restart Syslog to activate the changes.

rsyslog

With rsyslog we have the option of forwarding our log messages via UDP or TCP. In either case we will need to edit the /etc/rsyslog.conf file. To forward log entries using UDP, add the following line to the end of the file:

*.* @<IP address of logging server>:<port>

So an example would be:

*.* @192.168.1.150:514

If we wish to use TCP instead, we simply use two “@” symbols:

*.* @@192.168.1.150:514

Once complete save and exit the file. You will need to restart rsyslog to activate your changes.

Windows systems

As mentioned in a previous post, Windows does not include support for Syslog. This means you will need third party software to convert your logs in real time and submit them to a logging server. The Loganalysis Web site has a list of possible solutions.

For the purposes of this post, I’ll cover Snare. It is free for use, can be commercially licensed, and has a very simple deployment process.

Once you download the software, you need to configure it for the system. This is shown in Figure #1. Snare needs to know the location of the logging server as well as what facility and severity level to use. Once complete, click the “Latest Events” menu option to see which specific Event Viewer log entries Snare is forwarding to the logging server.

snare-config

Exec Summary

In this post I discussed testing connectivity to a logging server, as well as how to configure clients to centralize their logs. In the next post I’ll talk about what to look for in a daily reporting tool, as well as real time alerting.

Setting Up A Security Information Management System-Part4

August 12th, 2009

In the last post I talked about how to setup a logging server that will accept remote log entries. In this installment I’ll talk about how to sort log entries into specific files.

Facility, severity and priority

Let’s talk about how logging servers figure out which file to store a log entry in when it gets received. Log messages contain two descriptive parameters, facility and severity. When these two parameters are combined, the value is referred to as the priority of the log message.

Facility

Facility defines the type of process that generated the log entry. For example all mail servers are expected to identify that their log entries are part of the “mail” facility. FTP processes should use the FTP facility, NTP processes should use the NTP facility, and so on. RFC 3164 defines the valid facilities, but here’s the list:

Numerical          Facility

Code

0              kernel messages (kern)

1              user-level messages (user) – default if not specified

2              mail system (mail)

3              system daemons (daemon)

4              security/authorization messages (auth)

5              internal syslogd (syslog)

6              line printer subsystem (lpr)

7              network news subsystem (news)

8              UUCP subsystem (uucp)

9              clock daemon

10            security/authorization messages (authpriv)

11            FTP daemon (ftp)

12            NTP subsystem (ntp)

13            log audit

14            log alert

15            clock daemon (cron)

16            local use 0  (local0)

17            local use 1  (local1)

18            local use 2  (local2)

19            local use 3  (local3)

20            local use 4  (local4)

21            local use 5  (local5)

22            local use 6  (local6)

23            local use 7  (local7)

The “local use” facilities are similar to private addresses in the IP world. These facilities are not reserved, and are available for anyone to use as they see fit.

Facility problems

There are a couple of problems here. To start, where is the Web server facility? This list was generated back in 1987 before Web servers (or Gopher for that matter) existed. So some of the services we use today (VoIP, SQL, etc.) are missing. Also, some of the listed services typically go unused in a corporate environment. UUCP and Network News (NNTP) are excellent examples.

The lack of current services has caused many vendors to rely heavily on the local use facilities. This can cause potential conflicts when we get into sorting our log entries. For example Linux uses local use 7 to identify its boot time log entries. Apache also uses local use 7 for Web server errors. So down the road it may be difficult for us to sort Web errors and boot messages into different log files.

Another problem is that there is no verbose description about each of these facilities. This can make it a bit difficult for a programmer to identify which one to use. For example, let’s say we’ve written a program that authenticates a user for network access. Which facility should we use? Facility 4 and 10 seem the most likely, but their descriptions are identical. How do we choose? If our program runs as a background process should we actually choose facility 3 instead?

You get the idea. The list is not as clear-cut as it could be. It is not uncommon to see vendors use a different facility than you would expect. For example I’ve seen VPN vendors undecided as to the differences between facilities 4 and 10, so they simply send some percentage of log entries to each.

Severity

Severity defines the importance of the log entry. The same RFC 3164 defines the severity levels as:

Numerical         Severity

Code

0             Emergency: system is unusable (emerg)

1             Alert: action must be taken immediately (alert)

2             Critical: critical conditions (crit)

3             Error: error conditions (error)

4             Warning: warning conditions (warn)

5             Notice: normal but significant condition (notice)

6             Informational: informational messages (info)

7             Debug: debug-level messages (debug)

Luckily the severity levels are far less vague than the facility descriptions. This means they are much less confusing to work with. The higher numbered severity levels tend to be very verbose. This means saying you want to send debug level messages to your logging server could easily flood the network. Use the higher numbered severity levels with caution.

Priority

When a log entry gets transmitted to a log server, the first value contained within it is the priority of the message. The priority is the facility and severity values combined per the following math formula:

( Facility x 8 ) + Severity = Priority

So let’s say our mail server needs to send a warning message. What would the priority be? The mail facility has a value of 2, while warnings have a severity of 4. So the math would be:

( 2 x 8 ) + 4 = 20

If a print server (facility 6) needed to send a log entry saying it is currently on fire (severity 0), the priority value in the message would be:

( 6 x 8 ) + 0 = 48

When a log entry gets transmitted, the priority value needs to be encapsulated in less than and greater than signs. So the priority value in the above mail server message would be “<20>” while the print server would use “<48>”. Again, this needs to be the first piece of information transmitted in the log message.
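The priority math is simple enough to check in Python. The helper names below are mine, but the arithmetic is exactly the formula above, along with its inverse for pulling facility and severity back out of a received priority value:

```python
def encode_priority(facility, severity):
    """( Facility x 8 ) + Severity = Priority"""
    if not 0 <= facility <= 23:
        raise ValueError("facility must be 0 through 23")
    if not 0 <= severity <= 7:
        raise ValueError("severity must be 0 through 7")
    return facility * 8 + severity


def decode_priority(priority):
    """Split a priority back into (facility, severity)."""
    return divmod(priority, 8)


def pri_header(facility, severity):
    """Wrap the priority in less than and greater than signs."""
    return f"<{encode_priority(facility, severity)}>"


print(pri_header(2, 4))      # mail server warning
print(pri_header(6, 0))      # print server emergency
print(decode_priority(20))   # back to (facility, severity)
```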

Sorting log entries

The priority value is used by logging servers to sort the incoming messages. For example if we wanted all mail messages to go to the same file, we would tell our logging server that all messages with a priority of 16 (2×8+0) through 23 (2×8+7) should go to the “maillog” file. Most logging servers (like rsyslog) will let you do this numerically or by using the short description names.

rsyslog.conf example

Here are two lines out of the rsyslog.conf file that ships with Fedora. Let’s talk about what they are actually doing:

authpriv.*                                                              /var/log/secure

*.info;mail.none;authpriv.none;cron.none                /var/log/messages

These lines define two of the rules for determining which log entries should go to which log files. The syntax for sorting is:

facility.severity

So the first line says all facility 10 (authpriv) log entries, regardless of severity (“*” is a wild card match) should be sent to the file /var/log/secure.

The second line is a bit more complex as it has multiple conditions separated by semi-colons. These conditions state:

  • *.info = All facilities, so long as the severity level is info or more severe (numerically 6 or lower)
  • mail.none = No mail facility log entries, regardless of severity
  • authpriv.none = No authpriv facility log entries, regardless of severity
  • cron.none = No cron facility log entries, regardless of severity

Or, to translate this to English, the line says “Send all messages of severity “info” or higher to /var/log/messages, except those that contain a facility of “mail”, “authpriv” or “cron”.”

So with these rules we can define any combination of facility and severity values, and the log file to which we would like to direct them. When you first set this up, stick with the defaults. As you start collecting log entries you can tweak the rules as you see fit.
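To experiment with selectors before touching a live configuration, the matching rules can be modeled in Python. This is a simplified sketch, not rsyslog’s actual parser. It assumes the traditional behavior where a bare severity matches that level and anything more severe, where “none” excludes a facility, and where later conditions override earlier ones for the facilities they name. The function name selector_matches is my own:

```python
SEVERITIES = {
    "emerg": 0, "alert": 1, "crit": 2, "err": 3, "error": 3,
    "warning": 4, "warn": 4, "notice": 5, "info": 6, "debug": 7,
}


def selector_matches(selector, facility, severity):
    """Return True if a message with this facility and severity (short
    names) would be written by a rule using `selector`."""
    sev_num = SEVERITIES[severity]
    matched = False
    for part in selector.split(";"):
        fac_spec, sev_spec = part.strip().split(".")
        if fac_spec != "*" and facility not in fac_spec.split(","):
            continue                      # condition is about other facilities
        if sev_spec == "none":
            matched = False               # exclude this facility entirely
        elif sev_spec == "*":
            matched = True                # any severity
        else:
            # Lower numbers are more severe, so "info" also matches err, crit...
            matched = sev_num <= SEVERITIES[sev_spec]
    return matched


rule = "*.info;mail.none;authpriv.none;cron.none"
print(selector_matches(rule, "daemon", "info"))    # written to messages
print(selector_matches(rule, "daemon", "debug"))   # too chatty, skipped
print(selector_matches(rule, "mail", "err"))       # excluded by mail.none
```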

Bending the RFCs

In an ideal world, the RFCs would be a perfect fit for everyone’s needs. Unfortunately this is not always the case. A good example is the logging facilities. As mentioned we are missing facilities for modern day services, while at the same time have facilities that we will never use. An obvious answer is to recycle the outdated facilities in order to support modern services.

For example, UUCP ( facility 8 ) is not even supported by modern operating systems. With this in mind, I like to use it as my Windows facility. That way I can sort all Windows log entries into their own file. For network hardware, I use the network news facility (facility 7). If you are unsure if a facility is currently in use, modify your logging server’s configuration file to send all log entries for that facility to a unique file:

ftp.*                                                                 /var/log/facility-test

If no entries arrive, you are in good shape. Just keep in mind that a legitimate service may use it at a later date. For example if three months from now someone sets up an FTP server, we may have problems if we are already using the FTP facility (facility 11). If you are unsure you can always stick with the local use facilities, as that is what they are intended for. Local use 0 and 7 seem to be the most heavily used, so avoid them when possible.

Other sorting options

While it’s not part of the RFC, some logging servers give you the ability to sort log entries based on patterns within the message. A good example is Syslog-NG. Syslog-NG will sort based on facility and severity, but you can also sort based on source IP, the application that generated the log entry, etc. This gives you far more flexible sorting options and it may be something to consider if facility/severity is not granular enough for your needs.

Exec Summary

In this installment we talked about how facility and severity are used to sort log entries. In my next post I’ll talk about how to get each of our systems to submit log messages to our centralized server.

Setting Up A Security Information Management System-Part3

August 11th, 2009

In the last post I covered some of the architecture concerns with rolling out a centralized security information system. In this post I’ll cover deploying a basic log server, and verifying that it is ready to accept log entries.

Selecting a logging server

The first thing we need to do is select a platform for our logging server. If we are simply setting up a test lab, Windows, UNIX or Linux will all make great choices. Choosing Windows might be helpful for a Windows administrator, as they will not have to climb the learning curve of a new operating system while attempting to test out logging. While Windows does not support Syslog out of the box, there are some excellent packages like Kiwi Syslog Server and WinSyslog that will add Syslog support. Both have evaluation versions and are relatively inexpensive to license.

If we are talking about setting up a production server however, we will want to stay away from Windows. Windows is notorious for having a horrible IP stack. In fact previous “patches” have crippled it even further in the interest of slowing worm propagation and increasing the speed of the GUI. While many of these limitations have been removed in Server 2008 and Windows 7, IP performance is still sub-par when compared to a Linux or UNIX system deployed on identical hardware.

So that leaves Linux and UNIX as choices for a production system. Which to choose will depend on personal choice. Some like the stability of BSD while others like the flexibility of Linux. For the purpose of this document I’ll be working with a Fedora based Linux system. Installation and setup of the OS is relatively intuitive and straightforward.

Accepting remote logs

In order to accept log entries from remote systems, older versions of Fedora required you to initialize the Syslog daemon (syslogd) with the “-r” option. This was done by adding “-r” to the SYSLOGD_OPTIONS line of the /etc/sysconfig/syslog file. Some versions of Linux still support legacy Syslog, and require you to add “-r” to the Syslog RC initialization file. Check the docs for your specific distribution.

New Fedora systems however support “Reliable Syslog” or rsyslog. Implementation is pretty similar to plain old Syslog, except rsyslog supports communications over TCP/514 as well as UDP/514. In the last post I described that running log entries over TCP can fix some of the reasons we lose log entries, but not all of them. If you want to play around with TCP support, go ahead and open both ports on the logging server.

To get rsyslog to accept remote log entries, we must edit the /etc/rsyslog.conf file. Towards the beginning of the file you should see the following:

# Provides UDP syslog reception

#$ModLoad imudp.so

#$UDPServerRun 514

# Provides TCP syslog reception

#$ModLoad imtcp.so

#$InputTCPServerRun 514

The “#” (pound) symbol at the beginning of the line tells the system not to process the rest of the line. We use this technique for commentary as well as “commenting out” commands we do not wish to have processed. By commenting out the ModLoad and port specification lines, we prevent rsyslog from opening a listening socket. This helps to keep the system in a more secure state.

Since we are setting up a centralized logging server, we will need to open those sockets to accept remote log entries. Modify the /etc/rsyslog.conf file to remove the appropriate pound symbols. The file should now look like this:

# Provides UDP syslog reception

$ModLoad imudp.so

$UDPServerRun 514

# Provides TCP syslog reception

$ModLoad imtcp.so

$InputTCPServerRun 514

If you know you will never use TCP, you can leave the last two lines commented out. Once complete, save your changes and exit the file.

We now need to restart logging so our changes are implemented. This is done on Fedora by executing the following command:

service rsyslog restart

When you execute the command, you should see rsyslog stop and start with a status of “OK”. If the shutdown failed, it is because rsyslog was not running, which usually means it is not being initialized at boot time. From the command line, execute the command “setup” and select “System services” from the main menu. When the services menu appears, scroll down the list until you find rsyslog. Check off the box to the left and then select “OK”. Quit the setup utility and rsyslog will now initialize whenever the system is booted.

Verifying the listening port

Next we need to ensure that our logging process is accepting remote log entries. From the command line, type “netstat -an | grep :514”. The output should look similar to the following:

[root@fubar ~]# netstat -an | grep :514

tcp     0      0 0.0.0.0:514                 0.0.0.0:*              LISTEN

tcp     0      0 :::514                                 :::*              LISTEN

udp    0      0 0.0.0.0:514                 0.0.0.0:*

udp    0      0 :::514                                 :::*

The first line tells us that TCP/514 is listening via IPv4 on all network interfaces. Line two tells us the TCP port is also listening on any interface with an IPv6 address. Lines three and four are the same information, except for UDP. If any of the entries state “127.0.0.1:514” instead of “0.0.0.0:514”, then the port is only bound to the loopback interface. Only the local system will be able to reach it. This can happen with legacy Syslog systems if you forget to run them with the “-r” switch.
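If you have a number of logging servers to check, the loopback test described above is easy to script. The following Python sketch is one way to do it, assuming you have already captured the netstat output as text. The function name is hypothetical and the parsing only expects the column layout shown above.

```python
def loopback_only_listeners(netstat_output, port=514):
    """Return (protocol, local address) pairs bound to the port on loopback only."""
    suffix = ":%d" % port
    flagged = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto, recv-q, send-q, local address, foreign address
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue  # skip the shell prompt, blank lines and anything else
        local = fields[3]
        if local.endswith(suffix) and local.startswith(("127.", "::1")):
            flagged.append((fields[0], local))
    return flagged
```

An empty list means nothing on the port is restricted to loopback. Any entries that come back are sockets that remote systems will not be able to reach.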

You should now have a logging server that is capable of receiving inbound log entries. In the next post I’ll talk about how these log entries get sorted into specific files.

Setting Up A Security Information Management System-Part2

August 10th, 2009

In my last post we discussed defining your goals for a Security Information Management (SIM) system. In this post we’ll talk about architecture concerns as well as capacity planning.

Network communications

The goal will be to have one or more SIM servers that will collect log entries from other systems. This will obviously have an impact on network utilization. How much of an impact will depend on the quantity and type of systems we collect log entries from.

UDP/514

Just about all systems support the original Syslog communication implementation which goes all the way back to the year 1988. The last description of this spec appeared in RFC 3164. While this RFC has been obsoleted by RFC 5424, RFC 3164 still represents the implementation supported by most vendors. Windows is a notable exception (proprietary, no Syslog support), but there is 3rd party software to rectify this.

Both RFCs describe the use of the UDP protocol when transmitting log entries, although RFC 5424 moves the UDP details into a companion document, RFC 5426. The well-known port to use is UDP/514. Where RFC 3164 and 5424 differ is in the format of the log message. I’ll dig into these differences in a later post.

The love/hate of using UDP

On the positive side, UDP is connectionless. This means that it generates less traffic than if we used TCP. Also, log transmissions are a one-way process. The host generating a log entry sends a packet to the logging server, but the logging server never replies. This means we can control traffic flow with static filtering rather than stateful filtering which will place less overhead on the traffic control device. Also, the UDP header is typically 1/3 – 1/4 the size of a TCP header, which means smaller transmission packets, thus less network overhead.

On the negative side, UDP is connectionless. ;) This means that it has minimal error reporting capability. For example if we transmit a log entry and the frame goes missing (say a collision or a firewall dropping the packet), UDP does not have the ability to detect that a retransmission is required. This means it’s possible for log entries to go missing if we overflow the network. Further, UDP has no flow control ability. If the SIM server recognizes it is reaching capacity it has no way to slow down the incoming transmission of log entries. The SIM server’s only option is to throw the packets away without processing them.

Needless to say, we need to ensure that we properly specify capacity. If the network or the SIM server becomes overloaded, we are going to lose log entries. Proper capacity planning starts with understanding the impact of logging on the network.

Network impact of logging

The maximum size of a UDP Syslog packet is specified differently in different RFCs. The outdated RFC 3164 defines the maximum message size as being 1,024 bytes. RFC 5426 takes a different approach: over IPv4, receivers are only required to accept messages of up to 480 bytes, although it recommends that they handle up to 2,048 bytes. If a vendor is still following the old spec, it’s possible they may still think the 1,024 byte size is legitimate. It has been my experience however that most log entry packets range in size from 75 to 225 bytes, so the maximums are a non-issue.

Windows systems, firewalls and intrusion detection systems tend to generate the largest messages. Network hardware tends to generate the smallest messages. If we have a 100 Mb Ethernet network, the theoretical maximum would be somewhere around 50,000 to 130,000 frames per second. This assumes zero other traffic, which is rarely the case. For the purposes of capacity planning, assume you will be limited to 5,000 log entries per second. This number might even be less if you have a busy network. Taking some utilization measurements during the planning process is key.
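The theoretical range quoted above can be reproduced with some quick arithmetic. The Python sketch below assumes the 75 to 225 byte figures are on-the-wire frame sizes, and that each frame costs an extra 20 bytes for the Ethernet preamble and inter-frame gap. Both are my assumptions for illustration, and the function name is hypothetical.

```python
def max_frames_per_second(link_bits_per_second, frame_bytes, gap_bytes=20):
    """Upper bound on frames per second for a given frame size.

    gap_bytes covers the 8-byte preamble and 12-byte inter-frame gap that
    Ethernet spends on every frame in addition to the frame itself.
    """
    bits_per_frame = (frame_bytes + gap_bytes) * 8
    return link_bits_per_second / bits_per_frame


LINK = 100_000_000  # 100 Mb Ethernet, with no other traffic on the wire
small = max_frames_per_second(LINK, 75)   # smallest typical log packet
large = max_frames_per_second(LINK, 225)  # largest typical log packet
```

Under those assumptions the small packets work out to roughly 131,500 frames per second and the large ones to roughly 51,000. Swap in your own link speed and packet sizes to get a ceiling for your network, then plan for far less than that.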

Syslog over TCP

As mentioned above, UDP introduces the problem that log entries can become lost without us even knowing it. There are ways to validate capacity, which I will cover in a later post. Some feel running Syslog over TCP can rectify this problem. TCP can be leveraged for its reliability to ensure our log entries are properly received.

Unfortunately TCP support for Syslog is nowhere near standardized. Some vendors support TCP by simply listening on TCP/514. RFC 3195 defines Reliable Syslog as using port TCP/601, but its adoption has been extremely limited. RFC 5425 defines the use of TLS to secure Syslog transmission. This RFC specifies the use of port TCP/6514. This is a brand new specification and I’m unaware of anyone supporting it just yet.

So support for TCP is all over the board. Further, TCP does not completely fix the problem. While TCP will give us flow control and reliability on the wire, it cannot make up for the fact that Syslog at the application layer does not acknowledge the receipt of log entries. This was by design as it reduces overhead. The problem is that even by using TCP we can still lose messages within the IP stack and never know it is occurring.

So if you want to try and transmit logs via TCP, it’s only going to work between a specific vendor’s client and server software. For example you may need to run Syslog-NG on both ends of the connection to leverage its support for TCP. This is not always practical, as you cannot run the software client on appliances like access points, switches, routers, etc.

Where to place the logging server

When deciding where to place the logging server, we have to keep both network capacity and security in mind. Take a look at Figure #1. This is an ideal situation where the logging server has been isolated to a dedicated network operations network. This isolates it from the other security zones and makes it much easier to leverage the firewall to restrict access to the logging server.

[Figure #1: sim-placement]

The drawing assumes we only need one logging server for our entire environment. What if we have 100,000 nodes to keep track of? Large networks may need to look at aggregating the data. For example if I have 10 field offices, I may need to have a logging server located at each of them collecting local log info. Each of these logging servers would then relay summary information back to the corporate office for network wide trend reports. This way we maintain a high level of visibility while reducing network load. I’ll cover some possible aggregation options in a later post.

How many systems can log to a single logging server?

There is no single answer to this question as each network is different. It is going to depend on how much capacity is available on your network and how many log entries each of these systems generate. For example I could probably point 50,000 switches at a single logging server, as switches tend to generate very few messages. Firewalls on the other hand are extremely chatty, so I might max out the network or SIM server with only 20-50 firewalls. So to answer the question we need to look at two metrics:

  • How much free capacity is there on the wire?
  • How many log entries will each host generate?

The second question is not as straightforward as it may seem. For example the average desktop may only generate 40-100 log entries per day. If we can push 1,000 log entries per second, that is 86.4 million log entries per day, so the math says we should be able to point somewhere between 860,000 and 2 million desktops at a single logging server. The problem is about 80% of those messages are generated at initial boot time. If everyone typically powers up around 9:00 AM, the math changes to a more realistic 750 desktops (again, assuming we can push 1,000 log entries per second over the wire).

So we can’t just look at quantity of log entries. We need to take time of day into account as well. This will identify the actual number of log entries per second we can expect under worst case conditions. Worst case is the capacity level we need to plan for.
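Here is the desktop math as a short Python sketch so you can plug in your own numbers. At 1,000 log entries per second we can handle 86.4 million entries per day, which is 864,000 desktops at 100 entries each if the load were spread evenly. The burst calculation assumes every desktop boots within the same 60 seconds. That window is my assumption, picked because it reproduces the 750 figure, and the function names are hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def desktops_by_daily_average(entries_per_second, entries_per_desktop_per_day):
    """Desktops supported if log entries were spread evenly across the day."""
    return entries_per_second * SECONDS_PER_DAY // entries_per_desktop_per_day


def desktops_by_boot_burst(entries_per_second, entries_per_desktop_per_day,
                           boot_percent=80, boot_window_seconds=60):
    """Desktops supported if every desktop boots inside the same window.

    boot_percent is the share of a desktop's daily entries generated at boot.
    Integer math keeps the result exact and rounds down.
    """
    window_capacity = entries_per_second * boot_window_seconds
    return window_capacity * 100 // (entries_per_desktop_per_day * boot_percent)


average_case = desktops_by_daily_average(1000, 100)  # 864000
worst_case = desktops_by_boot_burst(1000, 100)       # 750
```

Stretching the boot window to five minutes raises the worst case to 3,750 desktops, which is why it is worth measuring how tightly your own users cluster their power-on times.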

Deploy centralized logging in phases

If you have tens of thousands of systems to deal with, it is easy to get overwhelmed with the work involved with deploying centralized logging. Rolling out the solution in phases makes it easier to wrap your brain around the whole process.

First, start with a single logging server. You may not be able to cover your whole network, but we have to start somewhere. Large networks should consider a deployment at the corporate office first, moving out to field offices once the corporate system is fully vetted and functional.

You will also want to phase in which devices you are collecting information from. I usually go with the following order:

  1. Network intrusion detection systems
  2. Firewalls
  3. Network hardware (routers, switches, access points, print servers, etc.)
  4. Internet facing servers
  5. Internal servers
  6. Internal desktops

Obviously you can tweak this list to fit your needs. For example if you do not plan on collecting info from desktops, simply leave that step out. I like to start with network intrusion systems first as their log entries are well suited for vetting both daily reports and real time alerting. Once we have a handle on alerting and reporting, adding additional devices becomes far easier.

Exec Summary

In this post I covered all the things you need to consider when initially deploying a centralized logging solution. We covered how to predict the impact it will have on network utilization, how to calculate the number of hosts per logging server, and why it is important to deploy the solution in phases. In the next post we’ll start talking about configuring the centralized logging server. Specifically, we’ll look at how we are going to sort log entries.