Posts Tagged ‘Logging’

Setting Up A Security Information Management System-Part 6

August 20th, 2009

So far in this series we have covered:

  • Defining a scope and focus for your SIM
  • Importance of building instead of buying your first system
  • Architecture and capacity planning
  • Recommended phases of deployment
  • Selecting a centralized logging server platform
  • How to accept remote log entries
  • Facility, severity and priority
  • How to sort log messages
  • Configuring appliances and operating systems to submit log entries

Cool. So we have log entries for a number of systems being collected on a centralized server. Now comes the most important task: leveraging that information. Log entries will be grouped into two categories: critical messages we want to know about right away, and log entries that will get caught as part of a regular review process.

Blacklisting Vs. Whitelisting

When reviewing log messages, we have two possible postures we can use. The first is referred to as blacklisting. With the blacklisting method we define what makes an event interesting enough to warrant reporting. This is similar to how anti-virus software detects malware or the process we use to filter out spam.

Like most things in life, blacklisting has some good and bad aspects. On the plus side, it is usually pretty easy to write a signature if we know what we want to look for. Signatures can be tightly defined to help minimize the number of false positives we encounter. The problem with blacklisting is that we have to know what we are looking for. If a new attack generates a unique signature we have never encountered in the past, a blacklisting system will probably miss the event because no signature has been defined.

With whitelisting we define the events we understand, and then focus our attention on the new and unique log messages that are encountered. On the plus side we are far more likely to catch cutting edge attacks. Whitelisting tends to be relatively noisy however since we are bound to encounter unique log messages that are not indicative of a security event.

So which should we use? Good defense in-depth practices tell us to use both. ;)

Real time alerting

We can leverage blacklisting to perform real time alerting on events we want to be made aware of as soon as they occur. Blacklisting should only be used for low noise types of events. In other words, we want to stick with writing signatures for events that have a high probability of being a true security issue. Good examples are:

  • Different logon name failures all from the same IP address in a short amount of time
  • Multiple HTTP 403 errors being generated by a single IP in a short amount of time
  • Internal systems receiving many ICMP errors or TCP resets in a short amount of time

In order to perform real time alerting, we need software that will monitor the logs in real time. The log entries should be checked against defined signatures, which also indicate what to do when the event occurs.
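To make the first example in the list above a bit more concrete, here is a minimal Python sketch of that kind of signature. Treat it as an illustration only: the function name, the five minute window and the threshold of three different logon names are my own assumptions, and it expects each log entry to have already been parsed into a timestamp, source IP and logon name.

```python
from collections import defaultdict, deque

def find_logon_bursts(events, window=300, min_names=3):
    """Return the set of IPs that failed logons for at least `min_names`
    different logon names within `window` seconds.

    `events` is an iterable of (timestamp_seconds, ip, logon_name) tuples
    sorted by timestamp. The tuple layout is a hypothetical input format.
    """
    recent = defaultdict(deque)   # ip -> deque of (timestamp, name)
    flagged = set()
    for ts, ip, name in events:
        q = recent[ip]
        q.append((ts, name))
        # Drop failures that have aged out of the window.
        while q and ts - q[0][0] > window:
            q.popleft()
        # Count distinct logon names still inside the window.
        if len({n for _, n in q}) >= min_names:
            flagged.add(ip)
    return flagged

# Made-up sample data for the sketch.
failures = [
    (0,   "1.3.247.11", "msmith"),
    (20,  "1.3.247.11", "jsmith"),
    (45,  "1.3.247.11", "psmith"),
    (50,  "192.168.1.173", "jsmith"),
    (900, "192.168.1.173", "jsmith"),
]
print(find_logon_bursts(failures))
```

The second host fails twice, but always with the same logon name, so it stays off the list. That is the point of a low noise signature: a single user fat-fingering a password should not page anyone.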

Swatch

One of the easiest tools you can use for monitoring log entries is Swatch. Swatch is based on Perl. This means that while it is designed for UNIX and Linux systems, you can get it running on Windows if you have Perl installed. Simplicity is both Swatch’s biggest strength and weakness. While Swatch is relatively easy to deploy, it is also somewhat limited in its functionality. Still, if you are new to logging, Swatch makes an excellent first tool for real time alerting.

To deploy Swatch, you will need to create a unique configuration file for every log file you wish to monitor. In the configuration file we will tell Swatch what to look for in that particular log file, and what to do when the event is detected.

For example, let’s say we are going to have Swatch monitor the Web server’s error log. We may wish to create an entry similar to the following in Swatch’s configuration file for the error log:

# Look out for buffer overflows

watchfor  /client denied by server configuration|File name too long/

mail=noc@fubar.org:webmaster@fubar.org,subject=Web server overflow attempt

The line beginning with a “#” is simply commentary on the signature. The watchfor line identifies which character string(s) we want to define as being interesting. In this particular rule we have defined two different strings, “client denied by server configuration” and “File name too long” as interesting. The pipe character between the strings acts as a logical “or”. If either string is encountered, the mail parameter defines two different e-mail addresses we should contact. The subject line of the e-mail will be “Web server overflow attempt”, while the body of the e-mail will be the actual log entry.

If there are other patterns we wish to detect, we could add additional watchfor and mail statements. If we want to do more than send an e-mail, the exec parameter can be used to execute any application located on the local system. The threshold parameter can also be used to rate limit the reporting of events.
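If the watchfor expression is new to you, it may help to see the same match written out in Python. This is not how Swatch is implemented. It is just a rough equivalent of the matching step, with a made-up function name and invented sample log lines, and it returns the matching entries instead of sending e-mail.

```python
import re

# Same expression as the watchfor line; "|" means either string may match.
WATCHFOR = re.compile(r"client denied by server configuration|File name too long")

def interesting_entries(lines):
    """Yield each log line that matches the watchfor expression."""
    for line in lines:
        if WATCHFOR.search(line):
            yield line.rstrip("\n")

# Invented error log lines, for illustration only.
sample = [
    "[error] [client 10.0.0.5] client denied by server configuration: /var/www/private\n",
    "[error] [client 10.0.0.5] File does not exist: /var/www/favicon.ico\n",
    "[error] [client 10.0.0.9] (36)File name too long: access to /AAAA failed\n",
]
for entry in interesting_entries(sample):
    print(entry)
```

Only the first and third lines are reported. The "File does not exist" entry is ignored because neither string appears in it.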

Simple Event Correlator (SEC)

SEC is an amazing alerting tool you can download from the main Web site. It supports BSD and Linux, and ships with a number of popular Linux flavors. SEC fully supports regular expressions and allows you to create extremely granular signatures.

The rule format is as follows:

type= Method of detection

ptype= Pattern type (regular expression, string match)

pattern= What to search for

desc= Description (can be a variable)

action= What to do when detected

There is an excellent archive of pre-written rules you can use that is well worth looking at. You can match on multiple patterns, define multiple thresholds, all while processing hundreds of log messages per second. About the only drawback of SEC is that you need a good understanding of regular expressions to use the tool effectively. Still, the tool can be far more powerful and flexible than Swatch.

Where can I get more alerting ideas?

I was involved with the creation of the original SANS Top 5 Log Reports. For the April, 2009 Log Summit I updated my presentation to break up report examples into low noise and high noise categories. Anything on the low noise list would make a good candidate for alerting. Anything in the high noise section is better monitored through daily reports.

Daily Reports

So we leveraged blacklisting to generate our real time alerts. We’ll now leverage whitelisting to help highlight unknown but interesting traffic patterns within our daily reports.

When it comes to daily reports, we tend to gravitate towards the big numbers. What are the top 5 IPs transferring data? Which e-mail address sent the most messages? While the big numbers are certainly important, it has been my experience that the security events you need to worry about the most generate the fewest log entries. The smart attackers try very hard to remain hidden within the noise. So the only way to find them is to improve the signal to noise ratio by stripping away the noise.

I author the Perimeter Security track for SANS. One of the labs I have my students perform is to parse a 200,000 line log file. The goal is to spot the interesting patterns as well as formulate the review into an automated process. Most folks find the port scanner as it is pretty noisy. Some even spot the IP address performing application layer attacks against the Web server. What most people miss however, are the six lines that are a pretty clear indication that an internal system is already compromised and calling home for marching orders. How do you find those six lines? By whitelisting everything you understand and focusing on whatever is left.

So it is OK for our daily reports to give us pretty charts with big numbers. One of the reports however has to be able to move all the crud to the side so we can better spot the interesting patterns.

Logwatch

One of the best tools for doing a daily log review is Logwatch. Logwatch will summarize all of the log patterns it understands, while highlighting anything without a predefined signature. The best way to understand this feature is to look at an example.

SSHD Killed: 2 Time(s)

SSHD Started: 1 Time(s)

Connections:

Failed logins from these:

msmith/password from 1.3.247.11: 6 time(s)

jsmith/password from 1.3.247.11: 5 time(s)

psmith/password from 1.3.247.11: 4 time(s)

Users logging in through sshd:

jjones logged in from sundown (1.3.247.9) using publickey: 146 Times(s)

jsmith logged in from dialup5533.wnskvtao.sover.net (216.114.181.200) using password: 1 Times(s)

jsmith logged in from dialup984.wnskvtao.sover.net (216.114.163.223) using password: 1 Times(s)

bjones logged in from charlie (1.3.247.11) using publickey: 444 Times(s)

jsmith logged in from 192.168.1.173 using password: 2 Times(s)

djones logged in from charlie (1.3.247.11) using password: 47 Times(s)

**Unmatched Entries**

Received disconnect from 148.64.147.168: 3: Key exchange failed.

Received disconnect from 216.114.160.132: 11: All open channels closed

scanned from 146.87.114.150 with SSH-1.0-SSH_Version_Mapper.  Don’t panic.

scanned from 211.184.226.99 with SSH-1.0-SSH_Version_Mapper.  Don’t panic.

In the above example Logwatch is being used to summarize SSH activity. It understands the service being stopped and started, failed logon attempts as well as successful logons. All this information is displayed in summary format so it is easier to digest. For example we do not know exactly when msmith incorrectly entered their password, but we see it happened six times, all from IP address 1.3.247.11. So instead of having six lines to digest, we only need to look at one. If we want to see each specific log entry, we can always refer back to the original logs.

Now look at the “Unmatched Entries” section. Each of these is an event that Logwatch does not have a signature for. Rather than ignore them, which would happen with a blacklist based system, they are summarized here for us to review. We then have the option to generate a signature for a specific entry so it will get categorized in a similar fashion to the process and logon sections.

Clearly this gives us the best of both worlds. The above report represents a bit over 650 lines worth of log entries, summarized down into an easy to read report. Most importantly, none of the log entries had to be ignored in order to produce this summary report.
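The idea of "summarize what you understand, show everything else" is easy to sketch in Python. The two patterns and the simplified log lines below are my own assumptions for the example, not Logwatch's actual rules or real sshd output. What matters is the shape of the loop: every entry either lands in a counter or in the unmatched list, so nothing is thrown away.

```python
import re
from collections import Counter

# Known patterns (the whitelist). Each maps a regex to a summary label.
# These two patterns are illustrative assumptions, not Logwatch's own rules.
KNOWN = [
    (re.compile(r"Failed password for (\S+) from (\S+)"), "failed login {0} from {1}"),
    (re.compile(r"Accepted (\S+) for (\S+) from (\S+)"), "{1} logged in from {2} using {0}"),
]

def summarize(lines):
    """Count entries that match a known pattern; collect everything else."""
    counts, unmatched = Counter(), []
    for line in lines:
        for regex, label in KNOWN:
            m = regex.search(line)
            if m:
                counts[label.format(*m.groups())] += 1
                break
        else:
            unmatched.append(line)   # no pattern matched; never silently dropped
    return counts, unmatched

# Simplified, made-up log entries.
log = [
    "Failed password for msmith from 1.3.247.11",
    "Failed password for msmith from 1.3.247.11",
    "Accepted publickey for jjones from 1.3.247.9",
    "Received disconnect from 148.64.147.168: 3: Key exchange failed.",
]
counts, unmatched = summarize(log)
for label, n in counts.items():
    print(f"{label}: {n} time(s)")
print("**Unmatched Entries**")
for line in unmatched:
    print(line)
```

Adding a signature for an unmatched entry is just a matter of appending another pattern to the list, which is the same workflow described above.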

Beyond daily reporting

You may also find it useful to perform long-term trend analysis and data mining on your log data. This may help to reveal patterns that normally go undetected when logs for a small snapshot in time (like 24 hours) are reviewed. Arguably one of the best tools available for dealing with lots of data is Splunk.

Splunk

Splunk is available as a free version that is limited to processing 500 MB per day, or you can invest in the commercial version that supports unlimited data processing. Splunk is extremely flexible at accepting data. It can act as a centralized logging server, or you can transfer files via a number of methods including FTP and HTTP. Once the data is received, Splunk indexes every field in each log file. This gives you unparalleled sorting and searching capability.

The full features of Splunk are too numerous to get into in this post. Check their site for a full list of supported features. What Splunk is extremely good at is manipulating and reporting on a huge number of log entries. It can index, search and report on billions of log entries per second. This makes it extremely useful for generating long-term trend reports or running saved searches for data mining purposes.

Exec Summary

Well, we’ve reached the end of the trail. Hopefully you feel like you have a better handle on how to deploy a centralized logging solution, as well as how to leverage it to better secure your environment. If you have any questions, please feel free to drop a comment. :)

Setting Up A Security Information Management System-Part5

August 14th, 2009

In my last post I discussed how a logging server uses a message’s priority value to sort incoming log messages. In this installment I’ll talk about testing connectivity, as well as how to get various gear on the wire to submit their log entries to a centralized server.

Requirements

In order for a system to submit log entries, it has to have support for Syslog. Log entries need to be transmitted in clear text to port UDP/514 on the logging server. If you are using rsyslog on the server, TCP/514 is acceptable as well.

Submitted log entries should include the following info:

  • A priority value (in <PRI> format) at the beginning of the payload
  • The name of the application submitting the log entry
  • The process ID used by the application
  • The body of the log messages

But to be honest, Syslog is not very fussy. It will attempt to record anything sent to its listening port as a log entry. If a priority value is not present, it will record the entry to whatever file is used for facility 1. Typically this is /var/log/messages.
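Here is a small Python sketch of how a receiver might pull those four pieces out of a payload. It only covers the fields listed above and ignores the timestamp and hostname fields that RFC 3164 also defines. The function name and the sample sendmail message are made up, and the fallback priority of 13 (facility 1, severity 5) is my reading of the default RFC 3164 describes for messages that arrive without a priority value.

```python
import re

# <PRI>tag[pid]: body  -- the <PRI> and tag[pid] parts are optional here.
ENTRY = re.compile(r"^(?:<(\d{1,3})>)?(?:(\w[\w./-]*)(?:\[(\d+)\])?: )?(.*)$", re.S)

DEFAULT_PRI = 13  # facility 1 (user) * 8 + severity 5 (notice)

def parse_entry(payload):
    """Split a raw payload into priority, application, PID and body."""
    pri, app, pid, body = ENTRY.match(payload).groups()
    return {
        "priority": int(pri) if pri is not None else DEFAULT_PRI,
        "app": app,
        "pid": int(pid) if pid is not None else None,
        "body": body,
    }

print(parse_entry("<20>sendmail[1234]: queue is getting full"))
print(parse_entry("this is a UDP test log entry"))
```

The second call shows the "not very fussy" behavior: with no priority, application or PID present, the whole payload is kept as the body and the default priority is assumed.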

Testing Connectivity

When deploying a new setup, I like to verify connectivity between the first few clients and the logging server. If logging is not working, you will want to be able to isolate the problem area. Typical problems include:

  • Client incorrectly configured
  • Firewall in the way (on the client, the wire, or the logging server)
  • Server incorrectly configured

You will want to monitor the messages file on the logging server to ensure the test log entry is received. On the logging server, run the following command:

tail -f /var/log/messages

Tail will open the log file read only and print the last ten lines. As new log entries are received, they will get printed to the screen as well.

To generate a test log entry, I like to use Netcat. Netcat can be used from any Windows, Linux or UNIX system. From the test system, run the command:

echo 'this is a UDP test log entry' | nc -u -w 1 <IP address of logging server> 514

You should see the echoed portion of the command show up in the /var/log/messages file on the logging server. If not, launch a packet sniffer and see if you can determine where the failure is occurring. If you wish to test TCP connectivity as well, simply run the command again leaving out the Netcat “-u” switch:

echo 'this is a TCP test log entry' | nc -w 1 <IP address of logging server> 514

If both entries are received, we are ready to start pointing devices at our logging server.
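If Netcat is not available on the test system, the same test can be approximated with a few lines of Python. This is a sketch under a couple of assumptions: the function name is made up, and unlike the Netcat commands above it prefixes the message with a priority value (13 by default) so the entry looks a little more like what a real client would send.

```python
import socket

def send_test_entry(server, message, port=514, use_tcp=False, pri=13):
    """Send one Syslog test entry, prefixed with <PRI>, to the logging server.

    Returns the bytes that were sent so the caller can compare them with
    what shows up in the log file.
    """
    payload = f"<{pri}>{message}\n".encode("ascii", "replace")
    kind = socket.SOCK_STREAM if use_tcp else socket.SOCK_DGRAM
    with socket.socket(socket.AF_INET, kind) as sock:
        sock.settimeout(1)            # same idea as Netcat's "-w 1"
        sock.connect((server, port))  # for UDP this only sets the destination
        sock.sendall(payload)
    return payload
```

Call it with the IP address of your logging server and the test message, for example send_test_entry(server, "this is a UDP test log entry"), and add use_tcp=True for the TCP test. A failed TCP connection raises an exception. A lost UDP packet does not, so you still need to watch the messages file on the server.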

Network hardware

Most network hardware supports Syslog via UDP/514. It is just a matter of going through the documentation and determining the proper command set for sending log entries to a remote server.

If you are using Cisco IOS, run the following from global configuration mode:

logging <IP address of logging server>

If you wish to change the logging facility from “local use 7” to something else, the command is:

logging facility <facility short name>

So to change logging facility to “local use 3”, the command would be:

logging facility local3

Linux and UNIX systems

There are a number of Syslog alternatives available for Linux and UNIX. In this section I’ll cover how to get Syslog and rsyslog to forward their log messages to a remote server.

Syslog

If the client is running Syslog, you will need to edit the /etc/syslog.conf file. Add the following line to the bottom of the file:

*.*                   @<IP address of logging server>

So an example would be:

*.*                   @192.168.1.150

Note the white space between the wild card match and the remote IP address MUST BE TAB CHARACTERS. If you use spaces, Syslog will not be able to parse the file. Save and exit the file, then restart Syslog to activate the changes.

rsyslog

With rsyslog we have the option of forwarding our log messages via UDP or TCP. In either case we will need to edit the /etc/rsyslog.conf file. To forward log entries using UDP, add the following line to the end of the file:

*.* @<IP address of logging server>:<port>

So an example would be:

*.* @192.168.1.150:514

If we wish to use TCP instead, we simply use two “@” symbols:

*.* @@192.168.1.150:514

Once complete save and exit the file. You will need to restart rsyslog to activate your changes.

Windows systems

As mentioned in a previous post, Windows does not include support for Syslog. This means you will need third party software to convert your logs in real time and submit them to a logging server. The Loganalysis Web site has a list of possible solutions.

For the purposes of this post, I’ll cover Snare. It is free for use, can be commercially licensed, and has a very simple deployment process.

Once you download the software, you need to configure it for the system. This is shown in Figure #1. Snare needs to know the location of the logging server as well as what facility and severity level to use. Once complete, click the “Latest Events” menu option to see which specific Event Viewer log entries Snare is forwarding to the logging server.

Figure #1: Snare configuration (snare-config)

Exec Summary

In this post I discussed testing connectivity to a logging server, as well as how to configure clients to centralize their logs. In the next post I’ll talk about what to look for in a daily reporting tool, as well as real time alerting.

Setting Up A Security Information Management System-Part4

August 12th, 2009

In the last post I talked about how to setup a logging server that will accept remote log entries. In this installment I’ll talk about how to sort log entries into specific files.

Facility, severity and priority

Let’s talk about how logging servers figure out which file to store a log entry in when it gets received. Log messages contain two descriptive parameters, facility and severity. When these two parameters are combined, the value is referred to as the priority of the log message.

Facility

Facility defines the type of process that generated the log entry. For example all mail servers are expected to identify that their log entries are part of the “mail” facility. FTP processes should use the FTP facility, NTP processes should use the NTP facility, and so on. RFC 3164 defines the valid facilities, but here’s the list:

Numerical Code    Facility
0                 kernel messages (kern)
1                 user-level messages (user) – default if not specified
2                 mail system (mail)
3                 system daemons (daemon)
4                 security/authorization messages (auth)
5                 internal syslogd (syslog)
6                 line printer subsystem (lpr)
7                 network news subsystem (news)
8                 UUCP subsystem (uucp)
9                 clock daemon
10                security/authorization messages (authpriv)
11                FTP daemon (ftp)
12                NTP subsystem (ntp)
13                log audit
14                log alert
15                clock daemon (cron)
16                local use 0  (local0)
17                local use 1  (local1)
18                local use 2  (local2)
19                local use 3  (local3)
20                local use 4  (local4)
21                local use 5  (local5)
22                local use 6  (local6)
23                local use 7  (local7)

The “local use” facilities are similar to private addresses in the IP world. These facilities are not reserved, and are available for anyone to use as they see fit.

Facility problems

There are a couple of problems here. To start, where is the Web server facility? This list was generated back in 1987 before Web servers (or Gopher for that matter) existed. So some of the services we use today (VoIP, SQL, etc.) are missing. Also, some of the listed services typically go unused in a corporate environment. UUCP and Network News (NNTP) are excellent examples.

The lack of current services has caused many vendors to rely heavily on the local use facilities. This can cause potential conflicts when we get into sorting our log entries. For example Linux uses local use 7 to identify its boot time log entries. Apache also uses local use 7 for Web server errors. So down the road it may be difficult for us to sort Web errors and boot messages into different log files.

Another problem is that there is no verbose description about each of these facilities. This can make it a bit difficult for a programmer to identify which one to use. For example, let’s say we’ve written a program that authenticates a user for network access. Which facility should we use? Facility 4 and 10 seem the most likely, but their descriptions are identical. How do we choose? If our program runs as a background process should we actually choose facility 3 instead?

You get the idea. The list is not as clear-cut as it could be. It is not uncommon to see vendors use a different facility than you would expect. For example I’ve seen VPN vendors undecided as to the differences between facilities 4 and 10, so they simply send some percentage of log entries to each.

Severity

Severity defines the importance of the log entry. The same RFC 3164 defines the severity levels as:

Numerical Code    Severity
0                 Emergency: system is unusable (emerg)
1                 Alert: action must be taken immediately (alert)
2                 Critical: critical conditions (crit)
3                 Error: error conditions (error)
4                 Warning: warning conditions (warn)
5                 Notice: normal but significant condition (notice)
6                 Informational: informational messages (info)
7                 Debug: debug-level messages (debug)

Luckily the severity levels are far less vague than the facility descriptions. This means they are much less confusing to work with. The higher numbered severity levels tend to be very verbose. This means saying you want to send debug level messages to your logging server could easily flood the network. Use the higher numbered severity levels with caution.

Priority

When a log entry gets transmitted to a log server, the first value contained within it is the priority of the message. The priority is the facility and severity values combined per the following math formula:

( Facility x 8 ) + Severity = Priority

So let’s say our mail server needs to send a warning message. What would the priority be? The mail facility has a value of 2, while warnings have a severity of 4. So the math would be:

( 2 x 8 ) + 4 = 20

If a print server (facility 6) needed to send a log entry saying it is currently on fire (severity 0), the priority value in the message would be:

( 6 x 8 ) + 0 = 48

When a log entry gets transmitted, the priority value needs to be encapsulated in less than and greater than signs. So the priority value in the above mail server message would be “<20>” while the print server would use “<48>”. Again, this needs to be the first piece of information transmitted in the log message.
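The formula is simple enough to check in a couple of lines of Python. The function names below are my own. The reverse calculation is not spelled out above, but it follows directly from the formula: divide the priority by 8, and the quotient is the facility while the remainder is the severity.

```python
def encode_priority(facility, severity):
    """( Facility x 8 ) + Severity = Priority"""
    if not (0 <= facility <= 23 and 0 <= severity <= 7):
        raise ValueError("facility must be 0-23 and severity 0-7")
    return facility * 8 + severity

def decode_priority(priority):
    """Reverse the formula: returns (facility, severity)."""
    return divmod(priority, 8)

print(f"<{encode_priority(2, 4)}>")   # mail server warning
print(f"<{encode_priority(6, 0)}>")   # print server emergency
print(decode_priority(20))            # back to (facility, severity)
```

Running this prints the same two values as the worked examples, wrapped in the less than and greater than signs, followed by the facility and severity recovered from priority 20.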

Sorting log entries

The priority value is used by logging servers to sort the incoming messages. For example if we wanted all mail messages to go to the same file, we would tell our logging server that all messages with a priority of 16 (2×8+0) through 23 (2×8+7) should go to the “maillog” file. Most logging servers (like rsyslog) will let you do this numerically or by using the short description names.

rsyslog.conf example

Here are two lines out of the rsyslog.conf file that ships with Fedora. Let’s talk about what they are actually doing:

authpriv.*                                                              /var/log/secure

*.info;mail.none;authpriv.none;cron.none                /var/log/messages

These lines define two of the rules for determining which log entries should go to which log files. The syntax for sorting is:

facility.severity

So the first line says all facility 10 (authpriv) log entries, regardless of severity (“*” is a wild card match) should be sent to the file /var/log/secure.

The second line is a bit more complex as it has multiple conditions separated by semi-colons. These conditions state:

  • *.info = All facilities, so long as the severity level is info or more severe (a severity name matches that level and everything more important than it)
  • mail.none = No mail facility log entries, regardless of severity
  • authpriv.none = No authpriv facility log entries, regardless of severity
  • cron.none = No cron facility log entries, regardless of severity

Or, to translate this to English, the line says “Send all messages of severity “info” or higher to /var/log/messages, except those that contain a facility of “mail”, “authpriv” or “cron”.”

So with these rules we can define any combination of facility and severity values and which log file we would like to direct it to. When you first set this up, stick with the defaults. As you start collecting log entries you can tweak the rules as you see fit.
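To get a feel for how a line like the second one gets evaluated, here is a simplified Python model of the facility.severity matching. It is a sketch, not rsyslog's actual code. I am assuming the traditional Syslog behavior where a severity name matches that level and anything more severe, the function name is made up, and the short severity names are the ones from the table earlier in this post.

```python
# Index in this list is the numerical severity code (0 = most severe).
SEVERITIES = ["emerg", "alert", "crit", "error", "warn", "notice", "info", "debug"]

def selector_matches(selector, facility, severity):
    """Evaluate a selector such as '*.info;mail.none' for one log entry.

    `facility` and `severity` are short names, e.g. "mail" and "warn".
    Conditions are processed left to right; "none" clears any earlier match.
    """
    matched = False
    for condition in selector.split(";"):
        fac, sev = condition.strip().split(".")
        if fac != "*" and fac != facility:
            continue                  # condition is about another facility
        if sev == "none":
            matched = False
        elif sev == "*":
            matched = True
        elif SEVERITIES.index(severity) <= SEVERITIES.index(sev):
            # Lower number means more severe, so "info" also matches "warn".
            matched = True
    return matched

RULE = "*.info;mail.none;authpriv.none;cron.none"
print(selector_matches(RULE, "daemon", "warn"))    # daemon warning
print(selector_matches(RULE, "daemon", "debug"))   # too chatty for this file
print(selector_matches(RULE, "mail", "error"))     # excluded by mail.none
```

Only the first entry would be written to /var/log/messages. The debug entry is less severe than info, and the mail entry is excluded no matter how severe it is.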

Bending the RFCs

In an ideal world, the RFCs would be a perfect fit for everyone’s needs. Unfortunately this is not always the case. A good example is the logging facilities. As mentioned we are missing facilities for modern day services, while at the same time have facilities that we will never use. An obvious answer is to recycle the outdated facilities in order to support modern services.

For example, UUCP ( facility 8 ) is not even supported by modern operating systems. With this in mind, I like to use it as my Windows facility. That way I can sort all Windows log entries into their own file. For network hardware, I use the network news facility (facility 7). If you are unsure if a facility is currently in use, modify your logging server’s configuration file to send all log entries for that facility to a unique file:

ftp.*                                                                 /var/log/facility-test

If no entries arrive, you are in good shape. Just keep in mind that a legitimate service may use it at a later date. For example if three months from now someone sets up an FTP server, we may have problems if we are already using the FTP facility (facility 11). If you are unsure you can always stick with the local use facilities, as that is what they are intended for. Local use 0 and 7 seem to be the most heavily used, so avoid them when possible.

Other sorting options

While it’s not part of the RFC, some logging servers give you the ability to sort log entries based on patterns within the message. A good example is Syslog-NG. Syslog-NG will sort based on facility and severity, but you can also sort based on source IP, the application that generated the log entry, etc. This gives you far more flexible sorting options and it may be something to consider if facility/severity is not granular enough for your needs.

Exec Summary

In this installment we talked about how facility and severity are used to sort log entries. In my next post I’ll talk about how to get each of our systems to submit log messages to our centralized server.

Setting Up A Security Information Management System-Part3

August 11th, 2009

In the last post I covered some of the architecture concerns with rolling out a centralized security information system. In this post I’ll cover deploying a basic log server, and verifying that it is ready to accept log entries.

Selecting a logging server

The first thing we need to do is select a platform for our logging server. If we are simply setting up a test lab, Windows, UNIX or Linux will all make great choices. Choosing Windows might be helpful for a Windows administrator, as they will not have to climb the learning curve of a new operating system while attempting to test out logging. While Windows does not support Syslog out of the box, there are some excellent packages like Kiwi Syslog Server and WinSyslog that will add Syslog support. Both have evaluation versions and are relatively inexpensive to license.

If we are talking about setting up a production server however, we will want to stay away from Windows. Windows is notorious for having a horrible IP stack. In fact previous “patches” have crippled it even further in the interest of slowing worm propagation and increasing the speed of the GUI. While many of these limitations have been removed in Server 2008 and Windows 7, IP performance is still sub-par when compared to a Linux or UNIX system deployed on identical hardware.

So that leaves Linux and UNIX as choices for a production system. Which to choose will depend on personal choice. Some like the stability of BSD while others like the flexibility of Linux. For the purpose of this document I’ll be working with a Fedora based Linux system. Installation and setup of the OS is relatively intuitive and straightforward.

Accepting remote logs

In order to accept log entries from remote systems, older versions of Fedora required you to initialize the Syslog daemon (syslogd) with the “-r” option. This was done by adding “-r” to the syslogd_options line of /etc/sysconfig/syslog file. Some versions of Linux still support legacy Syslog, and require you to add “-r” to the Syslog RC initialization file. Check the docs for your specific distribution.

New Fedora systems however support “Reliable Syslog” or rsyslog. Implementation is pretty similar to plain old Syslog, except rsyslog supports communications over TCP/514 as well as UDP/514. In the last post I described that running log entries over TCP can fix some of the reasons we lose log entries, but not all of them. If you want to play around with TCP support, go ahead and open both ports on the logging server.

To get rsyslog to accept remote log entries, we must edit the /etc/rsyslog.conf file. Towards the beginning of the file you should see the following:

# Provides UDP syslog reception

#$ModLoad imudp.so

#$UDPServerRun 514

# Provides TCP syslog reception

#$ModLoad imtcp.so

#$InputTCPServerRun 514

The “#” (pound) symbol at the beginning of the line tells the system not to process the rest of the line. We use this technique for commentary as well as “commenting out” commands we do not wish to have processed. By commenting out the ModLoad and port specification lines, we prevent rsyslog from opening a listening socket. This helps to keep the system in a more secure state.

Since we are setting up a centralized logging server, we will need to open those sockets to accept remote log entries. Modify the /etc/rsyslog.conf file to remove the appropriate pound symbols. The file should now look like this:

# Provides UDP syslog reception

$ModLoad imudp.so

$UDPServerRun 514

# Provides TCP syslog reception

$ModLoad imtcp.so

$InputTCPServerRun 514

If you know you will never use TCP, you can leave the last two lines commented out. Once complete save your changes and exit the file.

We now need to restart logging so our changes are implemented. This is done on Fedora by executing the following command:

service rsyslog restart

When you execute the command, you should see rsyslog stop and start with a status of “OK”. If the shutdown failed, it is because rsyslog is not being initialized at boot time. From the command line, execute the command “setup” and select “System services” from the main menu. When the services menu appears, scroll down the list till you find rsyslog. Check off the box to the left and then select “OK”. Quit the setup utility and rsyslog will now initialize whenever the system is booted.

Verifying the listening port

Next we need to ensure that our logging process is accepting remote log entries. From the command line, type “netstat -an | grep :514”. The output should look similar to the following:

[root@fubar ~]# netstat -an | grep :514

tcp     0      0 0.0.0.0:514                 0.0.0.0:*              LISTEN

tcp     0      0 :::514                                 :::*              LISTEN

udp    0      0 0.0.0.0:514                 0.0.0.0:*

udp    0      0 :::514                                 :::*

The first line tells us that TCP/514 is listening via IPv4 on all network interfaces. Line two tells us the TCP port is also listening on any interface with an IPv6 address. Lines three and four are the same information, except for UDP. If any of the entries state “127.0.0.1:514” instead of “0.0.0.0:514”, then the port is only bound to the loopback interface. Only the local system will be able to reach it. This can happen with legacy Syslog systems if you forgot to run them with the “-r” switch.
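Netstat only tells you what the server itself thinks it is listening on. If you also want a quick check from another machine, a few lines of Python can confirm the TCP port is reachable across the wire. This is just a sketch with a made-up function name, and it only works for TCP. UDP is connectionless, so there is no handshake to test and you have to send an actual log entry instead.

```python
import socket

def tcp_port_open(host, port=514, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False

if __name__ == "__main__":
    # Checks the local system; swap in the address of your logging server.
    print(tcp_port_open("127.0.0.1"))
```

If this returns False from a client but netstat on the server shows the port listening on 0.0.0.0, a firewall somewhere in between is the most likely suspect.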

You should now have a logging server that is capable of receiving inbound log entries. In the next post I’ll talk about how these log entries get sorted into specific files.

Setting Up A Security Information Management System-Part2

August 10th, 2009

In my last post we discussed defining your goals for a Security Information Management (SIM) system. In this post we’ll talk about architecture concerns as well as capacity planning.

Network communications

The goal will be to have one or more SIM servers that will collect log entries from other systems. This will obviously have an impact on network utilization. How much of an impact will depend on the quantity and type of systems we collect log entries from.

UDP/514

Just about all systems support the original Syslog communication implementation which goes all the way back to the year 1988. The last description of this spec appeared in RFC 3164. While this RFC has been obsoleted by RFC 5424, RFC 3164 still represents the implementation supported by most vendors. Windows is a notable exception (proprietary, no Syslog support), but there is 3rd party software to rectify this.

Both RFCs specify the use of the UDP protocol when transmitting log entries. The well-known port to use is UDP/514. Where RFC 3164 and 5424 differ is in the format of the log message. I’ll dig into these differences in a later post.

The love/hate of using UDP

On the positive side, UDP is connectionless. This means that it generates less traffic than if we used TCP. Also, log transmissions are a one-way process. The host generating a log entry sends a packet to the logging server, but the logging server never replies. This means we can control traffic flow with static filtering rather than stateful filtering, which will place less overhead on the traffic control device. Also, the UDP header is only 8 bytes while a TCP header is 20 bytes or more, which means smaller transmission packets, thus less network overhead.

On the negative side, UDP is connectionless. ;) This means that it has minimal error reporting capability. For example if we transmit a log entry and the frame goes missing (say a collision or a firewall dropping the packet), UDP does not have the ability to detect that a retransmission is required. This means it’s possible for log entries to go missing if we overflow the network. Further, UDP has no flow control ability. If the SIM server recognizes it is reaching capacity it has no way to slow down the incoming transmission of log entries. The SIM server’s only option is to throw the packets away without processing them.
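To make the one-way nature of UDP logging concrete, here is a rough Python sketch of a sender. The function names are my own, and the payload is a stripped-down BSD-style message (priority, tag and text only). I am assuming the receiver fills in the timestamp and hostname, which is typical but worth verifying against your own server.

```python
import socket


def build_syslog_packet(facility, severity, tag, message):
    """Build a minimal BSD-style payload: <PRI>tag: message.

    PRI is facility * 8 + severity. Timestamp and hostname are omitted
    on the assumption that the receiving server adds them.
    """
    priority = facility * 8 + severity
    text = "<{}>{}: {}".format(priority, tag, message)
    return text.encode("ascii", "replace")


def send_log(server, port, payload):
    """Send one datagram and return the number of bytes handed to the stack.

    Nothing comes back from the logging server, so a successful return
    only means the packet left this host, not that it arrived.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (server, port))
```

Notice there is no read on the socket anywhere. That is the good news (less traffic, simple filtering rules) and the bad news (no way to know the entry arrived) in a handful of lines.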

Needless to say, we need to ensure that we properly specify capacity. If the network or the SIM server becomes overloaded, we are going to lose log entries. Proper capacity planning starts with understanding the impact of logging on the network.

Network impact of logging

The maximum size of a UDP Syslog packet has different specifications in different RFCs. The outdated RFC 3164 defines the maximum message size as being 1,024 bytes. RFC 5426 only requires IPv4 receivers to accept messages of up to 480 bytes, although it recommends that all receivers handle 2,048 bytes. If a vendor is still following the old spec, it’s possible they may still think the 1,024 byte size is legitimate. It has been my experience however that most log entry packets range in size from 75 to 225 bytes, so the maximums are a non-issue.

Windows systems, firewalls and intrusion detection systems tend to generate the largest messages. Network hardware tends to generate the smallest messages. If we have a 100 Mb Ethernet network, the theoretical maximum would be somewhere around 50,000 to 130,000 frames per second. This assumes zero other traffic, which is rarely the case. For the purposes of capacity planning, assume you will be limited to 5,000 log entries per second. This number might even be less if you have a busy network. Taking some utilization measurements during the planning process is key.
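If you want to run the theoretical numbers for your own link speed, here is a back-of-the-envelope Python sketch. It assumes standard Ethernet framing (preamble, header, checksum and inter-frame gap), an IPv4 header with no options, and a link carrying nothing but Syslog traffic. The constant and function names are mine.

```python
# Per-frame Ethernet cost in bytes: preamble/SFD (8), header (14),
# frame check sequence (4) and inter-frame gap (12).
ETHERNET_OVERHEAD = 8 + 14 + 4 + 12
# IPv4 header without options (20) plus UDP header (8).
IP_UDP_HEADERS = 20 + 8
# Ethernet pads short frames up to a 46 byte payload.
MIN_ETHERNET_PAYLOAD = 46


def max_frames_per_second(message_bytes, link_bits_per_second=100_000_000):
    """Theoretical ceiling on Syslog frames per second for an idle link."""
    payload = max(message_bytes + IP_UDP_HEADERS, MIN_ETHERNET_PAYLOAD)
    bits_on_wire = (payload + ETHERNET_OVERHEAD) * 8
    return link_bits_per_second // bits_on_wire


for size in (75, 225):
    print(size, "byte messages:", max_frames_per_second(size), "frames/sec")
```

Under these assumptions the results land in the same general ballpark as the range quoted above. Treat them as a ceiling only, since real networks carry other traffic and the logging server has limits of its own.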

Syslog over TCP

As mentioned above, UDP introduces the problem that log entries can become lost without us even knowing it. There are ways to validate capacity, which I will cover in a later post. Some feel running Syslog over TCP can rectify this problem. TCP can be leveraged for its reliability to ensure our log entries are properly received.

Unfortunately TCP support for Syslog is nowhere near standardized. Some vendors support TCP by simply listening on TCP/514. RFC 3195 defines Reliable Syslog as using port TCP/601, but its adoption has been extremely limited. RFC 5425 defines the use of TLS to secure Syslog transmission. This RFC specifies the use of port TCP/6514. This is a brand new specification and I’m unaware of anyone supporting it just yet.

So support for TCP is all over the board. Further, TCP does not completely fix the problem. While TCP will give us flow control and reliability on the wire, it cannot make up for the fact that Syslog at the application layer does not acknowledge the receipt of log entries. This was by design as it reduces overhead. The problem is that even by using TCP we can still lose messages within the IP stack and never know it is occurring.

So if you want to try and transmit logs via TCP, it’s only going to work between a specific vendor’s client and server software. For example you may need to run Syslog-NG on both ends of the connection to leverage its support for TCP. This is not always practical, as you cannot run the software client on appliances like access points, switches, routers, etc.

Where to place the logging server

When deciding where to place the logging server, we have to keep both network capacity and security in mind. Take a look at Figure #1. This is an ideal situation where the logging server has been isolated to a dedicated network operations network. This isolates it from the other security zones and makes it much easier to leverage the firewall to restrict access to the logging server.

[Figure #1: sim-placement]

The drawing assumes we only need one logging server for our entire environment. What if we have 100,000 nodes to keep track of? Large networks may need to look at aggregating the data. For example if I have 10 field offices, I may need to have a logging server located at each of them collecting local log info. Each of these logging servers would then relay summary information back to the corporate office for network wide trend reports. This way we maintain a high level of visibility while reducing network load. I’ll cover some possible aggregation options in a later post.

How many systems can log to a single logging server?

There is no single answer to this question as each network is different. It is going to depend on how much capacity is available on your network and how many log entries each of these systems generates. For example I could probably point 50,000 switches at a single logging server, as switches tend to generate very few messages. Firewalls on the other hand are extremely chatty, so I might max out the network or SIM server with only 20-50 firewalls. So to answer the question we need to look at two metrics:

  • How much free capacity is there on the wire?
  • How many log entries will each host generate?

The second question is not as straightforward as it may seem. For example the average desktop may only generate 40-100 log entries per day. If we can push 1,000 log entries per second, that is 86.4 million log entries per day, so the math says we should be able to point somewhere between 860,000 and 2 million desktops at a single logging server. The problem is about 80% of those messages are generated at initial boot time. If everyone typically powers up around 9:00 AM, the math changes to a more realistic 750 desktops (again, assuming we can push 1,000 log entries per second over the wire).

So we can’t just look at quantity of log entries. We need to take time of day into account as well. This will identify the actual number of log entries per second we can expect under worst case conditions. Worst case is the capacity level we need to plan for.
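The gap between the daily average and the boot-time peak is easy to sketch in Python. The 1,000 entries per second and the 80% boot share come from the discussion above. The 100 entries per day figure is the top of the quoted range, and the 60 second boot window is an assumption I am introducing here because it reproduces the 750 desktop estimate. Function names are mine.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def hosts_by_daily_average(entries_per_second, entries_per_host_per_day):
    """Hosts supported if every host spread its entries evenly over the day."""
    return entries_per_second * SECONDS_PER_DAY // entries_per_host_per_day


def hosts_by_peak(entries_per_second, burst_entries_per_host, burst_window_seconds):
    """Hosts supported when every host bursts inside the same time window."""
    return entries_per_second * burst_window_seconds // burst_entries_per_host


capacity = 1000               # log entries per second we can push
per_day = 100                 # entries per desktop per day (top of the range)
boot_burst = per_day * 80 // 100  # 80% of the entries are generated at boot

print("averaged over the day:", hosts_by_daily_average(capacity, per_day))
# Assumed: everyone boots within the same 60 seconds.
print("everyone boots at once:", hosts_by_peak(capacity, boot_burst, 60))
```

Change the burst window to match how tightly your users actually cluster their morning logons. A wider window raises the number of desktops a single server can absorb.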

Deploy centralized logging in phases

If you have tens of thousands of systems to deal with, it is easy to get overwhelmed with the work involved with deploying centralized logging. Rolling out the solution in phases makes it easier to wrap your brain around the whole process.

First, start with a single logging server. You may not be able to cover your whole network, but we have to start somewhere. Large networks should consider a deployment at the corporate office first, moving out to field offices once the corporate system is fully vetted and functional.

You will also want to phase in which devices you are collecting information from. I usually go with the following order:

  1. Network intrusion detection systems
  2. Firewalls
  3. Network hardware (routers, switches, access points, print servers, etc.)
  4. Internet facing servers
  5. Internal servers
  6. Internal desktops

Obviously you can tweak this list to fit your needs. For example if you do not plan on collecting info from desktops, simply leave that step out. I like to start with network intrusion detection systems first as their log entries are well suited for vetting both daily reports and real time alerting. Once we have a handle on alerting and reporting, adding additional devices becomes far easier.

Exec Summary

In this post I covered all the things you need to consider when initially deploying a centralized logging solution. We covered how to predict the impact it will have on network utilization, how to calculate the number of hosts per logging server, and why it is important to deploy the solution in phases. In the next post we’ll start talking about configuring the centralized logging server. Specifically, we’ll look at how we are going to sort log entries.

Setting Up A Security Information Management (SIM) System – Part 1

August 8th, 2009

I get a lot of logging related questions. So much so that I decided to do a series on how to deploy log management. There are some excellent logging resources on the Internet, but they are fragmented in scope and/or vendor specific (usually written by the vendors). I wanted to create something vendor neutral that holds your hand through the entire process of deploying a log management solution.

Why should I deploy a security information management system?

Let’s be candid, deploying log management is hard and painful. This is the reason why so many administrators avoid it like the plague. It is difficult to deploy and a bucking bronco when it comes to long term administration. Weekly trips to the dentist would probably be more pleasurable.

With all that said, log management is probably the single most effective security solution you can deploy. You can’t drop it and forget it like a firewall, but log management can give you unrivaled visibility into the inner workings of your network. When it’s not providing insight into security events you might otherwise miss, it is doing double duty helping you troubleshoot communication and system issues. A logging system can be resource intensive, but it can also provide a very high rate of return.

Why do you want a SIM?

Before we begin, the first question you have to ask yourself is why do you want a SIM solution. Do you want to improve security or is there a compliance specification you need to adhere to? It might seem odd to want to distinguish between the two, but the requirements are drastically different. Standards are far easier (and cheaper) to meet than true security.

Standards such as PCI-DSS require you to log user, application and network activity. However they tend to be very vague in how that information gets processed. You can usually get away with dropping in a black box, generating some colorful management reports, and be considered “compliant”. It may not help you find that backdoored system that’s calling home, but you’ve met the standard.

Standards tend to focus on the lowest common denominator. They need to be applicable for a wide range of audiences, including businesses without a lot of resources. Rather than evaluating a specific organization’s risk and basing the requirements on that, we set the bar low so it is achievable by small and large organizations alike.

Also, to simplify the process, we tend to focus on checklists. Checklists are cool because they tell you exactly what needs to be done to be compliant. If an auditor can put a checkmark next to all the items, you pass the testing. The problem is checklists tend to focus on symptoms, not the actual problem.

I’ll give you a great example. I had a client bring in a Qualified Security Assessor to certify them for PCI-DSS. This was one of my clients running a strict implementation of application control, so they could show a year and a half history of zero Malware infections. While they certainly received Malware over that time, we could prove that there were zero instances of actual infection as every Malware attack was immediately contained and eliminated. Not many businesses can claim a year+ with zero Malware infections.

The auditor failed them. PCI-DSS requirement #5 states: “anti-virus software must be used on all systems commonly affected by Malware”. Since they ran application control, not anti-virus, they were deemed non-compliant. If requirement 5 had been written to identify an acceptable threshold for Malware containment, they certainly would have met the specification. However risk evaluation and metrics do not make for easy checklist items.

So if you want to deploy a SIM to actually augment security in your environment, it is going to take longer and require more work than simply meeting a specification.

Should you build your own SIM?

I’m a firm believer that anyone considering a SIM solution should start by building his or her own. While there are some decent commercial SIM solutions out there, they isolate you from the inner workings of the logging process. This can be a good thing in that it saves you time. The problem is you will not learn as much.

Also, log management deployment is a journey. You will find in the course of a rollout that your requirements may change. Information you initially thought was important, all of a sudden is not. Reports you didn’t even think of, all of a sudden jump to the top of the list. By building your own system you will have more flexibility to make changes on the fly. If you later decide you want a commercial solution, you are now better informed of your requirements and can do a better job evaluating a potential purchase. This is important, as many log solutions are expensive. You don’t want to drop a lot of money on a solution that will not meet your long-term needs.

I’ll give you a good example. Most of the sites I’ve worked with initially think failed logons are important and want to see the reports. It does not take them long to figure out seeing all failed logons is a complete waste of time as everyone fat fingers the keyboard on occasion. They then realize they want some thresholds around the data. For example they only want to see failed logons if three or more failures are seen in five seconds (indicating an automated attack). Or only show failed logons when multiple logon names are used from the same source IP (indicating a password guessing attack). So by dealing with some information overload, they become better skilled at defining exactly what they wish to see.
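Both of those thresholds are simple enough to sketch in Python. This assumes the failed logons have already been parsed out of the logs into (timestamp in seconds, source IP, logon name) tuples in time order. That parsed format and the function names are my own invention for illustration.

```python
from collections import defaultdict, deque


def burst_sources(events, threshold=3, window_seconds=5):
    """Source IPs with at least `threshold` failures inside the time window.

    events: (timestamp_seconds, source_ip, logon_name) tuples in time order.
    """
    recent = defaultdict(deque)
    flagged = set()
    for timestamp, source, _name in events:
        times = recent[source]
        times.append(timestamp)
        # Forget failures that have aged out of the sliding window.
        while timestamp - times[0] > window_seconds:
            times.popleft()
        if len(times) >= threshold:
            flagged.add(source)
    return flagged


def multi_name_sources(events, min_names=2):
    """Source IPs that failed logons under several different logon names."""
    names = defaultdict(set)
    for _timestamp, source, name in events:
        names[source].add(name)
    return {src for src, seen in names.items() if len(seen) >= min_names}
```

A user who fat fingers a password three times over a couple of minutes falls through both filters. A script hammering away from a single address does not.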

Summary

OK, so we’ve covered defining a focus (security Vs. standards requirement) as well as the importance of initially building your own system. In the next installment I’ll get into architecture and capacity planning.