In my last post we discussed defining your goals for a Security Information Management (SIM) system. In this post we’ll talk about architecture concerns as well as capacity planning.
Network communications
The goal will be to have one or more SIM servers that will collect log entries from other systems. This will obviously have an impact on network utilization. How much of an impact will depend on the quantity and type of systems we collect log entries from.
UDP/514
Just about all systems support the original Syslog communication implementation which goes all the way back to the year 1988. The last description of this spec appeared in RFC 3164. While this RFC has been obsoleted by RFC 5424, RFC 3164 still represents the implementation supported by most vendors. Windows is a notable exception (proprietary, no Syslog support), but there is 3rd party software to rectify this.
Both RFCs specify the use of the UDP protocol when transmitting log entries. The well-known port to use is UDP/514. Where RFC 3164 and 5424 differ is in the format of the log message. I’ll dig into these differences in a later post.
The love/hate of using UDP
On the positive side, UDP is connectionless. This means that it generates less traffic than if we used TCP. Also, log transmissions are a one-way process. The host generating a log entry sends a packet to the logging server, but the logging server never replies. This means we can control traffic flow with static filtering rather than stateful filtering which will place less overhead on the traffic control device. Also, the UDP header is typically 1/3 – 1/4 the size of a TCP header, which means smaller transmission packets, thus less network overhead.
On the negative side, UDP is connectionless.
This means that it has minimal error reporting capability. For example if we transmit a log entry and the frame goes missing (say a collision or a firewall dropping the packet), UDP does not have the ability to detect that a retransmission is required. This means its possible for log entries to go missing if we overflow the network. Further, UDP has no flow control ability. If the SIM server recognizes it is reaching capacity it has no way to slow down the incoming transmission of log entries. The SIM server’s only option is to throw the packets away without processing them.
Needless to say, we need to ensure that we properly specify capacity. If the network or the SIM server becomes overloaded, we are going to lose log entries. Proper capacity planning starts with understanding the impact of logging on the network.
Network impact of logging
The maximum size of a UDP Syslog packet has different specifications in different RFC’s. The outdated RFC 3164 defines the maximum message size as being 1,024 bytes. RFC 5426 drops this maximum size to 480 bytes. If a vendor is still following the old spec, its possible they may still think the 1,024 byte size is legitimate. It has been my experience however that most log entry packets range in size from 75 to 225 bytes, so the maximums are a non-issue.
Windows systems, firewalls and intrusion detection systems tend to generate the largest messages. Network hardware tends to generate the smallest messages. If we have a 100 Mb Ethernet network, the theoretical maximum would be somewhere around 50,000 to 130,000 frames per second. This assumes zero other traffic, which is rarely the case. For the purposes of capacity planning, assume you will be limited to 5,000 log entries per second. This number might even be less if you have a busy network. Taking some utilization measurements during the planning process is key.
Syslog over TCP
As mentioned above, UDP introduces the problem that log entries can become lost without us even knowing it. There are ways to validate capacity, which I will cover in a later post. Some feel running Syslog over TCP can rectify this problem. TCP can be leveraged for its reliability to insurie our log entries are properly received.
Unfortunately TCP support for Syslog is no were near standardized. Some vendors support TCP by simply listening on TCP/514. RFC 3195 defines Reliable Syslog as using port TCP/601, but its adoption has been extremely limited. RFC 5425 defines the use of TLS to secure Syslog transmission. This RFC specifies the use of port TCP/6514. This is a brand new specification and I’m unaware of anyone supporting it just yet.
So support for TCP is all over the board. Further, TCP does not completely fix the problem. While TCP will give us flow control and reliability on the wire, it cannot make up for the fact that Syslog at the application layer does not acknowledge the receipt of log entries. This was by design as it reduces overhead. The problem is that even by using TCP we can still lose messages within the IP stack and never know it is occurring.
So if you want to try and transmit logs via TCP, its only going to work between a specific vendor’s client and server software. For example you may need to run Syslog-NG on both ends of the connection to leverage it’s support for TCP. This is not always practical, as you cannot run the software client on appliances like access points, switches, routers, etc.
Where to place the logging server
When deciding where to place the logging server, we have to keep both network capacity and security in mind. Take a look at Figure #1. This is an ideal situation where the logging server has been isolated to a dedicated network operations network. This isolates it from the other security zones and makes it much easier to leverage the firewall to restrict access to the logging server.
The drawing assumes we only need one logging server for our entire environment. What if we have 100,000 nodes to keep track of? Large networks may need to look at aggregating the data. For example if I have 10 field offices, I may need to have a logging server located at each of them collecting local log info. Each of these logging servers would then relay summary information back to the corporate office for network wide trend reports. This way we maintain a high level of visibility while reducing network load. I’ll cover some possible aggregation options in a later post.
How many systems can log to a single logging server?
There is no single answer to this question as each network is different. It is going to depend on how much capacity is available on your network and how many log entries each of these systems generate. For example I could probably point 50,000 switches at a single logging server, as switches tend to generate very few messages. Firewalls on the other hand are extremely chatty, so I might max out the network or SIM server with only 20-50 firewalls. So to answer the question we need to look at two metrics:
- How much free capacity is there on the wire?
- How many log entries will each host generate?
The second question is not as straightforward as it may seem. For example the average desktop may only generate 40-100 log entries per day. If we can push 1,000 log entries per second, the math says we should be able to point 86 million desktops at a single logging server. The problem is about 80% of those messages are generated at initial boot time. If everyone typically powers around 9:00 AM, the math changes to a more realistic 750 desktops (again, assuming we can push 1,000 log entries per second over the wire).
So we can’t just look at quantity of long entries. We need to take time of day into account as well. This will identify the actual number of log entries per second we can expect under worse case conditions. Worse case is the capacity level we need to plan for.
Deploy centralized logging in phases
If you have tens of thousands of systems to deal with, it is easy to get overwhelmed with the work involved with deploying centralized logging. Rolling out the solution in phases makes it easier to wrap your brain around the whole process.
First, start with a single logging server. You may not be able to cover your whole network, but we have to start somewhere. Large networks should consider a deployment at the corporate office first, moving out to field offices once the corporate system is fully vetted and functional.
You will also want to phase in which devices you are collecting information from. I usually go with the following order:
- Network intrusion detection systems
- Firewalls
- Network hardware (routers, switches, access points, print servers, etc.)
- Internet facing servers
- Internal servers
- Internal desktops
Obviously you can tweak this list to fit your needs. For example if you do not plan on collecting info from desktops, simply leave that step out. I like to start with network intrusion systems first as their log entries are well suited for vetting both daily reports and real time alerting. Once we have a handle on alerting and reporting, adding additional devices becomes far easier.
Exec Summary
In this post I covered all the things you need to consider when initially deploying a centralized logging solution. We covered how to predict the impact it will have on network utilization, how to calculate the number of hosts per logging server, and why it is important to deploy the solution in phases. In the next post we’ll start talking about configuring the centralized logging server. Specifically, we’ll look at how we are going to sort log entries.
Related posts:


