Collection is King, part two

In part one of this post, I talked about NSM as a methodology, and the four different kinds of information that NSM collects (alert data, session data, statistical data and full-content data). All of this lot is “scraped off the wire” by one means or another and fed to a sensor for collection.

NSM is a great discipline to practice, and provides the analyst with triggers for an investigation (in the form of alerts, or anomalies in the statistical or session data) and also the means to resolve the puzzle in a timely fashion.

However, a network of routers, switches, firewalls, IDS devices, servers, workstations and applications has a lot more information to offer than just what one can scrape off the wire, namely their logs:

  • Routers, switches, firewalls and IDS devices will clearly produce a good crop of security-related log output (“I dropped a packet because an ACL told me to”, or “There are too many half-open TCP sessions”, or “I blocked access to this URL”, etc.), and also many, many other messages describing various changes in state. Cisco’s list of log messages is here, and it’s really long.
  • Servers and workstations will also have their own logs. In the case of Windows, it’s the Event Log – EventID.net documents over 10,000 different types of messages.
  • Individual applications will also maintain logs. Things like webservers, FTP servers, print servers, anti-virus servers, disk encryption management servers, etc. all create volumes of log entries.
  • Most devices contain some kind of management agent, usually SNMP- or WMI-based. You can poll for information like CPU/network/memory load and draw pretty graphs (a minimal polling sketch follows this list).
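
By way of illustration, here’s a minimal polling sketch in Python using the pysnmp library. It assumes pysnmp’s classic synchronous hlapi, SNMP v2c, and a read-only community string of “public” – all assumptions, so substitute whatever your devices actually use. It fetches a couple of standard counters that you could feed to your grapher:

```python
# Minimal SNMP poller sketch (assumes pysnmp v4's synchronous hlapi and
# an SNMP v2c agent with community string "public" - adjust to taste).
from pysnmp.hlapi import (SnmpEngine, CommunityData, UdpTransportTarget,
                          ContextData, ObjectType, ObjectIdentity, getCmd)

OIDS = [
    "1.3.6.1.2.1.1.3.0",       # sysUpTime.0
    "1.3.6.1.2.1.2.2.1.10.1",  # ifInOctets for interface index 1
]

def poll(host):
    """Fetch a handful of counters from one device as {oid: value}."""
    error_indication, error_status, _, var_binds = next(getCmd(
        SnmpEngine(),
        CommunityData("public"),           # v2c community (assumption)
        UdpTransportTarget((host, 161)),
        ContextData(),
        *[ObjectType(ObjectIdentity(oid)) for oid in OIDS],
    ))
    if error_indication or error_status:
        raise RuntimeError(str(error_indication or error_status))
    return {name.prettyPrint(): int(value) for name, value in var_binds}

# Poll each device on a schedule and hand the numbers to your grapher:
print(poll("192.0.2.1"))  # 192.0.2.1 is a placeholder address
```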

Most of the things in these logs will denote utter trivia. Knowing that favicon.ico was downloaded from your company’s website three thousand times in the last month isn’t very exciting. And it’s no surprise that Old Bob in accounts fluffed his login attempt yet again when he fired up his computer this morning. But each of these nuggets may play a part in an investigation, either as the initial trigger, or perhaps in a supporting role, providing context to data collected by an NSM sensor:

  • As a trigger. If Old Bob fluffs his login five hundred times inside of ten minutes, then something is definitely afoot. Yes, you may be able to get a whiff of a problem from your NSM sensors, but it’s certainly more convenient (and clearer to understand) if you get five hundred “failed login” messages from a server’s log (a toy version of this trigger follows this list).
  • Providing context. Let’s say our NSM kit tells us that something is fishy with a desktop machine at 10.11.12.13. NSM will tell us exactly what this IP address has been up to network-wise, but if we’re collecting other stuff we can find out more:
    • Assuming that 10.11.12.13 is a “local” Windows box and under our control, we can ask the domain controllers which user account was in use at the time of the incident, and which machine account was used by 10.11.12.13.
    • We can look in the DHCP server logs to discover the hardware MAC address that was associated with it at the time of the incident (if 10.11.12.13 isn’t covered by full-content capture, this is the easiest way to get this information).
    • We can look for this hardware address in the MAC tables that we’ve been pulling from our switches via SNMP, and find out which port on which switch the machine was attached to.
    • We can see if any policy enforcement devices like firewalls have taken action against 10.11.12.13.

    Now we can begin to ask questions about why Dave from Marketing is apparently logged into the CTO’s laptop, which is bizarrely connected up in the broom cupboard on the sixth floor and is mercilessly scanning for anything that will listen on port 31337…
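
To make the trigger bullet concrete, here’s a toy version of the “Old Bob” detector in Python: a sliding ten-minute window of failed-login timestamps per account, raising an alert when the count passes a threshold. The event feed, the threshold and the window size are all assumptions to be tuned to your own environment:

```python
# Toy sliding-window trigger: alert when one account racks up too many
# failed logins in too short a time. Threshold and window are assumptions.
from collections import defaultdict, deque
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)   # "inside of ten minutes"
THRESHOLD = 500                  # "five hundred times"

recent_failures = defaultdict(deque)  # account -> recent failure timestamps

def record_failed_login(account, when):
    """Record one failure; return True if the account trips the threshold."""
    q = recent_failures[account]
    q.append(when)
    while q and when - q[0] > WINDOW:  # expire events outside the window
        q.popleft()
    return len(q) >= THRESHOLD

# Feed it "failed login" events as your collector receives them:
if record_failed_login("old.bob", datetime.now()):
    print("ALERT: old.bob has had 500+ failed logins inside ten minutes")
```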

The above are just some simple examples – the point is that log collection and NSM are complementary, and if you can do both you’re better off for it.

So, how do you go about collecting all this stuff? I’m not going to recommend any specific magic products, but an ideal log collecting system will be able to:

  • Receive log entries via a mechanism like syslog.
  • Reach out and take the log entries from monitored devices that aren’t able to “push” this information to the collector. Getting the entries off the originating device promptly means a copy exists elsewhere, so Bad Guys who take over your server can’t erase the pertinent log entries to cover their tracks.
  • Represent log entries in an abstract form to cater for messages from different vendors. An “I dropped a packet” message from a Cisco firewall will be different to one from a CheckPoint firewall, but they both mean the same thing. The log collector should present both of these to you as the same event type.
  • Sessionise events. Based upon IP addresses, ports, and the time of day, the collector should be smart enough to correlate different events and say that they were all part of the same action (the sketch after this list shows this and the previous point in miniature).
  • Allow the user to query all of the collected information with flexible filtering criteria. You should be able to ask questions like:
    • What did 10.11.12.13 do between 10:00 and 10:23 on Tuesday? I want to see domain logins, denied packets, URLs requested, etc. etc.
    • How many failed logins were there last week for members of the “Domain Administrators” group?
    • Which of my access points have had failed associations in the last 24 hours? I want to see a count of failures per-device, and I want to see a graph of failures over time.
    • Use your imagination. What do you care about on your network?
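
To illustrate the abstraction and sessionisation points, here’s a minimal Python sketch. The two regexes (one Cisco-ASA-flavoured, one netfilter-flavoured) are illustrative rather than production-grade parsers, and the five-minute session gap is an arbitrary assumption:

```python
# Sketch of "abstract events" and "sessionisation": two vendors' drop
# messages normalise to one PACKET_DENIED event type, and events sharing
# (type, src, dst) within a time gap are grouped into one session.
import re
from datetime import timedelta

PATTERNS = [
    # Cisco ASA flavour, e.g. "%ASA-4-106023: Deny tcp src outside:198.51.100.7/4242 dst inside:10.11.12.13/80 ..."
    (re.compile(r"%ASA-4-106023: Deny \w+ src \S+?:(?P<src>[\d.]+)/\d+ "
                r"dst \S+?:(?P<dst>[\d.]+)/(?P<dport>\d+)"), "PACKET_DENIED"),
    # netfilter/iptables flavour, e.g. "... SRC=198.51.100.7 DST=10.11.12.13 ... DPT=80 ..."
    (re.compile(r"SRC=(?P<src>[\d.]+) DST=(?P<dst>[\d.]+).*DPT=(?P<dport>\d+)"),
     "PACKET_DENIED"),
]

def normalise(raw_message, when):
    """Map a raw vendor-specific message onto one abstract event tuple."""
    for pattern, event_type in PATTERNS:
        match = pattern.search(raw_message)
        if match:
            return (event_type, match["src"], match["dst"], match["dport"], when)
    return ("UNKNOWN", None, None, None, when)

def sessionise(events, gap=timedelta(minutes=5)):
    """Group events sharing (type, src, dst) into sessions, starting a new
    session whenever more than `gap` elapses between consecutive events."""
    sessions = []
    open_session = {}  # (type, src, dst) -> index into sessions
    for event in sorted(events, key=lambda e: e[4]):
        key = event[:3]
        i = open_session.get(key)
        if i is not None and event[4] - sessions[i][-1][4] <= gap:
            sessions[i].append(event)
        else:
            open_session[key] = len(sessions)
            sessions.append([event])
    return sessions

# e.g. events = [normalise(line, ts) for ts, line in collected_lines]
#      sessions = sessionise(e for e in events if e[0] != "UNKNOWN")
```

Once events are in a normalised form like this, the flexible querying described above largely falls out for free – “what did 10.11.12.13 do between 10:00 and 10:23?” becomes a simple filter over the tuples (or a WHERE clause, if they’re sitting in a database).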

Such a device will probably also offer an “Automagical Relevant Security Evaluation” facility (“ARSE”), which can take all of these abstracted, sessionised events, correlate them, and tell you Meaningful Security Stuff (although this could well be marketing flimflam on the part of the vendor). I have only limited experience of these (very expensive) toys, but the greatest value comes from the data mining I do myself in full knowledge of how my infrastructure operates. The ARSE facility looks great during customer demos and sounds great on paper, but you can’t beat actual analysis performed by an actual analyst. I prefer to view such devices as a “Google” for my big pile o’ logs.

So why do we need to go to all these lengths to get this information? Can’t NSM deliver the goods by itself? Well, yes and no:

  • It’s much simpler to see a log message saying “Bob logged in” than it is to unpick this information from an SMB traffic stream. Having both forms of information is actually quite beneficial – they can back each other up, or they can contradict each other. NSM saying “Bob logged in” but the server saying “nobody logged in today” is an indicator in itself.
  • You might not be able to get enough information from a full content capture. For example, an SSH login won’t be decipherable from a pcap (unless you can supply the keying material), but the logs on the server will tell you who logged in.
  • You might not have enough NSM sensors to cover everywhere. Take a typical office with five workstations, a Windows domain controller, and an NSM sensor at the border with the Internet. The NSM sensor isn’t going to tell you anything about denied logins as policed by the domain controller.

The message here is that visibility is everything. Anything you can do to improve the visibility you have into the operation of your infrastructure can only be of benefit. It doesn’t matter that most of what you collect is run-of-the-mill trivia when each item is taken on its own. What matters is that you’ve collected a vast pile of easily-accessible jigsaw pieces, each setting the others in context, and you’ll be really glad you’ve got them when the time comes!

Comments?


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk
