Richard’s post over here talks about the “Intruder’s Dilemma”:
The defender only needs to detect one of the indicators of the intruder’s presence in order to initiate incident response within the enterprise.
I agree wholeheartedly. However, the wonderful thing about Indicators is that there can be lots and lots of them, and interpreting them can be tricky. Sometimes it’s easy to misinterpret what they’re telling you, and sometimes it’s tempting to focus 100% on the first or most prominent Indicator observed, ignoring (or not looking for) other ones that will add context and detail to the overall situation.
Take for example the Indicator in the picture below. What are we looking at?
- An enclosure at a wildlife park, or
- Some kind of strip club?
My first guess was the second option. I got it wrong because:
- For some reason, I had chosen to focus on my first interpretation of the Indicator, and
- I had failed to take into account any unfavourable counter-Indicators that contradicted my assessment (the most prominent of which was being at a zoo at the time)
A while back, we had a problem with one of our systems – ProcessA wasn’t able to talk to ProcessB (the “security” slant here is that we’re dealing with a compromise of the “availability” facet of the CIA Triad). The system wasn’t working, and ProcessA had left loads of Indicators in the form of log entries which said “I tried to talk to ProcessB on port 1234 but I couldn’t”. ProcessA was clearly working at this point (otherwise there would have been no log entries), so the initial interpretation of the Indicator was that ProcessB was in some way at fault.
Cue the usual troubleshooting process:
- Is ProcessB running? Yes.
- Is ProcessB accepting requests right now? Yes, it seems to be working. The error messages from ProcessA are intermittent.
- Was ProcessB running at the time ProcessA logged the messages? Yes. ProcessB logs startup/shutdown messages, and there aren’t any for this particular timeframe.
- Was there some kind of network problem? Unlikely, since ProcessA and ProcessB are running on the same server.
- Are you sure it wasn’t the network? Given the above, the network really can’t have much of a hand in this!
- But ProcessB listens on IP address 184.108.40.206, which is bound to a physical interface. If that interface went down, wouldn’t ProcessB’s ability to listen be affected? Yes, but logs from the switch that the server connects to don’t show any up/down events for the interface concerned.
And so it went on, for several hours. ProcessB was deemed to be at fault, yet we couldn’t find anything wrong with it. The troubleshooting had become bogged down because:
- We were proceeding under the assumption that ProcessB was broken (I mean, duh!! ProcessA wouldn’t have left all those log messages if ProcessB were working, would it?!?)
- We were ignoring a gigantic counter-Indicator, namely “there’s nothing wrong with ProcessB”.
It generally takes two to tango, something that applies as much to TCP comms as anything else. If the server’s not at fault, perhaps it’s the client, ProcessA?
As it turns out, ProcessA was indeed the one with the problem. An errant service, ProcessZ, had been leaving thousands of sockets in the CLOSE_WAIT state, which means the remote end had closed the connection but ProcessZ never closed its own side (this almost always indicates a problem with the software, rather than an expected TCP phenomenon). Eventually, it got to the point where the tango-ing ProcessA needed a client socket to talk to ProcessB, but there weren’t any ephemeral ports available because ProcessZ had, over time, left them all in CLOSE_WAIT (by default, Windows Server 2003 only hands out ephemeral ports in the range 1025 to 5000, which is fewer than 4000 ports). Lacking a client socket with which to talk to ProcessB, ProcessA duly logged a message to say that “I tried to talk to ProcessB on port 1234 but I couldn’t”…
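For anyone hunting a similar problem, here is a minimal sketch of how the CLOSE_WAIT pile-up could be spotted from the output of `netstat -ano`. It is not what we ran at the time; the function names and the threshold of 100 are made up for illustration, and it assumes the usual five-column Windows layout (protocol, local address, foreign address, state, PID).

```python
from collections import Counter


def count_tcp_states(netstat_output):
    """Count TCP sockets per (PID, state) from `netstat -ano` text."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # A TCP row has five fields: proto, local, foreign, state, PID.
        # Header lines and UDP rows (which have no state) are skipped.
        if len(fields) == 5 and fields[0].upper() == "TCP":
            state, pid = fields[3], fields[4]
            if pid.isdigit():
                counts[(int(pid), state)] += 1
    return counts


def close_wait_offenders(netstat_output, threshold=100):
    """Return (PID, count) pairs with at least `threshold` CLOSE_WAIT sockets.

    The threshold is an arbitrary illustrative value, not a recommendation.
    Results are sorted with the worst offender first.
    """
    counts = count_tcp_states(netstat_output)
    offenders = [
        (pid, n)
        for (pid, state), n in counts.items()
        if state == "CLOSE_WAIT" and n >= threshold
    ]
    return sorted(offenders, key=lambda item: item[1], reverse=True)
```

Feed it the text captured from `netstat -ano` (for example via `subprocess.run`), then look up any reported PID in Task Manager. A process sitting on thousands of CLOSE_WAIT sockets is a counter-Indicator that is hard to ignore.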
Restarting ProcessZ was a temporary band-aid – all the sockets in CLOSE_WAIT went away, and ProcessA danced the night away with ProcessB.
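Part of what sent us down the wrong path was that ProcessA's log message didn't say why it couldn't talk to ProcessB. As a hedged sketch (ProcessA's real code is not shown here, and the names below are hypothetical), a client could classify a failed connect by its error code, so that the log line itself hints at whether the local or the remote end is the likely culprit. Which error code a port-starved client actually receives varies by operating system, so the set of "local" codes below is an assumption.

```python
import errno
import socket

# Error codes that suggest the local machine could not provide a socket or
# port. This set is an illustrative assumption; the exact code differs
# between operating systems.
LOCAL_ERRNOS = {errno.EADDRNOTAVAIL, errno.EADDRINUSE, errno.ENOBUFS, errno.EMFILE}


def describe_connect_failure(exc):
    """Turn an OSError from a failed connect into a hint about where to look."""
    if exc.errno in LOCAL_ERRNOS:
        return "local: could not allocate a client socket or port"
    if exc.errno == errno.ECONNREFUSED:
        return "remote: nothing accepted the connection"
    return "unknown: " + str(exc)


def try_connect(host, port, timeout=3.0):
    """Return None on success, or a description of why the connect failed."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return None
    except socket.timeout:
        return "timeout: no answer within %.1f seconds" % timeout
    except OSError as exc:
        return describe_connect_failure(exc)
```

A log entry that starts with "local:" would have pointed at the client side straight away, instead of leaving us to assume ProcessB was broken.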
The moral of the story? Don’t take isolated Indicators at face value. There will almost always be other Indicators that either back up or refute your assessment – all you have to do is look for them.
The other moral of the story? Very few zoos have strip clubs 🙂
Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk