Archive for the net-entropy Category

glTail parsers for Snort, net-entropy and viewssld

Posted in net-entropy, NSM on 28 October, 2011 by Alec Waters

glTail is a tool for realtime log visualisation, which according to the website allows you to “view real-time data and statistics from any logfile on any server with SSH, in an intuitive and entertaining way.”

glTail can read from any text logfile you like, and via a set of parsers can extract information such as IP addresses for graphical display. Each row from the logfile may trigger several blobs, e.g. source IP, dest IP, etc, as you can see in the video below:

I’ve written some parsers for Snort, net-entropy and viewssld. A screenshot of them all in action is shown below (click for full size view):

The red blobs are related to Snort, cyan ones to net-entropy, and the yellow shades are from viewssld. The numeric columns show the rate at which each item is appearing, and the length of each coloured highlight bar shows the proportion of occurrences of a given item relative to the others.

The parser files and a sample config.yaml file that uses them can be found here (snort.rb, net-entropy.rb, viewssld.rb and config.yaml).

Useful?

So, it’s a pretty visualisation of interesting stuff, but is it useful and actionable? It’s certainly hopeless for correlation – when a signature fires, it’s more or less impossible to tell the associated IP addresses and ports even if you have a very quiet sensor. At the other end of the scale, if you’re inundated with blobs you can alter the regexes in snort.rb to match on a specific IP/protocol/signature etc to be a little more selective.

Where I think this may prove most useful is when you’re learning from an incident. If you’ve investigated an incident where someone compromised your webserver, you could pull all the relevant log entries that show:

  • Snort alerts (when the attacker was probing for vulnerabilities)
  • Apache/IIS log entries (showing everything else they did to your server)
  • net-entropy logs (showing the attacker’s outbound backdoor SSH tunnel).

If you were to pump all of these logs through glTail you’d have an effective visualisation of the attack. For inspiration, check this out:


Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters@dataline.co.uk

Making sense of the results of the net-entropy experiment

Posted in Crazy Plans, net-entropy, NSM on 19 January, 2010 by Alec Waters

The rest of the net-entropy saga is listed here, but to briefly recap, net-entropy is a tool that can be used to profile the level of “randomness” of network traffic. High-entropy traffic is usually either encrypted or compressed in nature, and I’m definitely interested in knowing about the presence of the former on my network. I’ve been using net-entropy as a general purpose “randomness detector”, something that the author didn’t really have in mind when he wrote it.

However, drawing meaningful conclusions from the results gathered can be tricky (this is nothing to do with net-entropy itself, and everything to do with the way I’m using it backwards to generically detect randomness). Some of the observed high entropy traffic will definitely fall into the “encrypted” category, like SSH and Remote Desktop. Other stuff will definitely be of the “compressed” flavour, like RTMP on port 1935.

Once this kind of low-hanging fruit has been pruned, the analyst is left with a whole load of unknown high-entropy traffic. If net-entropy’s presence is to be of any value at all when deployed this way, we have to try to make sense of the residual data in some way or another.

One tactic is to fire up Sguil and use its full-content capture to extract and examine each of the unknown high-entropy streams in turn. This is massively labour intensive, but some of the time you’ll find something interesting like HTTPS on a non-standard port (the certificate exchange at the start of the conversation is in clear text, giving you a clue). Most of the time however, you’re left looking at unintelligible garbage. Unsurprising really, given that it’s likely to be either compressed or encrypted…

Given that protocols like SSH, RDP and RTMP can usually be identified by their port numbers alone and what’s left is unreadable junk, how are we to derive value from these other indicators from net-entropy? I can think of a couple of ways:

  • Session contextualisation
  • Session profiling on the basis of session duration and frequency, etc

Putting a high-entropy session into context isn’t too labour intensive, and sometimes pays dividends. Let’s say net-entropy has spotted a session from 1.2.3.4:25333 to 4.3.2.1:3377; the full-content capture is unreadable garbage, and the port numbers offer no clues. If we ask the question “was there any other traffic involving 1.2.3.4 and 4.3.2.1 at the same time”, we might get a hint. In this instance, there was a connection from 1.2.3.4:16060 to 4.3.2.1:21, which looks like an FTP command session judging by the port number. When we examine the full-content capture for this session, we can see passive FTP at work:

227 Entering Passive Mode (4,3,2,1,13,49).

The last two numbers denote the port to which the client should connect in order to transfer a file. Doing the maths, we see that (13 * 256) + 49 = 3377, so we can be pretty confident that our high-entropy traffic in this case was a file transferred over FTP.
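The port arithmetic can be checked with a few lines of Python. This is a minimal sketch; the function name `pasv_endpoint` is made up for illustration, and it assumes the reply carries the usual six comma-separated numbers in parentheses.

```python
import re

def pasv_endpoint(reply):
    """Extract (ip, port) from an FTP '227 Entering Passive Mode' reply."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no host/port tuple found in reply")
    h1, h2, h3, h4, p_hi, p_lo = (int(g) for g in match.groups())
    # The port is sent as two bytes: high byte first, then low byte.
    return f"{h1}.{h2}.{h3}.{h4}", p_hi * 256 + p_lo

print(pasv_endpoint("227 Entering Passive Mode (4,3,2,1,13,49)."))
# ('4.3.2.1', 3377)
```

The result can then be compared against the destination address and port of the high-entropy session that net-entropy flagged.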

If context can’t be established however, all is not lost – we can look at other attributes of the traffic.

A lot of the high-entropy traffic that we see is bound for random ports on IP addresses all over the world, and most of it is related to peer-to-peer apps. In the case of Skype, high-entropy TCP traffic is usually signalling traffic to SuperNodes. Traffic to a given SuperNode will occur little-and-often for a long period of time until one of the two endpoints goes offline, so net-entropy will be sending you alerts all day long for that specific destination. However, you certainly can’t say for sure that traffic matching this profile is definitely Skype (it could be a keylogger phoning home at regular intervals, for example). As such, examination of the little-and-often class of high-entropy flow doesn’t usually yield any definitive conclusion.

What is definitely interesting is where you have many high-entropy flows to the same destination address and port in a short period of time. We have detected the following by taking this “clustering” approach:

  • HTTP on a non-standard port, serving up images (most image formats are compressed, and thereby have a high-entropy signature). As an example, some of the images in search results here come from port 7777. Someone browsing this site will trigger many indicators from net-entropy in a short space of time that refer to this port.
  • HTTP proxies. Again, the high-entropy traffic is most commonly as a result of images being transferred.
  • SSL/TLSv1 on port 8083, which turned out to be traffic to a SecureSphere web application firewall.
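The clustering idea can also be expressed without a picture. The sketch below is not how the MARS reports were produced; it assumes the alerts have already been reduced to (timestamp in seconds, destination IP, destination port) tuples, and the window and minimum flow count are arbitrary illustrative values.

```python
from collections import defaultdict

def find_clusters(alerts, window=300, min_flows=5):
    """Return {(dst_ip, dst_port): peak flow count} for busy destinations.

    alerts: iterable of (timestamp_seconds, dst_ip, dst_port).
    A destination qualifies when at least min_flows alerts fall inside
    some span of `window` seconds.
    """
    by_dest = defaultdict(list)
    for ts, ip, port in alerts:
        by_dest[(ip, port)].append(ts)

    clusters = {}
    for dest, times in by_dest.items():
        times.sort()
        start = 0
        best = 0
        # Sliding window over the sorted timestamps.
        for end in range(len(times)):
            while times[end] - times[start] > window:
                start += 1
            best = max(best, end - start + 1)
        if best >= min_flows:
            clusters[dest] = best
    return clusters

# Six alerts to one port in a minute, plus two widely spaced alerts elsewhere.
sample = [(i * 10, "192.0.2.10", 8083) for i in range(6)]
sample += [(0, "192.0.2.20", 4444), (5000, "192.0.2.20", 4444)]
print(find_clusters(sample))
# {('192.0.2.10', 8083): 6}
```

The little-and-often flows discussed earlier deliberately fall below the threshold here, since they rarely lead to a definitive conclusion.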

Clusters like this are most easily detected by eye by means of visualisation. The following image is from one of my Cisco MARS reports, and shows a cluster of traffic to port 8083 in orange:

Something “worth looking at” will usually be quite clear from an image like this.

The approach I’ve taken with net-entropy has yielded neither precise nor definitive results (which certainly isn’t a complaint about net-entropy itself – it was never designed to be used the way I’m using it). But, I’ve discovered things that I’d never have known about without it, so I reckon I’ve come out on top!



net-entropy Sguil agent and wiki

Posted in net-entropy, NSM, Sguil on 6 October, 2009 by Alec Waters

The story so far:

I’ve written a basic Sguil agent that will upload net-entropy’s RISING ALARM messages into Sguil. You can download the agent here, and the config file here.

On a Sguil sensor that has net-entropy installed, copy the agent to wherever your other agents live (/usr/local/bin on my system), and the config file to where your other config files live (/etc/nsm/sensor1/ on my system). Then fire it up:

net-entropy_agent.tcl \
   -c /etc/nsm/sensor1/net-entropy_agent.conf

With a bit of luck, you'll see the agent register in the Sguil client:

net-entropy sguil

And we'll start to see net-entropy messages appear, too:

net-entropy sguil events

The bottom right pane of the Sguil client will behave as it does for the PADS agent, and will show you the event detail:

net-entropy sguil detail

Sguil will correlate these events in the usual fashion, and allow you to right-click and say "Transcript" or "Wireshark". It all seems to work pretty well!

Finally, the net-entropy project has a new wiki - it's here. This is the place to go for the latest source code, which now includes a Paninski entropy estimator in addition to the original Shannon estimator. Have fun!



Detecting encrypted traffic with net-entropy, part two

Posted in Crazy Plans, net-entropy, NSM on 24 September, 2009 by Alec Waters

Back here I described my setup of a modified version of net-entropy, which I was going to use in my quest to detect encrypted traffic. Well, it’s been running for a week or so now, and I’ve got some results.

  • Did it detect encrypted traffic? Yes!
  • Did it detect stuff that wasn’t encrypted? Yes!
  • Did it fail to detect traffic that definitely was encrypted? Yes!
  • Was the experiment a total failure and a complete waste of time? No! Far from it, in fact.

The theory I was testing was that traffic with sufficient entropy might be encrypted, since high entropy is a property of decent encryption algorithms. net-entropy was set to trigger an alert if it saw any connections whose entropy crossed an arbitrarily chosen threshold of 7.9 (8.0 is the maximum), and protocols that were expected to be encrypted (HTTPS, etc.) were filtered out.
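To make the 7.9 and 8.0 figures concrete: the entropy of a byte stream, measured in bits per byte, tops out at 8.0 when all 256 byte values are equally likely. The sketch below is a plain Shannon estimate in Python. It is not net-entropy's own code, and the function name is illustrative.

```python
import math
import os
from collections import Counter

def shannon_entropy(data):
    """Shannon entropy of a bytes object, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)
    # Sum p * log2(1/p) over every byte value that actually occurs.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

print(shannon_entropy(b"A" * 4096))             # 0.0 (a single repeated symbol)
print(shannon_entropy(bytes(range(256)) * 16))  # 8.0 (perfectly uniform)
print(shannon_entropy(os.urandom(65536)) > 7.9) # random bytes clear the threshold
```

Compressed data scores almost as high as random or encrypted data on this measure, which is why a single threshold cannot tell the two apart.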

Here’s a summary of what I’ve found so far:

  • Encrypted traffic that crossed the 7.9 threshold included Windows Remote Desktop (RDP), Skype (both calls and signalling traffic), SSH-2, and Google Talk.
  • Unencrypted traffic that crossed the threshold was mainly RTMP (streaming Flash audio/video), and possibly Spotify (I don’t know for sure if Spotify uses encryption or not, but high entropy was observed both on the inbound media from port 4070 and the outbound media on random ports). Media protocols like this are usually highly compressed – high entropy is a side effect of compression as well as encryption.
  • Encrypted traffic that was not detected included SSH-1 (1.5, to be exact). SSH-2 was detected as one would hope, provided that the session was long enough.

Clearly my blunt approach of a single threshold isn’t the most effective one, as we have both false positives and false negatives. But after applying some visualisations to my results, an intriguing possibility presents itself.

net-entropy was installed in this instance on a Sguil box mainly so that it was in a position where it could see a large volume of real-world traffic. A happy side effect of this is that it’s quite simple to extract the raw traffic captures that each net-entropy alert is referring to. If we’ve built net-entropy with the --enable-statistics option, we are then in a position to draw graphs of the entropy of an individual TCP stream:

  • First, use the net-entropy alert to extract the specific TCP stream. The easiest way to do this is to search for it using the Sguil client, and then export the results to Wireshark. Let’s save the capture as session.raw
  • Then we run net-entropy over it in statistics mode:
    $ net-entropy -r session.raw -s mystatsdir -F \
        -b -c net-entropy.conf
  • The output of this is a .dat file whose name is made up of a timestamp and the source and dest IP addresses and ports.
  • We can now plot this file in gnuplot:
    plot 'mystatsdir/whateveritwascalled.dat'

By way of a baseline, here is a plot showing the entropy of the first 64KB of an HTTPS and an SSH-2 session. The blue line marks the 7.9 alerting threshold:

net-entropy baseline

Zooming in a little, we can see that HTTPS crossed the threshold after about 2.2KB of data, and SSH-2 took a little longer:

net-entropy zoom

Let's zoom in a little on a different area of the graph - the little "wobble" on the SSH-2 line:

net-entropy ssh-2

What we're looking at here is the part of the conversation where the various parameters of the SSH-2 session are being exchanged (key exchange protocol, encryption/hashing algorithms, etc). These are passed as cleartext, hence the low entropy at this point.

It's an interesting little pattern, though. Let's overlay some more SSH sessions onto the one above and see what they look like:

net-entropy ssh

There are three sessions illustrated here:

  • The blue line is an SSH-2 session, which in the context of this experiment is a "true positive" since it was encrypted and it did cross the 7.9 threshold
  • The red line is another SSH-2 session which was so short in duration it didn't manage to make it above 7.9. This is a "false negative" because we've missed something that definitely was encrypted.
  • The green line is an SSH-1 session. At no point during this session's life did it cross the 7.9 threshold - another false negative.

As far as detecting encrypted traffic goes, this clearly isn't as useful as I'd have hoped. But look at the red and blue lines - look how tightly they follow one another:

net-entropy ssh zoom

This brings us to the intriguing possibility I alluded to earlier - using entropy analysis not for the detection of encrypted traffic, but for the classification of traffic.

What if the entropy of certain types of traffic is reasonably consistent? What if the patterns above represent "fingerprints" for SSH-1 and SSH-2 traffic? If we could match traffic against a library of fingerprints, we'd have a port-independent classifier of sorts.
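One way to experiment with the fingerprint idea is to reduce each session to a short curve of cumulative entropy at fixed byte offsets, then measure how far apart two curves are. The sketch below is purely illustrative: the checkpoint offsets and the distance measure are arbitrary choices, and whether such curves are stable enough to serve as fingerprints is exactly the open question.

```python
import math
from collections import Counter

def entropy_curve(payload, checkpoints=(64, 128, 256, 512, 1024, 2048, 4096)):
    """Cumulative entropy (bits per byte) of the first n bytes, per checkpoint.

    checkpoints must be in ascending order; those beyond the payload
    length are skipped, so short sessions give short curves.
    """
    counts = Counter()
    curve = []
    pos = 0
    for n in checkpoints:
        if n > len(payload):
            break
        counts.update(payload[pos:n])  # only count the newly covered bytes
        pos = n
        curve.append(sum((c / n) * math.log2(n / c) for c in counts.values()))
    return curve

def curve_distance(a, b):
    """Mean absolute difference over the checkpoints both curves share."""
    shared = min(len(a), len(b))
    if shared == 0:
        raise ValueError("curves share no checkpoints")
    return sum(abs(x - y) for x, y in zip(a, b)) / shared

print(entropy_curve(bytes(range(256)) * 4, checkpoints=(256, 512)))  # [8.0, 8.0]
print(curve_distance([8.0, 8.0], [7.0, 7.5]))                        # 0.75
```

A session would be classified by finding the library curve with the smallest distance, ideally with a cut-off above which the answer is "unknown".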

I've not had time yet to analyse sample sets large enough to be anywhere approaching conclusive, but let's look at some other kinds of traffic:

The following graph shows four RTMP sessions:

net-entropy rtmp

Whilst RTMP isn't encrypted, all four sessions have a similar visual fingerprint.

Now let's look at nine RDP sessions (Windows Remote Desktop):

net-entropy rdp

The most obvious outlier here is the black line - this was an RDP session to a server whose encryption level was set to "Low". If we zoom in a bit, we'll see another outlier:

net-entropy rdp zoom

The orange line is significantly different to the others. This particular session sent the string "Cookie: mstshash=machinename" in the first data segment sent from the client to the server - the other sessions had mostly zeroes instead, hence the lower entropy at this point. Since this is the very first data segment in the session, we could possibly infer that we're looking at different client software used to make the connection. Indeed, if we look at RDP sessions from rdesktop (rather than the Windows client), the entropy looks different still:

net-entropy rdp rdesktop

The entropy is low, relative to the Windows client, and there's a slightly different signature at the start of the connection:

net-entropy rdp rdesktop zoom

One might be tempted to think that one could look at graphs like these and infer something about both the server (encryption level in use) and the client (type of software used).

OK. Enough graphs. Summary time.

Detecting encrypted traffic with a straightforward entropy threshold doesn't seem to be useful. However, we may be able to use entropy analysis as a means to classify traffic in a port-independent manner, but I've analysed nowhere near enough traffic to assess whether this could be a viable technique or not (there are bound to be loads of outlying cases that don't fit the profile). And even if it is a viable technique, are the bases already covered by other port-independent classifiers (l7filter, et al)? That said, I'm not the first person to explore various visualisations of encrypted traffic, so someone somewhere considers the broad concept useful.

Comments welcome!



Detecting encrypted traffic with net-entropy, part one

Posted in Crazy Plans, net-entropy, NSM on 17 September, 2009 by Alec Waters

I’ve been pondering the possibility of detecting encrypted traffic crossing a network, and I think I’m getting somewhere (not necessarily closer to the goal, but somewhere nonetheless!). My initial thoughts were to put some kind of frequency analysis to the task, and whilst I was researching this I came across net-entropy.

net-entropy is a clever tool that can learn the expected cumulative packet entropy (“randomness”) for a given protocol, and raise alerts if an observed connection falls out of bounds (there’s a very detailed writeup here). The theory is that if someone is attacking a flaw in some cryptographic software (an SSH server, for example), the observed entropy of the connection will decrease unexpectedly once the attack has been executed and the attacker is delivering shellcode or whatever (figures two and three here illustrate the principle).

net-entropy was designed to focus on its list of pre-learned protocols, each of which is described in a protospec file. Here is the file for SSH:

Port: 22
Direction: both
Cumulative: yes
RangeUnit: bytes
# Range: start    end      min_ent        max_ent
Range:   1        63       0              4.38105154
Range:   64       127      4.22877741     4.64838314
Range:   128      255      4.95194340     5.02499151
Range:   256      511      4.86894369     7.28671360
Range:   512      1023     4.86310673     7.59574795
Range:   1024     1535     4.94409609     7.74570751
Range:   1536     2047     5.77497149     7.81915951
Range:   2048     3071     6.44314718     7.85139179
Range:   3072     4095     7.17234325     7.92034960
Range:   4096     8191     7.46498394     7.96606302
Range:   8192     65536    7.82608652     7.99687433

Each range is defined in terms of start byte and end byte, and minimum and maximum entropy. For example, for the first 63 bytes, the entropy is expected to be between 0 and 4.38105154 - an alert is raised if the entropy at this point is either too high or too low.
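The range check described above can be sketched in Python. This is not net-entropy's implementation: the function names are invented, only the `Range:` lines of the file are read, and the sample text is just the first two ranges from the SSH file.

```python
def parse_ranges(protospec_text):
    """Return [(start, end, min_ent, max_ent), ...] from 'Range:' lines."""
    ranges = []
    for line in protospec_text.splitlines():
        line = line.strip()
        # Comment lines start with '#', so they never match here.
        if line.startswith("Range:"):
            start, end, lo, hi = line[len("Range:"):].split()
            ranges.append((int(start), int(end), float(lo), float(hi)))
    return ranges

def check(ranges, offset, entropy):
    """Classify the cumulative entropy observed at a given byte offset."""
    for start, end, lo, hi in ranges:
        if start <= offset <= end:
            if entropy < lo:
                return "too low"
            if entropy > hi:
                return "too high"
            return "ok"
    return "untracked"  # offset lies outside every defined range

SSH_SPEC = """\
# Range: start    end      min_ent        max_ent
Range:   1        63       0              4.38105154
Range:   64       127      4.22877741     4.64838314
"""

ranges = parse_ranges(SSH_SPEC)
print(check(ranges, 40, 5.0))    # too high
print(check(ranges, 100, 4.5))   # ok
print(check(ranges, 100, 3.0))   # too low
```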

We could have a go at detecting encrypted traffic (rather than profiling its properties) with a very simple protospec file. What I'm interested in seeing is anything with an observed entropy that's greater than some defined threshold - this will be my indicator that what we're looking at could possibly be encrypted. So, we could have a protospec file that looks like this:

Port: "whatever"
Direction: both
Cumulative: yes
RangeUnit: bytes
# Range: start    end      min_ent        max_ent
Range:   1        65536    0              7.9

This file will cause net-entropy to raise an alert if the entropy for a connection on port "whatever" exceeds my arbitrarily chosen threshold of 7.9 in the first 64KB of its life; the problem here is that I'd have to write thousands of these files to cover the complete set of all TCP ports. I spoke to net-entropy's author, Julien Olivain, about this and he very kindly implemented an "all" feature for me, whereby a single protospec file can be applied to the complete range of TCP ports (updated source code is available here).

Now we can start to experiment! net-entropy will accept the usual variety of capture filter, so we can use this to exclude:

  • The protocols that we expect to be encrypted (SSH, HTTPS, etc.)
  • High volume protocols that are scrutinised by other means (SMTP, HTTP, etc.)
  • Non-TCP protocols (net-entropy only works for TCP at the moment)

So, our net-entropy.conf file looks like this:

Interface: eth1
RuntimeUser: nobody
MemoryLimit: 131072
MaxTrackSize: 65536
PcapFilter: tcp and not port 80 and not port 25 and not port 22 and
            not port 443 and not port 993 and not port 995
ProtoSpec: /usr/local/share/net-entropy/protospec/proto-tcp-all.nes

I installed the software on a Sguil box and fired it up; pretty soon, things like this were popping up in /var/log/messages:

Sep 17 11:15:03 morpheus net-entropy[2689]: RISING ALARM on 212.7x.aaa.bbb:1708 -> 82.4x.aaa.bbb:60970 offset=2406 packets=7 entropy=7.90993547 range=0 (1 65536 0.000000 7.900000)

Woohoo! Data! Now all we have to do is work out if it's useful or not. I'm not one for leaving logs lying idly on the server that generated them so I send the messages to a remote syslog collector, in this case a Cisco CS-MARS. The MARS certainly has its flaws and niggles, but it does let you write custom parsers for devices it doesn't know about. Once the MARS has been educated in the ways of net-entropy, you can use its querying mechanism to start exploring the data.
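For anyone without a MARS, the fields a custom parser needs to pull out of each message can be sketched with a Python regular expression. The pattern is inferred from the single sample line above, so treat the field layout as an assumption; only RISING alarms appear in this post, and the address groups are deliberately loose because the sample addresses are obfuscated.

```python
import re

ALARM_RE = re.compile(
    r"net-entropy\[\d+\]: (?P<kind>\w+) ALARM on "
    r"(?P<src>\S+):(?P<sport>\d+) -> (?P<dst>\S+):(?P<dport>\d+) "
    r"offset=(?P<offset>\d+) packets=(?P<packets>\d+) "
    r"entropy=(?P<entropy>[\d.]+)"
)

def parse_alarm(line):
    """Return a dict of alarm fields, or None if the line is not an alarm."""
    match = ALARM_RE.search(line)
    if match is None:
        return None
    fields = match.groupdict()
    for key in ("sport", "dport", "offset", "packets"):
        fields[key] = int(fields[key])
    fields["entropy"] = float(fields["entropy"])
    return fields

line = ("Sep 17 11:15:03 morpheus net-entropy[2689]: RISING ALARM on "
        "212.7x.aaa.bbb:1708 -> 82.4x.aaa.bbb:60970 offset=2406 packets=7 "
        "entropy=7.90993547 range=0 (1 65536 0.000000 7.900000)")
alarm = parse_alarm(line)
print(alarm["kind"], alarm["dst"], alarm["dport"], alarm["entropy"])
# RISING 82.4x.aaa.bbb 60970 7.90993547
```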

I've written the required custom parsers, and exported them as a Device Support Package that you can import into your own MARS, if you happen to have one and want to play along (download it here). The net result is that I can do stuff like:

  • Ask about the kinds of messages from net-entropy:

    Event Types

  • See the details of sessions seen:

    Sessions

  • Drill down onto a single session:
    A single session

    Note that the MARS has noticed that there are two events talking about the same session (based on the IP addresses and ports), and has been able to correlate them together into a single session.

  • Get the raw messages as raised by net-entropy:

    Raw messages from net-entropy

So, here's where we're at:

  • net-entropy has been enhanced to support a protospec file that applies to all ports
  • This allows us to do "generic" entropy detection
  • Events from net-entropy are being exported to my MARS, from which I can run queries and reports

Next steps:

  • Work out if the things that net-entropy is alarming on are actually encrypted, or if the reason for their high entropy is something else (effective compression, for example). If I'm not reliably detecting encryption, then I can either tweak my entropy threshold, or give up the whole idea ;)
  • If the technique is really yielding useful results, perhaps write an agent for Sguil so that net-entropy's alerts appear in the client for easy drill-down onto the session transcripts (there's an agent available for modsec, so it could be feasible to try this)
  • In the far future, how about a mod to the Sguil client that lets you right-click and say "Graph session entropy"? This would extract the relevant session from the full-content capture (just like the Wireshark option does at present), run it through net-entropy in statistics mode, and use gnuplot to visualise the result.

This post is most definitely filed under "Crazy Plans". Comments on my insanity are welcome.



Detecting encrypted traffic with frequency analysis – Update

Posted in Crazy Plans, net-entropy, NSM, Sguil on 2 September, 2009 by Alec Waters

I recently wrote about a plan for detecting encrypted traffic, where I mentioned in the comments that I’d come across a package called net-entropy (very detailed writeup here). I’ve been in touch with Julien Olivain, one of the authors, and he’s kindly given me the sources to experiment with.

And experiment I shall – I’ll post my findings when I’ve got some!



Detecting encrypted traffic with frequency analysis

Posted in Crazy Plans, net-entropy, NSM, Sguil on 12 August, 2009 by Alec Waters

Let’s start with a little disclaimer:

I am not a cryptanalyst. I am not a mathematician. It is quite possible that I am a complete idiot. You decide.

With that out of the way, let’s begin.

NSM advocates the capture of, amongst other things, full-content data. It is often said that there’s no point in performing full-content capture of encrypted data that you can’t decrypt – why take up disk space with stuff you’ll never be able to read? It’s quite a valid point – one of the networks I look after carries quite a bit of IPSec traffic (tens of gigabytes per day), and I exclude it from my full content capture. I consider it enough, in this instance, to have accurate session information from SANCP or Netflow which is far more economical on disk space.

That said, you can still learn quite a bit from inspecting full-content captures of encrypted data – there is often useful information in the session setup phase that you can read in clear text (e.g., a list of ciphers supported, or SSH version strings, or site certificates, etc.). It still won’t be feasible to decrypt the traffic, but at least you’ll have some clues about its nature.

A while ago, Richard wrote a post called “Is it NSM if…” where he says:

While we’re talking about full content, I suppose I should briefly address the issue of encryption. Yes, encryption is a problem. Shoot, even binary protocols, obscure protocols, and the like make understanding full content difficult and maybe impossible. Yes, intruders use encryption, and those that don’t are fools. The point is that even if you find an encrypted channel when inspecting full content, the fact that it is encrypted has value.

That sounds reasonable to me. If you see some encrypted stuff and you can’t account for it as legitimate (run of the mill HTTPS, expected SSH sessions, etc.) then what you’re looking at is a definite Indicator, worthy of investigation.

So, let’s just ask our capture-wotsits for all the encrypted traffic they’ve got, then, shall we? Hmm. I’m not sure of a good way to do that (if you do, you can stop reading now and please let me know what it is!).

But…

…I’ve got an idea.

Frequency analysis is a useful way to detect the presence of a substitution cipher. You take your ciphertext and draw a nice histogram showing the frequency of all the characters you encounter. Then you can make some assumptions (like the most frequent character was actually an ‘e’ in the plaintext) and proceed from there.

However, the encryption protocols you’re likely to encounter on a network aren’t going to be susceptible to this kind of codebreaking. The ciphertext produced by a decent algorithm will be jolly random in nature, and a frequency analysis will show you a “flat” histogram.

So why am I talking about frequency analysis? Because this post is about detecting encrypted traffic, not decrypting it.

Over at Security Ripcord, there’s a really nifty tool for drawing file histograms. Take a look at the example images – the profile of the histograms is pretty “rough” in nature until you get down to the Truecrypt example – it’s dead flat, because a decent encryption algorithm has produced lots and lots of nice randomness (great terminology, huh? Like I said, I’m not a cryptanalyst or a mathematician!)

So, here’s the Crazy Plan for detecting encrypted traffic:

  1. Sample X contiguous bytes of a given session (maybe twice, once for src->dst and once for dst->src). A few kilobytes ought to be enough to get an idea of the level of randomness we’re looking at.
  2. Make your X-byte block start a little way into the session, so that we don’t include any plaintext in the session startup.
  3. Strip off the frame/packet headers (ethernet, IP, TCP, UDP, ESP, whatever) so that you’re only looking at the packet payload.
  4. Perform your frequency analysis of your chunk of payload, and “measure the resultant flatness”.
  5. Your “measure of flatness” equates to the “potential likelihood that this is encrypted”.

Perhaps one could assess the measure of flatness by calculating the standard deviation of the character frequencies? Taking the Truecrypt example, this is going to be pretty close to zero; the TIFF example is going to yield a much higher standard deviation.
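The standard-deviation idea can be tried out in a few lines of Python. This is a minimal sketch, assuming the payload has already been extracted as a bytes object; the function name `flatness` is made up, and the frequencies are expressed as fractions of the sample, which is one choice among several.

```python
import statistics
from collections import Counter

def flatness(sample):
    """Standard deviation of the 256 byte-value frequencies.

    Values near zero mean a flat histogram (possibly encrypted or
    compressed); larger values mean a rougher one.
    """
    if not sample:
        raise ValueError("empty sample")
    counts = Counter(sample)
    # Include byte values that never occur, with a frequency of zero.
    freqs = [counts.get(value, 0) / len(sample) for value in range(256)]
    return statistics.pstdev(freqs)

print(flatness(bytes(range(256)) * 8))      # 0.0: perfectly flat histogram
print(flatness(b"plain old ASCII text " * 200) > 0.01)  # True: a rough histogram
```

Small samples of genuinely random data will not score exactly zero, so any threshold would need to take the sample size into account.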

Assuming what I’ve babbled on about here is valid, wouldn’t it be great to get this into Sguil? If SANCP or a Snort pre-processor could perform this kind of sampling, you’d be able to execute some SQL like this:

select [columns] from sancp where src_randomness < 1 or dst_randomness < 1

…and you’d have a list of possibly encrypted sessions.

How’s that sound?

This post has been updated here.



