Making sense of the results of the net-entropy experiment

The rest of the net-entropy saga is listed here, but to briefly recap, net-entropy is a tool that can be used to profile the level of “randomness” of network traffic. High entropy traffic like this is usually either encrypted or compressed in nature, and I’m definitely interested in knowing about the presence of the former on my network. I’ve been using net-entropy as a general purpose “randomness detector”, something that the author didn’t really have in mind when he wrote it.

However, drawing meaningful conclusions from the results gathered can be tricky (this is nothing to do with net-entropy itself, and everything to do with the way I’m using it backwards to generically detect randomness). Some of the observed high entropy traffic will definitely fall into the “encrypted” category, like SSH and Remote Desktop. Other stuff will definitely be of the “compressed” flavour, like RTMP on port 1935.

Once this kind of low-hanging fruit has been pruned, the analyst is left with a whole load of unknown high-entropy traffic. If net-entropy’s presence is to be of any value at all when deployed this way, we have to try to make sense of the residual data in some way or another.

One tactic is to fire up Sguil and use its full-content capture to extract and examine each of the unknown high-entropy streams in turn. This is massively labour intensive, but some of the time you’ll find something interesting like HTTPS on a non-standard port (the certificate exchange at the start of the conversation is in clear text, giving you a clue). Most of the time however, you’re left looking at unintelligible garbage. Unsurprising really, given that it’s likely to be either compressed or encrypted…

Given that protocols like SSH, RDP and RTMP can most usually be identified by their port numbers alone and what’s left is unreadable junk, how are we to derive value from these other indicators from net-entropy? I can think of a couple of ways:

  • Session contextualisation
  • Session profiling on the basis of session duration and frequency, etc

Putting a high-entropy session into context isn’t too labour intensive, and sometimes pays dividends. Let’s say net-entropy has spotted a session from to; the full-content capture is unreadable garbage, and the port numbers offer no clues. If we ask the question “was there any other traffic involving and at the same time”, we might get a hint. In this instance, there was a connection from to, which looks like an FTP command session judging by the port number. When we examine the full-content capture for this session, we can see passive FTP at work:

227 Entering Passive Mode (4,3,2,1,13,49).

The last two numbers denote the port that the client should connect to to transfer a file. Doing the maths, we see that (13 * 256) + 49 = 3377, so we can be pretty confident that our high-entropy traffic in this case was a file transferred over FTP.

If context can’t be established however, all is not lost – we can look at other attributes of the traffic.

A lot of the high-entropy traffic that we see is bound for random ports on IP addresses all over the world, and most of it is related to peer-to-peer apps. In the case of Skype, high-entropy TCP traffic is usually signaling traffic to SuperNodes. Traffic to a given SuperNode  will occur little-and-often for a long period of time until one of the two endpoints goes offline, so net-entropy will be sending you alerts all day long for that specific destination. However, you certainly can’t say for sure that traffic matching this profile is definitely Skype (it could be a keylogger phoning home at regular intervals, for example). As such, examination of the little-and-often class of high-entropy flow doesn’t usually yield any definitive conclusion.

What is definitely interesting is where you have many high-entropy flows to the same destination address and port in a short period of time. We have detected the following by taking this “clustering” approach:

  • HTTP on a non-standard port, serving up images (most image formats are compressed, and thereby have a high-entropy signature). As an example, some of the images in search results here come from port 7777. Someone browsing this site will trigger many indicators from net-entropy in a short space of time that refer to this port.
  • HTTP proxies. Again, the high-entropy traffic is most commonly as a result of images being transferred.
  • SSL/TLSv1 on port 8083, which turned out to be traffic to a SecureSphere web application firewall.

Clusters like this are most easily detected by eye by means of visualisation. The following image is from one of my Cisco MARS reports, and shows a cluster of traffic to port 8083 in orange:

Something “worth looking at” will usually be quite clear from an image like this.

The approach I’ve taken with net-entropy has yielded neither precise nor definitive results (which certainly isn’t a complaint about net-entropy itself – it was never designed to be used the way I’m using it). But, I’ve discovered things that I’d never have known about without it, so I reckon I’ve come out on top!

Alec Waters is responsible for all things security at Dataline Software, and can be emailed at alec.waters(at)


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: