A Tale of Two Routers
Take a look at the diagram below, showing two (Cisco) routers. HugeCorpCoreRouter is a mighty behemoth with a six-figure price tag. It has redundant route processors, handles many gigabits per second of business-critical traffic, has all sorts of esoteric connections, and requires a squad of elite ninja black-ops CCIEs to keep it all running.
TinySOHORouter, by comparison, is a trivial speck on the corporate network diagram. It has a single ADSL connection and performs the usual SOHO tasks of NAT, firewall, DSL dialup, etc. Both routers export Netflow data to a central collector.
As you ponder my da Vinci-like Visio skills, consider the following question. Which router will pose the greater Netflow analysis challenge to the security team?
You’ve probably guessed it by now – the troublesome router is TinySOHORouter. HugeCorpCoreRouter, whilst powerful and complex, has a relatively easy job when it comes to Netflow. TinySOHORouter however has three sticking points that could prove to be troublesome for a Netflow analyst. None of the following features are typically running on your average big beefy HugeCorpCoreRouter:
- The firewall process (or any kind of filtering ACL). HugeCorpCoreRouter is concerned with forwarding datagrams as fast as possible through the core; firewall operations do not live here.
- The NAT process
- The dialer interface associated with the ADSL connection
Let’s look at each of these in turn.
The firewall process
Netflow is, by default, an ingress-based technology, which means that the router’s flow cache is updated when datagrams are received by an interface. However, a datagram doesn’t have to enter and leave the router to leave an impression in the flow cache. This manifests itself in an interesting way when a firewall is sticking its oar in.
The Netflow v5 flow record format has fields that describe the SNMP interface indexes of the input and output interfaces for any given flow. This is useful, because it means that your Netflow analysis tools can tell you that when 10.11.12.13 spoke to the webserver on 192.168.0.1, the traffic from 10.11.12.13 entered the router on FastEthernet4/23 and left it on GigabitEthernet0/2. This also makes it possible to draw pretty per-interface graphs of Netflow traffic. (BTW, you’ll want to use the “snmp-server ifindex persist” command otherwise the SNMP interface indexes could change when the router reloads, which can really confuse analysis!)
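To make the input and output interface fields a bit more concrete, here is a minimal Python sketch that unpacks a single 48-byte Netflow v5 flow record. The function name, the dictionary keys and the ifIndex values 23 and 2 are my own illustrative choices, and the sketch ignores the v5 packet header that precedes the records in a real export.

```python
import socket
import struct

# One Netflow v5 flow record: 48 bytes, network byte order.
# 'x' marks the pad bytes in the record layout.
V5_RECORD = struct.Struct("!4s4s4sHHIIIIHHxBBBHHBBxx")

def parse_v5_record(data):
    """Unpack one v5 flow record into a dict (key names are illustrative)."""
    (src, dst, nexthop, input_if, output_if, packets, octets,
     first, last, src_port, dst_port, tcp_flags, proto, tos,
     src_as, dst_as, src_mask, dst_mask) = V5_RECORD.unpack(data)
    return {
        "src_ip": socket.inet_ntoa(src),
        "dst_ip": socket.inet_ntoa(dst),
        "input_if": input_if,    # SNMP ifIndex of the receiving interface
        "output_if": output_if,  # SNMP ifIndex of the sending interface
        "packets": packets,
        "octets": octets,
        "src_port": src_port,
        "dst_port": dst_port,
        "proto": proto,
    }

# Build a hand-made record so the sketch can be run without a router.
example = V5_RECORD.pack(
    socket.inet_aton("10.11.12.13"), socket.inet_aton("192.168.0.1"),
    socket.inet_aton("0.0.0.0"),
    23, 2,          # hypothetical ifIndex values for the two interfaces
    1, 40, 0, 0,    # packets, octets, first, last
    51234, 80,      # source and destination ports
    0x02, 6, 0,     # TCP flags (SYN), protocol (TCP), ToS
    0, 0, 0, 0,     # AS numbers and prefix masks
)
flow = parse_v5_record(example)
print(flow["src_ip"], "in on ifIndex", flow["input_if"],
      "out on ifIndex", flow["output_if"])
```

Mapping those ifIndex numbers back to names like FastEthernet4/23 is a separate SNMP lookup, which is exactly why persistent indexes matter.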
But what if there were an ACL in place that drops all traffic to port 80 on 192.168.0.1? Dropped datagrams are one of the byproducts of any kind of firewall or ACL – how does Netflow handle those?
Let’s say a datagram from 10.11.12.13 is received, destined for 192.168.0.1:80. As this destination is denied by an ACL, the router duly drops it. Netflow, being an ingress technology, will still put an entry into the flow cache to describe the flow, despite the fact that the datagram was dropped by an ACL (even if the ACL is applied in the inbound direction on the receiving interface). There is no output interface for the flow in this case, so what does the router put into the flow record to denote this?
Flows that are either a) dropped by the router or b) destined for the router itself (SSH sessions, for example) will have zero in the output interface field, to show that the flow entered the router but did not leave.
So why is this a problem for the analyst?
Let’s say I run a report that shows all destination ports for destination IP address 192.168.0.1 (in a naive attempt to find out “what services have people been using on my server?”). Much to my surprise, port 80 features prominently. Why’s it in the report? Isn’t it blocked by an ACL? Have we been hacked? Has the APT Bogeyman paid us a visit?
Fortunately, we’re safe. Port 80 features because 10.11.12.13 tried to talk to it, causing a flow to be logged despite the fact that the ACL dropped the traffic. If we were to re-run the report asking for the number of bytes transferred between 10.11.12.13 and 192.168.0.1:80, we’d see something like 40 bytes in the client->server direction (the size of an IP datagram carrying a TCP SYN with no options; a SYN with TCP options is a little larger) and zero bytes in the server->client direction, which describes the ACL drop nicely.
Keep this in mind when designing reports based on Netflow data. Certain products like Netflow Analyser are able to take this behaviour into account to a certain degree (“Suppress Access Control List related drops”). Alternatively, you could use the Netflow v9 flow record format if your router and analysis tools support it. There is a useful field called “FORWARDING STATUS” which tells you if a flow was forwarded, dropped or consumed, allowing the analyst to differentiate between traffic dropped by the router and traffic destined for the router. Very handy.
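As a rough illustration of how a report could take this into account, the Python sketch below builds the "destination ports for this server" report while skipping flows that never left the router. The flow records are plain dicts with key names I have made up for the example. The decoding of FORWARDING STATUS assumes the usual v9 layout, where the top two bits of the one-byte field carry the status (forwarded, dropped or consumed).

```python
from collections import Counter

# Top two bits of the one-byte v9 FORWARDING STATUS field.
FORWARDING_STATUS = {0: "unknown", 1: "forwarded", 2: "dropped", 3: "consumed"}

def forwarding_status(value):
    """Decode the status from a raw FORWARDING STATUS byte."""
    return FORWARDING_STATUS[(value >> 6) & 0x3]

def is_forwarded(flow):
    """Prefer the v9 field if present, else fall back to the v5 heuristic."""
    if "forwarding_status" in flow:
        return forwarding_status(flow["forwarding_status"]) == "forwarded"
    # v5: output interface zero means dropped or destined for the router.
    return flow["output_if"] != 0

def destination_ports(flows, dst_ip):
    """Count forwarded flows per destination port for one server."""
    return Counter(
        f["dst_port"] for f in flows
        if f["dst_ip"] == dst_ip and is_forwarded(f)
    )

flows = [
    # Dropped by the ACL: logged by Netflow, but output interface is zero.
    {"dst_ip": "192.168.0.1", "dst_port": 80, "output_if": 0},
    # Forwarded normally.
    {"dst_ip": "192.168.0.1", "dst_port": 443, "output_if": 2},
]
report = destination_ports(flows, "192.168.0.1")
print(report)
```

With the drop filtered out, port 80 no longer shows up in the report and nobody needs to go looking for the APT Bogeyman.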
The NAT process
Our second bugbear can also cause problems, especially if we want to ask questions like “show me all the traffic destined for the single PC behind TinySOHORouter” – the report in this case will be totally blank, even if the PC has been hitting Facebook all day long. But why?
Take the simple case of an HTTP flow between our single PC at 10.11.12.13 (a private IP address behind the router’s FastEthernet0 interface) and a public webserver on the Internet, reached via FastEthernet1. On its way out of the router, the private 10.11.12.13 gets NATted into the public IP address of FastEthernet1.
From Netflow’s point of view, it goes like this:
- A TCP segment from 10.11.12.13 destined for the webserver is received on Fa0. An entry in the Netflow cache accounts for this.
- The router decides that the traffic should be sent out via Fa1, and translates the source IP address from 10.11.12.13 to Fa1’s address before it sends it on its way.
- The TCP response is eventually received on Fa1 from the webserver, destined for Fa1’s address, which is 10.11.12.13’s “outside” address. An entry in the Netflow cache accounts for this.
- The NAT translation from Fa1’s address back to 10.11.12.13 takes place, and the TCP response is sent out of Fa0.
Therefore, all of the returning traffic will be shown as destined for Fa1’s address and never 10.11.12.13. This is because input accounting (including Netflow) occurs on the router before the NAT outside-to-inside translation takes place.
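The order of operations is the whole story here, so a toy Python model may help. It is only a sketch: the addresses 203.0.113.1 (standing in for Fa1) and 198.51.100.80 (standing in for the webserver) are hypothetical documentation addresses, and the NAT table is keyed on destination address alone, whereas a real router also tracks ports and protocols.

```python
# Hypothetical stand-in addresses, not taken from a real network.
INSIDE_PC = "10.11.12.13"
FA1_ADDRESS = "203.0.113.1"
WEBSERVER = "198.51.100.80"

def receive_on_fa1(packet, nat_outside_to_inside, flow_cache):
    """Model what happens to a packet arriving on the outside interface."""
    # Step 1: ingress Netflow accounting records the packet as it arrived.
    flow_cache.append({"src": packet["src"], "dst": packet["dst"]})
    # Step 2: only afterwards does NAT rewrite the destination address.
    translated = dict(packet)
    translated["dst"] = nat_outside_to_inside.get(packet["dst"], packet["dst"])
    return translated

flow_cache = []
reply = {"src": WEBSERVER, "dst": FA1_ADDRESS}
delivered = receive_on_fa1(reply, {FA1_ADDRESS: INSIDE_PC}, flow_cache)

print("Netflow saw destination:", flow_cache[0]["dst"])   # 203.0.113.1
print("Packet delivered to:", delivered["dst"])           # 10.11.12.13
```

The flow cache only ever contains the outside address, which is why a report filtered on 10.11.12.13 as a destination comes back empty.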
There are three ways to either get around or assist with this problem:
- If your router and Netflow collector support it, disable ingress Netflow accounting on Fa1 and enable both ingress and egress Netflow accounting on Fa0 (the inside interface). This means that all flows will be accounted for on the “inside” of the NAT process. Take care, though – by doing this we are causing Netflow to “ignore” all traffic that does not cross Fa0. This may or may not be a problem, depending on your topology and requirements. Also, think very carefully about this approach if your router has many layer 3 interfaces. If ingress and egress Netflow were to be enabled on both Fa0 and Fa1, there’s a chance your Netflow collector could see duplicated flows.
- If your router and Netflow collector support it, you can use the “ip nat log translations flow-export” command. This will log all NAT translations in a flow template that looks like this:
templateId=259: id=259, fields=11
  field id=8 (ipv4 source address), offset=0, len=4
  field id=225 (natInsideGlobalAddress), offset=4, len=4
  field id=12 (ipv4 destination address), offset=8, len=4
  field id=226 (natOutsideGlobalAddress), offset=12, len=4
  field id=7 (transport source-port), offset=16, len=2
  field id=227 (postNAPTSourceTransportPort), offset=18, len=2
  field id=11 (transport destination-port), offset=20, len=2
  field id=228 (postNAPTDestinationTransportPort), offset=22, len=2
  field id=234 (ingressVRFID), offset=24, len=4
  field id=4 (ip protocol), offset=28, len=1
  field id=230 (natEvent), offset=29, len=1
This will give you a log of all NAT translations that you can use to find out the actual inside destination for return traffic that Netflow shows as destined for the router’s outside address. Your Netflow collector may even be smart enough to correlate this information onto other “standard” flow exports, which would be a very neat trick indeed.
- If your router supports it, you can use the “ip nat log translations syslog” command. This will dump all NAT translations to syslog like this:
Sep 14 12:31:39.740 BST: %IPNAT-6-CREATED: tcp 192.168.0.88:4021 18.104.22.168:4021 22.214.171.124:443 126.96.36.199:443
Sep 14 12:32:53.733 BST: %IPNAT-6-DELETED: tcp 192.168.0.88:4021 188.8.131.52:4021 184.108.40.206:443 220.127.116.11:443
Take care, though. This approach can add significant load to your router, your syslog server, and your syslog analysis mechanisms, and correlating the NAT translations from syslog with the Netflow exports from your router becomes a manual task.
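If you do end up correlating syslog against flows yourself, a small script can take some of the pain away. The Python sketch below parses %IPNAT-6-CREATED lines and builds a lookup from the translated (outside-facing) address pair back to the inside host. It assumes the four address:port fields appear in the order inside local, inside global, outside local, outside global; check that against your own logs before trusting it. The sample line uses hypothetical documentation addresses, and timestamps are ignored, so translations that reuse a port over time would overwrite one another.

```python
import re

# Assumed field order: inside local, inside global, outside local, outside global.
NAT_LOG = re.compile(
    r"%IPNAT-6-(?P<event>CREATED|DELETED): (?P<proto>\S+) "
    r"(?P<inside_local>\S+) (?P<inside_global>\S+) "
    r"(?P<outside_local>\S+) (?P<outside_global>\S+)"
)

def parse_nat_log(line):
    """Return the fields of one NAT syslog line, or None if it is not one."""
    match = NAT_LOG.search(line)
    return match.groupdict() if match else None

def build_translation_map(lines):
    """Map (proto, inside global, outside global) to the inside local host."""
    table = {}
    for line in lines:
        entry = parse_nat_log(line)
        if entry is None or entry["event"] != "CREATED":
            continue
        key = (entry["proto"], entry["inside_global"], entry["outside_global"])
        table[key] = entry["inside_local"]
    return table

# Hypothetical log line in the same shape as the router's output.
sample = [
    "Sep 14 12:31:39.740 BST: %IPNAT-6-CREATED: tcp 192.168.0.88:4021 "
    "203.0.113.1:4021 198.51.100.80:443 198.51.100.80:443",
]
nat_map = build_translation_map(sample)

# A return flow seen by Netflow as webserver:443 -> 203.0.113.1:4021
# can now be attributed to the inside host.
print(nat_map[("tcp", "203.0.113.1:4021", "198.51.100.80:443")])
```

A real tool would also want to honour the DELETED events and the timestamps, so that a flow is only matched against translations that were live at the time.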
The ADSL link’s dialer interface
An ADSL connection on a Cisco router typically involves three interfaces:
- interface ATM 0/0/0: the physical ADSL interface
- interface dialer 0: the dialer interface created by the user in order to connect to the DSL provider
- interface virtual-access XX: a virtual interface created by the router, cloned from and bound to interface dialer0
Of these, only the dialer and virtual-access interfaces are layer 3 interfaces that can participate in Netflow, and of these the user only has direct control over the configuration of the dialer interface. So we just enable Netflow on TinySOHORouter’s dialer0 and inside ethernet interfaces and we’re done, right?
If you were to use your Netflow analysis tools to look at an interface graph for dialer0, all you would see is outbound traffic. You’d also notice that the virtual-access interface has popped up as well, showing only inbound traffic. No one interface has the complete picture.
This is, interestingly enough, the expected behaviour. Traffic from the ethernet network leaves the router via dialer0 because that’s what the default route says to do (“ip route 0.0.0.0 0.0.0.0 dialer0”). Therefore, when the ethernet interface receives a datagram destined for the Internet, Netflow will put the SNMP interface index of dialer0 into the flow cache. However, the router doesn’t actually use dialer0 to send or receive traffic; it uses the virtual-access interface cloned from it. This means that when datagrams are received from the Internet, they enter the router on virtual-accessXX instead of dialer0 or any of the other associated interfaces. This is why the dialer shows only outbound traffic and the virtual-access shows only inbound. All very logical and intuitive, I’m sure you’ll agree…
How to get around this? Either just “keep it in mind” when performing analysis, or hope that your Netflow analysis tools have some way to cater for it by plotting the outbound traffic on dialer0 and the inbound traffic on virtual-accessXX on the same graph.
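If your tools can’t do this for you, stitching the two halves together is not hard once you have per-interface byte counts. Here is a minimal Python sketch, assuming you can get samples out of your collector as (timestamp, interface, direction, bytes) tuples. That tuple shape and the interface name “Virtual-Access2” are invented for the example.

```python
from collections import defaultdict

def combined_wan_series(samples, dialer="Dialer0", virtual_access="Virtual-Access2"):
    """Merge dialer outbound and virtual-access inbound into one series.

    samples: iterable of (timestamp, interface, direction, byte_count).
    Returns {timestamp: {"in": bytes, "out": bytes}} for the ADSL link.
    """
    series = defaultdict(lambda: {"in": 0, "out": 0})
    for timestamp, interface, direction, byte_count in samples:
        if interface == dialer and direction == "out":
            series[timestamp]["out"] += byte_count
        elif interface == virtual_access and direction == "in":
            series[timestamp]["in"] += byte_count
    return dict(series)

samples = [
    (1000, "Dialer0", "out", 1500),
    (1000, "Virtual-Access2", "in", 42000),
    (1000, "FastEthernet0", "in", 1500),   # inside interface, ignored here
    (1060, "Dialer0", "out", 800),
]
wan = combined_wan_series(samples)
print(wan)
```

Bear in mind that the virtual-access interface number can differ between routers, so the name would need to be looked up per device rather than hard-coded.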
Those are all the Netflow analysis “gotchas” that spring to mind – can anyone think of any others?
Alec Waters is responsible for all things security at Dataline Software, and can be emailed at firstname.lastname@example.org