6.033 Discussion Suggestions

6.033--Computer System Engineering

Suggestions for classroom discussion

Topic: Network Address Translation.

Primary reading:

K. Egevang and P. Francis. The IP Network Address Translator (NAT). RFC 1631. (May 1994; http://www.faqs.org/rfcs/rfc1631.html)

Background readings:

T. Hain. Architectural Implications of NAT. RFC 2993. (November 2000; http://www.faqs.org/rfcs/rfc2993.html)

Things that NAT's break. (Internet flame, perhaps by Keith Moore; undated; http://www.cs.utk.edu/~moore/what-nats-break.html)

P. Srisuresh and M. Holdredge. IP Network Address Translator (NAT) Terminology and Considerations. RFC 2663. (August 1999; http://www.faqs.org/rfcs/rfc2663.html)

M. Holdredge and P. Srisuresh. Protocol Complications with the IP Network Address Translator (NAT). RFC 3027. (January 2001; http://www.faqs.org/rfcs/rfc3027.html)

B. Carpenter. Internet transparency. RFC 2775. (February 2000; http://www.faqs.org/rfcs/rfc2775.html)

V. Fuller et al. Classless Inter-Domain Routing (CIDR). RFC 1519. (September 1993; http://www.faqs.org/rfcs/rfc1519.html)

By J. H. Saltzer, March 17, 2000, citations updated March 21, 2001. Text revised March 18, 2004.

Why NAT?
(It is a work-around. The problem it is intended to solve is that the Internet is running out of IP addresses. In principle this is a temporary problem, until IPv6, which uses 128-bit addresses, is widely deployed. In practice, no one knows how long it will be till people figure out how to deploy IPv6. But we digress.)
But IP has 32-bit addresses, which would allow 4 billion network attachment points. There are only 400 million hosts attached to the Internet. Maybe this will be a problem in a few years--why is it a problem now?
(Two reasons. 1. IP requires that addresses be globally unique. Assigning addresses uniquely is hard to do when there are 400 million things needing an address. The technique that was chosen works well at maintaining uniqueness: hierarchically decentralize address assignment. But to decentralize means giving out blocks of addresses to local authorities, most of which end up with more addresses than they really use because they usually ask for the number they might potentially need rather than the number they actually need.)
So why not take back unused addresses?
(That brings us to reason 2. IP does hierarchical routing based on IP address, so we need to maintain geographical adjacency of adjacent addresses. Once having given a block of 255 addresses to someone in California who used only 5 of them, reassigning the 250 unused addresses to someone in Massachusetts would be hard. It could be done at the cost of an exception in the routing tables. But at the present scale, there would soon be hundreds, thousands, and eventually millions of exceptions, and routing would break down.)
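For discussion, here is a minimal Python sketch of what "an exception in the routing tables" means. It assumes longest-prefix matching, as in CIDR (RFC 1519). The table contents, the site names, and the helper next_hop are all hypothetical, and the addresses come from a documentation range rather than from any real assignment.

```python
import ipaddress

# Hypothetical routing table: one aggregate route for a block handed to a
# California site, plus an exception for a sub-block reassigned elsewhere.
routes = {
    ipaddress.ip_network("192.0.2.0/24"): "california",
    ipaddress.ip_network("192.0.2.128/25"): "massachusetts",  # the exception
}

def next_hop(table, address):
    """Return the next hop of the most specific (longest) matching prefix."""
    addr = ipaddress.ip_address(address)
    matches = [net for net in table if addr in net]
    if not matches:
        return None  # no route to this destination
    best = max(matches, key=lambda net: net.prefixlen)
    return table[best]

print(next_hop(routes, "192.0.2.5"))    # covered only by the aggregate
print(next_hop(routes, "192.0.2.200"))  # covered by the exception as well
```

Every reassigned sub-block costs one more entry like the second one, and every router that carries the aggregate has to carry the exception too.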
Name a really big user of NAT. (America On-Line)
How does it help them? (They are a community of 10M+ users. Probably only 2M are ever online at one time.)
But they use DHCP or something equivalent; when you dial up they assign you an IP address for the duration of the dialup. So they don't need one IP address per customer, they just need one per currently active customer. So what good does NAT do them?
(A large fraction of AOL network traffic begins and ends inside AOL. They have internal post offices, web servers, chat lines, web caches, etc. So if they have 2M dialed-in customers, probably only a small fraction (say 300,000 for discussion) actually are using the part of the Internet outside AOL. So rather than asking the IP assignment people for a block of 10M or even 2M addresses, they just ask for a block of 300K.)
I thought they assigned an IP address to me when I dialed up. (Yes, but it is a private IP address, meaningful only inside AOL. That is, the IP address they assign is a name that has a limited scope. This address wouldn't work on the public Internet.)
[At this point it is helpful to draw two network clouds on the blackboard. One is AOL, the other the public Internet. They are attached by a NAT box.]
How does the NAT box work? Suppose that the AOL customer who was assigned IP address 10.0.1.1 wants to send a web request to an MIT server at 18.26.0.99?
(The customer sends an IP packet with a source field of 10.0.1.1 and a destination field of 18.26.0.99.)
How does this packet get routed to the NAT gateway? (Inside the AOL network, the NAT box advertises that it knows how to get to all public Internet addresses. So packets addressed to 18.x.x.x get routed to the NAT box.)
What does the NAT box do?
(This is the easy part. It looks in an assignment table to see if 10.0.1.1 has a currently-assigned public IP address. If not, it assigns one from its block of 300,000 public IP addresses, say 14.15.16.191. It then rewrites the source address field in the packet's IP header to contain the public IP address that is assigned to 10.0.1.1. Then it sends the packet out onto the public part of the Internet, where it finds its way to the server.)
So the server thinks the packet came from 14.15.16.191. When the server sends a response to that address, how does that response get back to the NAT box? (Presumably the NAT box has been advertising that it knows how to route packets for the whole block of addresses assigned to AOL.)
What does the NAT box do with the response packet? (It looks in its assignment table to see if there is a private address assigned to this public address. If it finds one, it rewrites the IP destination field of the response packet to contain the private address, and sends the packet on to the private AOL network. The packet ends up at the originally requesting site, and all is well. Neither of the participants realizes that anything unusual happened--we hope.)
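The outbound and inbound steps just described can be sketched in a few lines of Python. This is a blackboard-level model under stated assumptions, not how any real NAT is built: packets are plain dictionaries, the class name BasicNat and its methods are invented for illustration, and the pool holds two addresses instead of 300,000.

```python
class BasicNat:
    """Toy one-to-one address translator (hypothetical, for discussion)."""

    def __init__(self, public_pool):
        self.free = list(public_pool)  # unassigned public addresses
        self.out = {}                  # private address -> public address
        self.back = {}                 # public address -> private address

    def outbound(self, packet):
        """Rewrite the source address of a packet leaving the private net."""
        src = packet["src"]
        if src not in self.out:
            public = self.free.pop(0)  # IndexError if the pool is exhausted
            self.out[src] = public
            self.back[public] = src
        return {**packet, "src": self.out[src]}

    def inbound(self, packet):
        """Rewrite the destination of a reply; None means no mapping, drop."""
        private = self.back.get(packet["dst"])
        if private is None:
            return None
        return {**packet, "dst": private}

nat = BasicNat(["14.15.16.191", "14.15.16.192"])
request = nat.outbound({"src": "10.0.1.1", "dst": "18.26.0.99"})
reply = nat.inbound({"src": "18.26.0.99", "dst": request["src"]})
```

Note that the two dictionaries are exactly the state that the end points later turn out to depend on.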
Where do things start to go wrong? (When the payload data of the packet also contains the source or destination IP addresses.)
Why would a higher layer protocol want to put the IP address inside its header or data? (Here is where the religious battles start. If you take the present design of Internet addressing and layering as given, there are lots of reasons. If you could redesign it from the beginning some, perhaps all, of these reasons might go away.)
Give a reason.
(The transport layer protocol wants to provide delivery accuracy. So it includes a copy of the IP source and destination addresses in the payload data, so that the transport layer at the other side can check to make sure that this packet isn't a misdelivered packet that was intended for someone else.)
What goes wrong? (The packet arrives at the recipient, where the transport layer compares the transport layer copy of the to-address with its own IP address, and the transport layer copy of the from-address with the source address supplied by the network layer. Unless the NAT box has been very clever, one of these comparisons will probably fail.)
How can NAT cope with this problem? (With difficulty. Its only choice is to peer into the transport-layer packet and begin tinkering with the payload data. To do this, it has to know about the transport layer protocol. So NAT either is violating the layer abstraction or it is providing an end-layer implementation in the middle of the network, depending on your preferred way of explaining it.)
This doesn't seem so bad. If it knows the format of the transport layer header, why can't it just change the IP addresses there? (The IP addresses are just the beginning. The transport layer protocol probably has a checksum that includes those addresses, so NAT needs to revise the checksum. More generally, the transport layer can place an IP address anywhere it likes, not just in a header. For a slightly weirder example, TCP does not include the from- and to- addresses in the header, but it does include them in the TCP layer checksum. You may have noticed the word "pseudo header" in the RFC. TCP calculates its checksum as if those two address fields were in the header, even though they aren't. NAT has to know about that, know what checksum algorithm TCP is using, and revise the checksum accordingly.)
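To make the pseudo header concrete, here is a hedged Python sketch of the TCP checksum calculation: the standard 16-bit one's-complement sum taken over the source address, destination address, a zero byte, the protocol number, the TCP length, and then the segment itself. The function names and the sample segment are illustrative assumptions. A real NAT would more likely update the checksum incrementally than recompute it from scratch.

```python
import socket
import struct

TCP_PROTOCOL = 6  # IP protocol number for TCP

def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used by IP, TCP, and UDP."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = sum(word for (word,) in struct.iter_unpack("!H", data))
    while total >> 16:  # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def tcp_checksum(src_ip: str, dst_ip: str, segment: bytes) -> int:
    """Checksum over the pseudo header followed by the TCP segment."""
    pseudo_header = (
        socket.inet_aton(src_ip)
        + socket.inet_aton(dst_ip)
        + struct.pack("!BBH", 0, TCP_PROTOCOL, len(segment))
    )
    return internet_checksum(pseudo_header + segment)

# A bare 20-byte TCP header (SYN, no payload) with the checksum field zeroed.
segment = struct.pack("!HHIIBBHHH", 40000, 80, 0, 0, 5 << 4, 0x02, 65535, 0, 0)

before = tcp_checksum("10.0.1.1", "18.26.0.99", segment)      # as sent
after = tcp_checksum("14.15.16.191", "18.26.0.99", segment)   # after NAT
```

The two values differ even though not one byte of the TCP segment changed, which is why a NAT that rewrites only the IP header produces segments that the receiver discards.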
Consider File Transfer Protocol (FTP). It opens a control connection to the first target to request the transfer and then to provide metadata about the transfer. One critical piece of metadata is the second target's IP address (in the most common use of FTP the originator is the second target), so that the first target can open a data connection to the second target and upload or download the file.
To modularize things, to reuse existing code, and to allow FTP transfers to be initiated with a widely available application, FTP uses Telnet as the transport protocol on the control connection. So NAT has to look inside the Telnet layer and track down and (maybe) modify the IP address. So far, so good. But what makes this example exceptionally awkward for NAT? (Telnet is a terminal-to-computer protocol, used for humans to talk to their applications, so its data stream consists entirely of ASCII characters. Suppose the IP address is 10.0.0.1 and it has to be rewritten to 175.197.143.102. That adds 7 bytes to the length of the packet, which means that
- NAT must also adjust the IP total length field and the TCP checksum, which covers the segment length through the pseudo header.
- The larger packet may exceed the current window size.
- The larger packet may exceed the link layer MTU, so NAT may have to divide the data into two segments.
- The number of bytes in the next TCP acknowledgement won't be what the sender expected to receive.
This is getting really messy.)
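The length problem is easy to demonstrate. The Python sketch below rewrites an ASCII address in a made-up control-connection line and reports how many bytes the payload grew. The command text is an illustrative assumption in the dotted form used above, not real FTP syntax, and the helper names are hypothetical. The last two functions show the bookkeeping the NAT would then owe every later segment on that connection.

```python
def rewrite_address(payload: bytes, private: str, public: str):
    """Replace an ASCII address in the payload; return (new_payload, delta).

    Naive on purpose: a plain substring replace would also hit 10.0.0.10,
    so a real gateway must parse the protocol rather than pattern-match.
    """
    new_payload = payload.replace(private.encode("ascii"),
                                  public.encode("ascii"))
    return new_payload, len(new_payload) - len(payload)

def adjust_outbound_seq(seq: int, delta: int) -> int:
    """Later sequence numbers from the inside host shift up by delta."""
    return (seq + delta) % 2**32

def adjust_inbound_ack(ack: int, delta: int) -> int:
    """Acknowledgements from the outside host shift back down by delta."""
    return (ack - delta) % 2**32

line = b"ADDRESS 10.0.0.1\r\n"  # hypothetical command, for illustration only
new_line, delta = rewrite_address(line, "10.0.0.1", "175.197.143.102")
print(new_line, delta)
```

The delta has to be remembered for the life of the connection, which is one more piece of per-connection state held in the middle of the network.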
What is the real problem here? (Some people may claim that FTP is an example of bad protocol design. But whether it is good or bad isn't the right issue. Since address translation can break higher-layer protocols, the NAT box needs to know the details of every higher-layer protocol that might flow through it, so that it can patch things up. If you develop a new protocol that the NAT box doesn't know about, you may not be able to use it to communicate with systems on the other side of the NAT. Put another way, the NAT box lacks end-to-end transparency. Technically, it is acting as an Application Level Gateway (ALG) and the proper model is that it is concatenating two end-to-end protocols by terminating each of them and patching them together. Sometimes this will work, sometimes it won't; in any case it requires that the NAT box have some understanding of the end-to-end protocols.)
Is it a problem in practice? (Yes. Any time you hear of something about the Internet that doesn't work right on AOL, it is usually because of NAT.)
Can't you design new application protocols so that they always work through a NAT without requiring that the NAT patch things up? (Maybe. But you will probably find that it restricts the things you can do. For example, you can't easily provide delivery accuracy. You could redesign FTP to send the DNS name of the second target, instead of its IP address. But there is no rule saying that every network attachment point must have a DNS name, so this protocol change won't always work.)
Shouldn't the network provide delivery accuracy anyway? (The end-to-end argument says no, because not all applications need delivery accuracy, and some applications may have a different concept of delivery accuracy from the one that the network designer provided. So the real reason for objecting to designing an application protocol that works well with NAT is that it encourages placing end-layer functions inside the network layer, which the end-to-end argument teaches us we should avoid.)
Are there any other problems? (Fate sharing. If the single NAT box goes down, all connections in and out of AOL are disrupted.)
Why is this any worse than a router going down? (Because the NAT box is holding state that the end-points of the connection depend on, and which it can't reconstruct. A router can always start over with an empty memory and learn what it needs to know (routing tables) from its neighbors.)
So can't we replicate the NAT box? (Yes, but the replicas must be complete and always up to date. A packet may go out through one NAT box, but the response to that packet may be routed back through a different one, just a few milliseconds later. And any method of keeping them that tightly coupled runs the risk that they will all crash (or all muddle their state the same way) simultaneously.)
There is something fishy here. What if AOL assigns the IP address 18.26.0.99 to another AOL customer? If the customer at 10.0.1.1 tries to send a web request to the MIT server that is using that address on the public internet, the request will be delivered to the other AOL customer's attachment point.
(Right. The AOL private address assignments must not overlap with the public internet's address assignments.)
Then what good did it do to have a private address space? (None, if there is only one such private address space. But everyone using a NAT can reuse the same private address space. The Internet address assigners have set aside some blocks of addresses for private use, such as 10.*.*.*, and addresses in that block can be reused by every NAT user in the world without causing any confusion or routing problems, except under certain special conditions.)
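As a small illustration, the Python sketch below tests whether an address falls in one of the blocks set aside for private use. The text above names only 10.*.*.*; the other two blocks in the list are the remaining ranges reserved by RFC 1918, which is not among the readings. The name is_private is a hypothetical helper.

```python
import ipaddress

# Blocks reserved for private use. 10.0.0.0/8 is the one named in the
# discussion; the other two are the remaining RFC 1918 ranges.
PRIVATE_BLOCKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]

def is_private(address: str) -> bool:
    """True if the address lies in a block reserved for private use."""
    addr = ipaddress.ip_address(address)
    return any(addr in block for block in PRIVATE_BLOCKS)

print(is_private("10.0.1.1"))    # the AOL customer's address
print(is_private("18.26.0.99"))  # the MIT server's address
```

Because no public host is ever assigned an address from these blocks, a customer at 10.0.1.1 can never be confused with a server on the public Internet.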
Such as?... (Consider two private networks, each of which has allocated the address 10.0.1.2. If we link these two networks with a tunnel through the public Internet, things are going to start getting confused. So any private networks that might someday need to be connected together should coordinate their address allocations, even within their private address spaces. If you are Fleet Bank and your stockholders just approved a merger with Bank of America, and each used a private address space, there will be a major problem in interconnecting them. Of course you could continue to run NAT forever between the two private parts of the network, but now you are missing some of the benefits of the merger.)
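A quick way to see the merger problem is to compare the two allocation plans before connecting anything. The Python sketch below is a hypothetical pre-merger check. The subnet plans and host lists are invented for illustration and say nothing about how either bank actually numbers its network.

```python
import ipaddress

def overlapping_subnets(plan_a, plan_b):
    """Return every pair of subnets, one from each plan, that overlap."""
    nets_a = [ipaddress.ip_network(n) for n in plan_a]
    nets_b = [ipaddress.ip_network(n) for n in plan_b]
    return [(str(a), str(b)) for a in nets_a for b in nets_b if a.overlaps(b)]

def colliding_hosts(hosts_a, hosts_b):
    """Return addresses that both networks have handed out to a host."""
    return sorted(set(hosts_a) & set(hosts_b), key=ipaddress.ip_address)

# Invented allocation plans for the two private networks.
bank_one = ["10.0.0.0/16", "10.8.0.0/16"]
bank_two = ["10.0.1.0/24", "10.9.0.0/16"]

print(overlapping_subnets(bank_one, bank_two))
print(colliding_hosts(["10.0.1.2", "10.8.0.5"], ["10.0.1.2", "10.9.0.5"]))
```

Any non-empty result means renumbering one side or keeping a NAT between the two halves.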
How about a modest use of NAT? (A customer of Comcast cable modem service has three computers. NAT can be used to share one IP address among a small number of hosts on a local network.)
This can't work. How does the NAT box know which local machine to route incoming packets to? (The NAT box can use the port numbers of the transport protocol as part of the mapping. That is, packets coming in destined to port 9951 are sent to 10.0.0.1, but for port 9996 go to 10.0.0.2. Once again, the NAT box has to know a lot about how the higher-level protocol works, so that it can assign and rewrite port number fields on packets as they go in and out.)
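The port-based mapping can be sketched the same way as the address-only NAT. In this hedged Python model the table is keyed on the (private address, private port) pair, and public ports are handed out in sequence starting at 9951 to echo the example above. The class name PortNat, the starting port, and the public address 203.0.113.7 are assumptions made for illustration.

```python
import itertools

class PortNat:
    """Toy port-translating NAT sharing one public address (hypothetical)."""

    def __init__(self, public_ip, first_port=9951):
        self.public_ip = public_ip
        self._ports = itertools.count(first_port)  # next unused public port
        self.out = {}    # (private_ip, private_port) -> public_port
        self.back = {}   # public_port -> (private_ip, private_port)

    def outbound(self, src_ip, src_port):
        """Return the (public_ip, public_port) to write into the packet."""
        key = (src_ip, src_port)
        if key not in self.out:
            port = next(self._ports)
            self.out[key] = port
            self.back[port] = key
        return self.public_ip, self.out[key]

    def inbound(self, dst_port):
        """Return the private (ip, port) for a reply, or None to drop it."""
        return self.back.get(dst_port)

nat = PortNat("203.0.113.7")
first = nat.outbound("10.0.0.1", 40000)
second = nat.outbound("10.0.0.2", 40000)  # same private port, other host
print(first, second, nat.inbound(9952))
```

An incoming packet for a port with no table entry has nowhere to go, which is why a host behind this kind of NAT cannot easily accept connections that it did not initiate.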
Comments and suggestions: Saltzer@mit.edu