Barbara van Schewick posted a really thoughtful analysis of how application-specific vs. application-agnostic discrimination directly affects innovation, looking at an actual example of a Silicon Valley startup. I think her points are right on, and I strongly support the rationale for resisting “application-specific” discrimination.

In fact, Barbara’s point is the key to the whole debate. The future of the internet requires that applications be able to be invented by anyone and made available to everyone, and information shared on the net by anyone to be accessible to anyone. That property is “under fire” today by Internet access providers, by nation states, and by others who wish to limit the Internet’s reach and capabilities. I wholeheartedly support her points and her proposal.

I think it’s important to remind us of a further point that is quite obvious to those who have participated in the Internet on a technical level, from its original design to the present, so I thought I’d write a bit more, focusing on the fact that the Internet was designed in a way that makes application-specific discrimination difficult. Barbara knows this, since her work has been derived from that observation, but policymakers not steeped in the Internet’s design may not. As we pursue the process of making such rules, it is important to remind all of us that such rules are synergistic with the Internet’s own code, reinforcing the fundamental strengths of the Internet.

So I ask the question here: what do we need from the “law” when the “code” was designed to do most of the job? Well, the issue here is about the evolution of the Internet’s “code” – the implementation of the original architecture. The Internet’s code will continue to make application-specific discrimination difficult as long as a key aspect of its original design is preserved – that the transport portion of the network need not know the meaning of the bits being transported on any link. We struggled to make all our design decisions so that this would remain true. Barbara has made the case that this design choice is probably the most direct contribution to the success of the Internet as a platform for innovation.

My experience with both startups and large companies deciding to invest in building on general purpose platforms reinforces her point. Open platforms really stimulate innovation when it is clear that there is no risk of the platform being used as a point where the platform vendor can create uncertainty that affects a product’s ability to reach the market. This is especially true for network communications platforms, but was also true for operating systems platforms like Microsoft DOS and Windows, and hardware platforms like the Apple II and Macintosh in their early days. In their later days, there is a tendency for the entities that control the platform’s evolution to begin to compete with the innovators who have succeeded on the platform, and also to try to limit the opportunities of new entrants to the platform.

What makes the Internet different from an operating system, however, is that the Internet is not itself a product – it is a set of agreements and protocols that create a universal “utility” that runs on top of the broadest possible set of communications transport technologies, uniting them into a global framework that provides the ability for any application, product or service that involves communications to reach anyone on the globe who can gain access to a local part of the Internet.

The Internet is not owned by anyone (though the ISOC and ICANN and others play important governance roles). Its growth is participatory – anyone can extend it and get the benefits in exchange for contributing to extending it. So controlling the Internet as a whole is incredibly hard. However, certain parts of the Internet can be controlled in limited ways. In particular, given that local authorities tend to restrict the right to deploy fiber, and countries tend to restrict the right to transmit or receive radio signals, the first or last mile of the Internet is often a de facto monopoly, controlled by a small number of large companies. Those companies have both the incentives and the ability to do certain kinds of discrimination.

However, a key part of the Internet’s design, worth repeating over and over, is that the role of the network is solely to deliver bits from one user of the Internet to another. The meaning of those bits never, under any circumstances, needs to be known to anyone other than the source or the destination for the bits to be delivered properly. In fact, it is part of the specification of the Internet that the source’s bits are to be delivered to the destination unmodified and with “best efforts” (a technical term that doesn’t matter for this post).

In the early days of the Internet design, my officemate at MIT, Steven T. Kent, who is now far better known as one of the premier experts on secure and trustworthy systems, described how the Internet could, in fact, be designed so that all of the bits delivered from source to destination were encrypted with keys unknown to the intermediate nodes, and we jointly proposed that this be strongly promoted for all users of the Internet. While this proposal was not accepted, because encryption was thought to be too expensive to require for every use, the protocol designs of TCP and all the other standard protocols have carefully preserved the distinctions needed so that end-to-end encryption can be used. That forces the design to not depend in any way on the content, since encryption means that no one other than the source or destination can possibly understand the meaning of the bits, so the network must be able to do a perfectly correct job without knowing what the bits mean.
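The idea can be sketched in a few lines of code. This is a toy illustration (the XOR “cipher” is not real cryptography, and all names here are mine, not from any Internet specification): the endpoints share a key, the network sees only the envelope plus opaque bytes, and delivery works anyway.

```python
def xor_cipher(data: bytes, key: bytes) -> bytes:
    # Toy symmetric cipher: XOR with a repeating key (illustration only).
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

def make_packet(src: str, dst: str, payload: bytes, key: bytes) -> dict:
    # Only the endpoints hold `key`; the network sees addresses plus opaque bytes.
    return {"src": src, "dst": dst, "data": xor_cipher(payload, key)}

def forward(packet: dict) -> str:
    # A router's whole job: pick the next hop from the envelope alone.
    # It never interprets packet["data"], and couldn't anyway.
    return packet["dst"]

key = b"endpoints-only"
pkt = make_packet("alice", "bob", b"hello, bob", key)
assert forward(pkt) == "bob"                          # delivery needs only the envelope
assert xor_cipher(pkt["data"], key) == b"hello, bob"  # only the endpoints recover meaning
```

The point of the sketch is that `forward` does a perfectly correct job without any ability to read the payload.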

Similarly, while recommendations were made for standard “port numbers” to be associated with some common applications, the designers of the Internet recognized that they should not assign any semantic meaning to those port numbers that the network would require to do its job of delivering packets. Instead, we created a notion of labeling packets in their header for various options and handling types, including any prioritization that might be needed to do the job. This separation of functions in the design meant that the information needed for network delivery was always separate from application information.

Why did we do this? We did not do it to prevent some future operator from controlling the network, but for a far more important reason – we were certain that we could not predict what applications would be invented. So it was prudent to make the network layer be able to run any kind of application, without having to change the network to provide the facilities needed (including prioritization, which would be specified by the application code running at the endpoints controlled by the users).

So here’s a concern with Barbara’s latest post, and in fact with much of the policy debate at the FCC and so forth. The concern is that the Internet’s design requires that the network be application agnostic as a matter of “code”. More importantly, because applications don’t have to tell the network of their existence, the network can’t be application specific if it follows the Internet standards.

So why are we talking about this question at all, in the context of rules about the Open Internet at FCC? Well, it turns out that there are technologies out there that try to guess what applications generated particular packets, usually by relatively expensive add-on hardware that inspects every packet flowing through the network. Generically called “deep packet inspection” technologies and “smart firewall” technologies, they look at various properties of the packets between a source and destination, including the user data contents and port numbers, and make an inference about what the packet means. Statistically, given current popular applications, they can be pretty good at this. But they would be completely stymied by a new application they have never seen before, and also by encrypted data.
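A minimal sketch makes the statistical nature of such inference concrete. The rule table below is illustrative, not drawn from any real DPI product: well-known ports and plaintext patterns give plausible guesses for today’s popular applications, but a novel protocol or encrypted bytes yield nothing.

```python
# Illustrative port conventions only -- nothing in the network requires them.
KNOWN_PORTS = {80: "web", 25: "email", 53: "dns"}

def guess_application(port: int, payload: bytes) -> str:
    if port in KNOWN_PORTS:
        return KNOWN_PORTS[port]   # right only when conventions happen to be followed
    if payload.startswith(b"GET "):
        return "web"               # plaintext pattern match
    return "unknown"               # novel or encrypted traffic defeats the guess

assert guess_application(25, b"MAIL FROM:...") == "email"
assert guess_application(40000, b"GET /index") == "web"
assert guess_application(40000, b"\x8f\x02\xa1") == "unknown"  # opaque bytes: no inference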

What’s most interesting about these technologies is that they are inherently unreliable, given the open design of the Internet, but they can be attractive for someone who wants to limit applications to a small known set, anyway. An access network that wants to charge extra for certain applications might be quite happy to block or to exclude any applications that generate packets its deep packet inspection technologies or smart firewall technologies cannot understand.

The support for such an idea is growing – allowing only very narrow sets of traffic through, and blocking everything else, including, by necessity, any novel or innovative applications. The gear to scan and block packets is now cheap enough, and the returns for charging application innovators for access to customers are thought to be incredibly high by many of those operators, who want a “piece of the pie”.

So here’s the thing: on the Internet, merely requiring those who offer Internet service to implement the Internet design as it was intended – without trying to assign meaning to the data content of the packets – would automatically make the service application agnostic.

In particular: We don’t need a complex rule defining “applications” in order to implement an application agnostic Internet. We have the basis of that rule – it’s in the “code” of the Internet. What we need from the “law” is merely a rule that says a network operator is not supposed to make routing decisions, packet delivery decisions, etc. based on the contents of the packet. Only the source and destination addresses and the labels on the packet put there to tell the network about special handling, priority, etc. need to be understood by the network transport, and that is how things should stay, if we believe that Barbara is correct that only application-agnostic discrimination makes sense.

In other words, the rule would simply embody a statement of the “hourglass model” – that IP datagrams consist of an envelope that contains the information needed by the transport layer to deliver the packets, and that the contents of that envelope – the data itself – are private and to be delivered unchanged and unread to the destination. The other part of the hourglass model is that port numbers do not affect delivery – they merely tell the recipient which process is to receive the datagram, and have no other inherent meaning to the transport.
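The hourglass split can be written down directly. In this sketch (field names, the routing table, and the port table are all my own illustrative choices), the network’s delivery function takes only the envelope, while the port number lives in the payload and is consulted only by the destination host to pick a process.

```python
ROUTES = {"bob-net": "link-2"}                      # toy routing table (illustrative)
PROCESSES = {25: "mail-daemon", 80: "web-server"}   # receiving host's port table

def network_deliver(envelope: dict) -> str:
    # Transport layer: the decision is a function of addresses and
    # handling labels alone; payload bytes never enter this function.
    return ROUTES[envelope["dst_net"]]

def host_dispatch(payload: dict) -> str:
    # Destination host: the port chooses a process. The network never saw it.
    return PROCESSES[payload["dst_port"]]

packet = {"envelope": {"dst_net": "bob-net", "handling": "normal"},
          "payload":  {"dst_port": 25, "data": b"..."}}
assert network_deliver(packet["envelope"]) == "link-2"
assert host_dispatch(packet["payload"]) == "mail-daemon"
```

Phrased this way, the proposed rule is simply: `network_deliver` may never take `payload` as an argument.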

Such a rule would reinforce the actual Internet “code” because that original design is under attack by access providers who claim that discrimination against applications is important. A major claim that has been made is that “network management” and “congestion control” require application specific controls. That claim is false, but justified by complex hand-waving references to piracy and application-specific “hogging”. Upon examination, there is nothing specific about the applications that hog or the technologies used by pirates. Implementing policies that eliminate hogging or detect piracy does not require changes to the transport layer of the Internet.

There has been a long tradition in the IETF of preserving the application-agnostic nature of the Internet transport layer, often invoked by the shorthand phrase “violating the end-to-end argument”. That phrase was meaningful in the “old days”, but to some extent the reasons why it was important have been lost to the younger members of the IETF community, many of whom were not even born when the Internet was designed. They need reminding, too – there is a temptation to throw application-specific “features” into the network transport by vendors of equipment, by representatives of network operators wanting to gain a handle to control competition against non-Internet providers, etc., as well as a constant tension driven by smart engineers wanting to make the Internet faster, better, and cheaper by questioning every aspect of the design. (This design tradition pushed designers to implement functions outside the network transport layer whenever possible, and to put only the most general and simple elements into the network to achieve the necessary goal. For example, network congestion control is managed by having the routers merely detect and signal the existence of congestion back to the edges of the network, where the sources can decide to re-route traffic and the traffic engineers can decide to modify the network’s hardware connectivity. This decision means that the only function needed in the network transport itself is application-agnostic – congestion detection and signalling.)
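The congestion example in the parenthetical can be sketched in the same spirit as ECN-style marking. The threshold and the window-halving reaction below are illustrative stand-ins, not a real TCP algorithm: the router’s only job is to mark, using queue state rather than packet meaning, and the endpoint decides how to slow down.

```python
def router_mark(queue_len: int, packet: dict, threshold: int = 10) -> dict:
    # Application-agnostic: the marking decision uses queue state only,
    # never the meaning of the packet's contents.
    if queue_len > threshold:
        packet["congestion_marked"] = True
    return packet

def sender_react(window: int, packet: dict) -> int:
    # The endpoint, not the network, chooses the response to congestion.
    # Halving the window is an illustrative policy.
    return max(1, window // 2) if packet.get("congestion_marked") else window

congested = router_mark(15, {"src": "a", "dst": "b"})
assert sender_react(8, congested) == 4      # sender slows itself down
quiet = router_mark(3, {"src": "a", "dst": "b"})
assert sender_react(8, quiet) == 8          # no mark, no change
```

Everything application-specific lives at the edges; the transport contributes only detection and signalling.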

So I really value Barbara’s contribution, reminding us of two things:

  • Application specific discrimination harms everyone who uses the Internet, because it destroys the generativity of the Internet, and
  • The Internet’s design needs a little help these days, from the law, to reinforce what the original code was designed to do

The law needs to be worked out in synergy with the fundamental design notion of the Internet, and I believe it can be a small bit of law at this point in time, because the Internet is supposed to be that way, by design. If the Internet heads in a bad direction through new designs, perhaps we might want to seek more protections for this generativity, which is so important to the world.

Note: My personal view is that the reason that this has become such an issue is that policymakers are trying to generalize from the Internet to a broad “network neutrality” for networks that are not the Internet, that don’t have the key design elements that the Internet has had from the start. For example, the telephone network’s design did not originally separate content from control – it used “in band signalling” to control routing of phone calls. To implement “neutrality” in the phone network would require actually struggling with the fact that certain sounds (2600 Hz, e.g.) were required to be observed by the network – in the user’s space. (This also led to security problems, but it was done to make “interconnect” between phone switches easy.)

(2) Comments   


Andrea M on 17 December, 2010 at 16:10 #

Hi David – The last piece of your post is particularly interesting to me. Could you expand a little on the security points? (I’m also sympathetic to pieces of Barbara’s argument, btw.) Imho, increasingly the policy discussion is going to turn to intermediaries arguing the need to provide “security services” filtering which packets are conveyed – e.g. filtering for spam, malware etc. This will also get blended with a contract law discussion: “if the users want us to protect them and consent in their EULAs, and we need to protect our networks anyway, it’s win-win.” Even assuming that analysis has merit, there’s a big question as to which packets are identified as the alleged security risk — sometimes it might be true, and sometimes it may be useful spin. At least some of the current policy discussion doesn’t seem to grasp this nuance and seems to be ok with a blanket “security” exemption from nondiscrimination rules. Thoughts? (Happy 2011, and I hope to run into you again soon.)

dpr on 18 December, 2010 at 16:46 #

This question deserves a longer answer, so I’ll probably write a more extensive blog entry in the next few days. However, I think I can briefly address it this way. If only the endpoints can reliably understand the meaning of the bits being sent in any individual packet or flow (because the network is unaware of applications and standards), it is still quite straightforward to design systems that allow end users to delegate to third parties various security mechanisms.

For example, consider email that arrives at my machine. The email itself arrives in a set of packets that look no different from any other data packets – to the transport network. In many cases, the packets are completely encrypted. There are many protocols that deliver the email to my machine (SMTP, POP3, Microsoft Exchange, various Ajax-based webmail interfaces, and even the Lotus Notes replication protocol). The emails in all of these packets are not reliably distinguishable by any particular contents in the packets, there may be novel email protocols invented tomorrow that work in weird ways (peer-to-peer distribution), etc.

Thus, the email “bits” and the “attached file” bits that are inside those emails may in fact be dangerous in some way, but the transport network cannot know that, because it doesn’t know what the bits mean. 0s and 1s (or even strings of bytes that appear to be English-language text) cannot be presumed to be “emails”, and what the recipient will do with them is not knowable to anyone but the sender or receiver.

That said, I as a user might want files attached to be checked for virus and malware contents. How can I do that? Well, one simple way is to put the checking on the recipient’s computer, and check after the email is received, but before it is handed off to be displayed to the user or “run” or used as input to Microsoft Word. That’s what programs like Norton Internet Security or McAfee Internet Security do – they process text *after* it is known to be email, because the recipient computer program that fetches email hands it off to these programs and waits for their assessment. They don’t even look at non-email data received over the network.

Of course, in the case of small, low powered machines, and so forth, there might be some reason why the user would want the screening of her email to occur somewhere other than on her personal machine. One can, in fact, delegate such processing to any third party – merely by using the flexibility of the network itself. The way this is done is to delegate the fetching of one’s email to a service out on the Internet somewhere that is told where and how to fetch email – in particular the service is told what protocol is to be used, and given instructions as to how the email will be structured when it arrives over that protocol. (I personally have two such layers of protection on my email, filtering my email at two points that I personally delegate such checking to – one of these is non-standard, because I run Linux and not Windows, so I worry less about viruses and malware tuned to Windows, but the other is quite standard.)
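The delegation pattern can be sketched as follows. Every name here is hypothetical: `toy_pop3` stands in for a real fetch protocol and `no_viruses` for a real scanner. The structural point is that the *user* hands the delegate both the protocol and the check, so the application-specific knowledge comes from the endpoint’s configuration, never from packet inspection by the network.

```python
def fetch_via_delegate(fetch_protocol, scan, mailbox):
    # The delegate fetches mail with the protocol the *user* specified...
    messages = fetch_protocol(mailbox)
    # ...and filters it with the check the *user* chose to delegate.
    return [m for m in messages if scan(m)]

toy_pop3 = lambda box: list(box)            # stand-in for a real POP3 fetch
no_viruses = lambda msg: b"EVIL" not in msg  # stand-in for a real malware scanner

inbox = [b"hello", b"EVIL payload", b"invoice"]
assert fetch_via_delegate(toy_pop3, no_viruses, inbox) == [b"hello", b"invoice"]
```

Two such delegates can be chained, which matches the two layers of filtering described above.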

This delegation of *email* scanning is application specific, but that arises from the fact that I as a user *control* what processes my email. My ISP cannot reliably tell when I’m getting email – as I pointed out before, the packets it sees are not labeled with the application I use, or with the conventions I have agreed to use with my email sources.

So this is a huge distinction, and I emphasized it in the post – the network transport does not need to know what application or protocol or meaning is being transported to do its job.

Now there is the possibility that a last-mile ISP (Comcast, FiOS, AT&T) might want to offer email screening as a service. It might even offer user “mailboxes”. If the user chooses to use the ISP provided mailboxes, rather than receiving the mail via some other service, or to redirect email through an email-scanning proxy that is provided by that last-mile ISP, that’s fine. It may even be more cost-effective and simple for users that way.

However, the idea that an ISP can provide security for *all applications* and all end-user technologies reliably by scanning packets is just incorrect. It would be like the post office claiming it could protect the recipients of mail from frauds and scams by opening all the mail that we each receive and scanning it with high-speed scanners. I daresay that many, many companies that are actually Ponzi schemes or fraud operators use the postal mail system in a way that cannot be distinguished from normal commercial mail. Why would we think that the post office should be the party that “protects” us from application-level threats like bogus charities?

Further, it’s quite easy to use the post office to send copies of a company’s most secret information to recipients outside the company. Would a French company operating in the US want the US Postal Service scanning the contents of its letters to business partners, suppliers, and customers? Analogously, why is an ISP scanning packet contents supposed to be the appropriate place to implement “security” against disclosure of secrets?

The idea that ISPs need access to packet contents in order to make their customers “secure” is an idea that falls apart on the slightest inspection.

Yet, there are ISPs who really want their customers to feel that they are “protecting” them from bad events. I think the best protection would be to help their customers think clearly about these problems, rather than make absurd claims that DPI can implement “security”.
