A research group in Europe has proposed to overhaul the way the Internet looks for data. The group was funded as part of a project called ‘Pursuit’ and their ideas are described in this Pursuit Fact Sheet.
The proposal involves changing the way that the Internet searches for data. Today searches are done by URL, or Uniform Resource Locator. A URL identifies the original server that holds the desired data. When you do a Google search, that is what you find – the address of the original server. The problem with looking for data this way is that everybody looking for that same data is sent to that same server.
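To see why a URL is really a server address, it helps to pull one apart. A minimal sketch using Python's standard `urllib.parse` (the URL here is a made-up example):

```python
from urllib.parse import urlparse

# A URL names one specific server (the host) plus a path on that server.
url = "https://example.com/videos/episode-1.mp4"
parts = urlparse(url)

print(parts.hostname)  # example.com  -- every request for this URL goes here
print(parts.path)      # /videos/episode-1.mp4
```

No matter how many people hold this URL, the hostname points them all at the same machine.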
There are several problems with searches that point everybody to the original server holding a piece of data. If enough people look for that data at the same time, the original server might crash. The original server can also be effectively shut down by denial-of-service attacks. And sending everybody to the original server is inefficient. If the content everybody is looking for is a video, then that video is downloaded separately to each person who asks to see it. If you and your neighbors all decide to watch the same video, it is sent to each of you individually and travels through the Internet many times.
The Pursuit proposal suggests that we instead change the Internet to use URIs (Uniform Resource Identifiers) to search for data. This kind of search looks for the content itself rather than for the server that originally stored it. So if you are looking for a TV show, it will look to see where that show is currently stored. If somebody in your local network has recently watched that show, then the data is already available locally and you will be able to download it much faster, without initiating a new download from the original server.
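The core idea can be sketched in a few lines: name data by what it *is* (for instance, a hash of its bytes) rather than by where it lives, and keep a registry of every place currently holding a copy. Everything below is a made-up illustration, not the actual Pursuit design:

```python
import hashlib

def content_id(data: bytes) -> str:
    # The identifier is derived from the content itself, so any copy
    # with the same identifier is guaranteed to be the same data.
    return hashlib.sha256(data).hexdigest()

# Hypothetical registry: identifier -> locations currently holding a copy.
registry: dict[str, list[str]] = {}

def publish(data: bytes, location: str) -> str:
    cid = content_id(data)
    registry.setdefault(cid, []).append(location)
    return cid

def locate(cid: str) -> list[str]:
    # Any listed copy will do; the requester can pick the nearest one.
    return registry.get(cid, [])

cid = publish(b"episode-1 video bytes", "origin.example.net")
publish(b"episode-1 video bytes", "neighbor-cache.local")
print(locate(cid))  # both the origin and the nearby cache hold the data
```

The key shift is that `locate()` answers "where is this content now?" instead of "which server was it published on?".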
This is somewhat akin to the way that file-sharing sites work, and you might be given a menu of sites that hold the data you are looking for. By choosing the nearest site you will be retrieving the data from somewhere other than the original server. The closer it is to you (network-wise, not geographically) the faster and more efficiently you will be able to retrieve it.
But more likely the retrieval will be automated, and you may download the content from many locations – grabbing a piece of the desired video from the different networks that currently hold that data.
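A minimal sketch of that automated multi-source retrieval, under the assumption that the video is split into numbered chunks and different nearby holders have different chunks (all names and data here are invented):

```python
# Hypothetical map of which chunks each nearby holder currently has.
holders = {
    "cache-a": {0: b"AAAA", 2: b"CCCC"},
    "cache-b": {1: b"BBBB", 3: b"DDDD"},
}

def retrieve(total_chunks: int) -> bytes:
    # Collect each chunk from whichever holder offers it first, then
    # reassemble the pieces in order.  A real system would fetch in
    # parallel and prefer the network-nearest holder for each chunk.
    pieces: dict[int, bytes] = {}
    for stored in holders.values():
        for idx, data in stored.items():
            pieces.setdefault(idx, data)
    assert len(pieces) == total_chunks, "some chunk is not available anywhere"
    return b"".join(pieces[i] for i in range(total_chunks))

print(retrieve(4))  # b"AAAABBBBCCCCDDDD"
```

No single holder has the whole video, yet the requester still reconstructs it completely.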
This is not a new concept; networks that deliver switched digital video already work the same way. In those systems, the first person in a neighborhood node to watch a certain channel opens up a connection for that channel. A second person then shares the already-open channel rather than initiating a new request back to the TV server. This means that a given channel is opened only once for a given node on the network.
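The channel-sharing logic above can be sketched in a few lines (a toy model, not any vendor's actual implementation):

```python
# Per-node state: which channels are open, and how many viewers share each.
open_channels: dict[str, int] = {}
upstream_requests = 0  # requests actually sent back to the TV server

def watch(channel: str) -> None:
    global upstream_requests
    if channel not in open_channels:
        upstream_requests += 1       # only the first viewer opens the channel
        open_channels[channel] = 0
    open_channels[channel] += 1      # later viewers share the open stream

for viewer_channel in ["news", "news", "sports", "news"]:
    watch(viewer_channel)

print(upstream_requests)      # 2 -- one open per distinct channel, not per viewer
print(open_channels["news"])  # 3 viewers sharing a single open stream
```

Four viewers generate only two upstream requests; the rest are served from streams already open in the node.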
There are huge advantages to this kind of shift in the Internet. Today the vast majority of data being sent through the Internet is video, and one has to imagine that very large numbers of people watch the same content. So changing to a system where a given video is sent to your local node only one time is a huge improvement in efficiency. This will take the strain off of content servers and will also relieve a lot of the congestion on the Internet backbone. In fact, once the data has been dispersed across the Internet, the original server could be taken out of service, but the content would live on.
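A back-of-the-envelope calculation shows the scale of the saving. The numbers below are purely illustrative, not figures from the project:

```python
viewers = 1000   # people watching the same video
nodes = 20       # local network nodes those viewers are spread across
video_gb = 2.0   # size of the video in gigabytes

# Today: every viewer pulls a separate copy across the backbone.
per_viewer_traffic = viewers * video_gb

# Information-centric: each node pulls one copy, then serves it locally.
per_node_traffic = nodes * video_gb

print(per_viewer_traffic)  # 2000.0 GB across the backbone
print(per_node_traffic)    # 40.0 GB -- a 50x reduction in this scenario
```

The ratio is simply viewers per node, so the denser the audience, the bigger the win.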
There are some downsides to this kind of system. For example, one often hears of somebody pulling down content that they no longer want viewed. In an information-centric network, removing the data from the original server would not accomplish much: as long as somebody had recently watched the content, it would live on, independent of the original server.
There are a lot of changes needed to make the transition to an information-centric web. It will require changes to transport, caching systems, error control, flow control and the other core processes involved in retrieving data. But the gigantic increase in efficiency from this change means that it is all but inevitable that it will come to pass.