RE: URIs, deep linking, framing, adapting and related concerns from Rotan Hanrahan on 2010-12-17 (www-tag@w3.org from December 2010)

From: Rotan Hanrahan <rotan.hanrahan@mobileaware.com>
Date: Fri, 17 Dec 2010 17:04:06 -0000
To: <www-tag@w3.org>
Cc: "Mark Baker" <distobj@acm.org>, "Jonathan Rees" <jar@creativecommons.org>, Martin J. Dürst <duerst@it.aoyama.ac.jp>
Message-ID: <D5306DC72D165F488F56A9E43F2045D302A8D344@FTO.mobileaware.com>

One of the concerns I had when I originally started this thread, a long time ago, was the distinction between explicit and implicit intentions associated with the use of URIs. This was prompted by yet another argument on the rights and wrongs of "deep linking", and a growing presence of (often unsolicited) intermediaries that were supposedly providing a service (auto-adapting content to fit a mobile browser) but were also generating revenue from advertising injected into the server-client data stream.

Does the fact that a resource has been given a URI imply that the resource is available for public use? Can I link to it directly (i.e. can I utter the URL in public)? Is the data I receive when I dereference the URI mine to do with as I please?

Before jumping in with the obvious answers (which might not be as correct as we think), it pays to consider exactly what roles the URI and HTTP are playing. I won't reiterate here the analogy I used in my original message, but I will try to summarise how I currently think about this.

First, the URI is merely an identifier, typically structured to indicate how the identified resource can be dereferenced (accessed). It's a unique name. Unlike the use of the name of the Supreme Being, there is no prohibition on the utterance of a name, unless that name is concocted to resemble an utterance of something that is not a name (e.g. a threat of violence, which in some jurisdictions is an offence). In general, we should be willing to accept that a name can be uttered unrestricted.

However, we also admit that names cannot be uttered in a context such that a new more complex, false statement is being uttered, presumably to defraud or otherwise mislead. For example, I may be able to utter the name of "SuperMega Cola" but I cannot justify saying "Scientists have proven SuperMega Cola is poison" since this may be false and damaging to the manufacturer of said cola/poison. It would be equally wrong for me to express this statement as "The makers of <img src="...supermegacola.png"/> are thieves", wherein I use the address of an image from the SuperMega Inc site. It is quite clear from this example that it is an abuse of an image in which ownership/copyright is held by someone else.

It is perfectly OK for an image (representation) to have a URI. The problem lies with the usage of the representation, an infringement of rights that would exist regardless of the technology. Here's an odd example: Imagine a truck carrying a billboard with a hole in the middle, an arrow pointing to the hole, the word "Liar" next to the arrow and then this truck driving past the White House, Number 10 or wherever, during a televised press conference. On its own in a neutral context the hole in the billboard does no wrong, but in these very special circumstances it has a very different meaning. It is neither the URI concept nor the HTTP technology that is responsible for linking/framing/transclusion issues. They are merely another opportunity for people to do wrong, in the same way that people can find nasty ways to use holes.

On the other hand, anyone skilled in the Web arts will know that an openly dereferenceable URI is open to abuse and so one might expect that they would take precautions against such abuse. A simple check of the referer (sic) header will suffice in most cases. For added security, a challenge for credentials to create a secure session could be used. These approaches will also prevent deep linking by requiring that the resource is only available after certain prerequisites have been fulfilled.
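To make the two precautions concrete, here is a minimal sketch in Python of how a server might decide whether to serve a resource. It is only an illustration under stated assumptions: the host names in ALLOWED_HOSTS, the function names referer_allowed and decide_status, and the plain-dict representation of request headers are all hypothetical and not taken from any particular server or framework. Note also that the Referer header is supplied by the client and may be omitted or forged, which is why the check only suffices "in most cases".

```python
from urllib.parse import urlsplit

# Hypothetical hosts that are allowed to link to or embed our resources.
ALLOWED_HOSTS = {"www.example.com", "example.com"}


def referer_allowed(referer, allowed_hosts=ALLOWED_HOSTS):
    """Return True when the Referer value names a page on one of our own hosts."""
    if not referer:
        # Header missing or empty: treat the request as arriving from elsewhere.
        return False
    try:
        host = urlsplit(referer).hostname
    except ValueError:
        # Malformed value (for example a broken IPv6 literal).
        return False
    return host is not None and host in allowed_hosts


def decide_status(headers, require_login=False, authenticated=False):
    """Return the HTTP status a server might send for a protected resource.

    headers is assumed to be a plain dict keyed by the exact name "Referer".
    """
    if require_login and not authenticated:
        return 401  # challenge for credentials before anything else
    if not referer_allowed(headers.get("Referer")):
        return 403  # request did not arrive via one of our own pages
    return 200
```

With this sketch, a request carrying "Referer: https://www.example.com/index.html" would be served, while a direct (deep) link from another site, or a request with no Referer at all, would be refused. A stricter site would pass require_login=True so that the credentials challenge is applied before the referer check.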

Since these mechanisms are very widely known, is it reasonable to expect that they are (or will be) adopted? The presence of such protection explicitly says that the resources are being protected, and it is clear that any attempt to circumvent the protection is an extraordinary measure that any decent court would find to infringe the author/owner's rights. Essentially, the use of an additional layer of security is concrete evidence of explicit intention to restrict the access and use of the resource identified by the URI. The mechanisms both protect, and signal intent to protect, and it is reasonable to expect them to be adopted. But we hear of many complaints of deep-linking, framing etc. from high profile sites, so perhaps the expectation is unwarranted. Somebody needs to advise that the mechanisms be used. (And perhaps the TAG could be the giver of such advice?)

What if the protection mechanisms are not present? This is where implicit intention comes in. Does the absence of well-established protection mechanisms imply that the resource is freely available? Or does it simply show rare ignorance by the site owner? (Or complacency perhaps?) Even in this case, I think the laws of the land would still apply to the use of the identified resource, so the absence of protection is not a license to do as you please.

But what of deep linking to an unprotected URI? Here one has not misrepresented the resource owner. Nor has one impacted upon their bandwidth to reduce one's own costs. Indeed, by getting people directly to the resource one has probably reduced the traffic to the other site. (Yes, I accept that the other site probably wants increased traffic from the user, as this may involve a certain beneficial workflow.) In the absence of any evidence to the contrary, surely such links are permitted? In fact, without such an abundance of links, the Web would be a feeble shadow of itself with everyone afraid to include paths of any kind in their external links.

There are very good reasons to provide direct/deep links. Not least is the need for smooth Web navigation, which was a key benefit of the Web that set it apart from FTP, Gopher and so on. There are other good reasons for direct links. For example, in the mobile context it is generally considered a good idea to avoid HTTP redirections, so it would be of particular benefit to very busy sites (e.g. mobile.walmart.com) to have a direct link, instead of going via the redirects on the www site, and I'm sure such enterprises would not complain in this case.

Compare this to placing a poster on a publicly visible wall. By putting it in a place that can easily be found, one is essentially revealing one's intent to make the poster public. Protection mechanisms are possible, such as representing (encrypting) the image as 3D which can only be seen properly (decrypted) with 3D glasses, which for the one-eyed minority is quite a nuisance. But generally a public wall is just a public wall, and it should not matter if you arrive at this spot on the wall by walking, driving or parachute. (You probably didn't anticipate the parachute, but this is exactly the situation that pertains for those who feel they are victims of deep-linking.)

What did I expect when I presented the conundrum to the TAG? I was hoping for some simple guidance along the lines of a statement that URIs are merely the identifiers for representations of resources, while the protection of such resources (access, usage etc.) is a different matter for which there are numerous technical solutions such as HTTP sessions with credentials. In the absence of any protection mechanism, making the resource representation available via the URI is akin to open publication. Subsequent use of such published resource representations is out of scope for the Web architecture. It is a separate issue for the prevailing legal jurisdictions to deal with such use, and would cover issues such as plagiarism, defamation, interference with legitimate business activity etc.

That's the kind of response I expected the TAG to produce, modulo several nuances I had not anticipated.

As for the matter of (uninvited) proxy manipulation of resource representations during dereferencing (via HTTP), that is a particularly thorny issue that still needs a lot of consideration, and unfortunately I don't have time to elaborate now.

I don't know if I've clarified or muddied things, but it's good to see the debate come alive.

---Rotan.

Received on Friday, 17 December 2010 17:04:39 UTC