
Re: What if an URI also is a URL

From: Richard Cyganiak <richard@cyganiak.de>
Date: Thu, 7 Jun 2007 16:54:46 +0200
Message-Id: <3DE1E691-AAF8-4738-911D-DA9F238E2B4F@cyganiak.de>
Cc: Tim Berners-Lee <timbl@w3.org>, semantic-web@w3.org, "Lynn, James (Software Escalations)" <james.lynn@hp.com>
To: r.j.koppes <rikkert@rikkertkoppes.com>

On 6 Jun 2007, at 20:51, r.j.koppes wrote:
> But if, on the web page http://www.example.com/mophor there is a  
> section with id "me", how do I refer to that particular section in  
> the web page in a RDF document (which might contain anything, even  
> unrelated to me as a person)? How do I make sure that the reader  
> (machine / human) interprets this reference as being a web location  
> (fragment in web page) instead of the thing, me.

The answer is simple: If an HTTP GET on http://www.example.com/mophor  
returns a 200 status code and some HTML, then mophor#me is the  
section of the HTML page marked by the #me anchor.

If the GET returns a 200 status code and some RDF, then mophor#me is  
whatever the RDF states about it. So if there is a triple in the RDF  
saying it is a person, then mophor#me is a person.

If the GET returns either RDF or HTML, depending on content  
negotiation, then you're in trouble because <mophor#me> is now  
ambiguous and clients are unable to consistently interpret the URI.  
So, as Sandro said, Don't Do That, or use the magic 303 redirect.
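
To make the three cases concrete, here is a rough sketch in Python. The function name and the returned strings are made up for illustration and do not come from any spec. It takes the media types a server might send with a 200 for http://www.example.com/mophor (more than one if it does content negotiation) and reports how mophor#me would be read. It does no network access.

```python
def fragment_reading(media_types_served):
    """Say how <mophor#me> is read, given the media types a 200 response may carry.

    media_types_served: Content-Type values the server may return for the
    same URI (several if it does content negotiation). Hypothetical helper.
    """
    # Drop parameters such as "; charset=utf-8" and normalise case.
    kinds = {m.split(";")[0].strip().lower() for m in media_types_served}
    html = "text/html" in kinds
    rdf = "application/rdf+xml" in kinds

    if html and rdf:
        # Same URI, two incompatible readings of the fragment.
        return "ambiguous"
    if html:
        return "section of the HTML page marked by the anchor"
    if rdf:
        return "whatever the RDF states about it"
    return "not covered by this sketch"


print(fragment_reading(["text/html; charset=utf-8"]))
print(fragment_reading(["application/rdf+xml"]))
print(fragment_reading(["text/html", "application/rdf+xml"]))
```

The third call is the content negotiation case, and the only one that comes back "ambiguous".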

Here's a long explanation that follows the RFC paper trail:

1. The URI spec (RFC3986) says that the meaning of <foo#bar> depends  
on what <foo> is. And what <foo> is depends on the scheme of the URI.

2. Now we can look up the scheme in the IANA URI scheme registry [1].  
It will tell you the RFC governing that URI scheme. In the case of  
the http: scheme, that's RFC2616.

3. RFC2616 says: To find out what <foo> is, do an HTTP GET on it.  
There will be a Content-Type header in the response telling you the  
MIME type of <foo>.

4. Now look it up in the IANA MIME type registry [2]. It will tell  
you the RFC governing that content type.

5. Now check that RFC; it will tell you what <#bar> means within a  
document of that type.

5a. In the case of text/html, the relevant RFC is RFC2854 and it says  
<#bar> is a part of the document identified by an anchor.

5b. In the case of application/rdf+xml, the relevant RFC directs us  
towards the RDF Concepts spec. That spec basically says that <#bar>  
is whatever the RDF document claims it is, and reminds us that it  
could be anything at all, including things external to the RDF  
document, like a person.
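
Steps 1 to 5 can be sketched as a couple of table lookups. This is only an illustration under some assumptions: the two dictionaries stand in for the IANA registries and hold just the entries mentioned above, the function name is made up, and the Content-Type value is passed in by hand where a real client would read it from the response to the GET in step 3.

```python
from urllib.parse import urldefrag, urlsplit

# Stand-in for the IANA URI scheme registry (step 2); only the entry from this mail.
SCHEME_SPECS = {"http": "RFC2616"}

# Stand-in for the IANA media type registry (steps 4 and 5); only 5a and 5b.
MEDIA_TYPE_RULES = {
    "text/html": ("RFC2854", "part of the document identified by an anchor"),
    "application/rdf+xml": ("RDF Concepts", "whatever the RDF document claims it is"),
}


def follow_paper_trail(uri, content_type):
    """Hypothetical helper: walk steps 1-5 for <foo#bar> without any network access."""
    # Step 1: split <foo#bar> into <foo> and the fragment "bar".
    base, fragment = urldefrag(uri)
    # Step 2: the scheme decides which spec says what <foo> is.
    scheme_spec = SCHEME_SPECS.get(urlsplit(uri).scheme)
    # Step 3 would be an HTTP GET on `base`; here its Content-Type is an argument.
    media_type = content_type.split(";")[0].strip().lower()
    # Steps 4 and 5: the media type decides what the fragment means.
    media_spec, rule = MEDIA_TYPE_RULES.get(media_type, (None, None))
    return {
        "base": base,
        "fragment": fragment,
        "scheme_spec": scheme_spec,
        "media_spec": media_spec,
        "fragment_means": rule,
    }


trail = follow_paper_trail("http://www.example.com/mophor#me", "text/html; charset=utf-8")
print(trail["base"], trail["fragment"], trail["fragment_means"])
```

Note that the fragment never takes part in the GET: only the part before the "#" is looked up, and the fragment is interpreted afterwards against whatever came back.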

The point of this little exercise: You can arrive at a consistent and  
unambiguous interpretation of a URI just by reading the URI spec and  
then following the paper trail through the RFCs. Not all of it is  
widely implemented, e.g. today's browsers are still oblivious to RDF.  
And some parts are not even in the specs yet, e.g. the W3C TAG's  
httpRange-14 compromise which should lead to a revision of the HTTP  
spec at some point in the future. But in general, the answers are in  
the specs, if you're willing to dig.


[1] http://www.iana.org/assignments/uri-schemes.html
[2] http://www.iana.org/assignments/media-types/

> Rikkert Koppes
> Tim Berners-Lee wrote:
>> On 2007-06-06, at 13:13, r.j.koppes wrote:
>>> Ok, hereby a follow-up to the semantic-web list.
>>> To summarize:
>>> Me: suppose I am identified by http://www.example.com/mophor and  
>>> there is also a webpage http://www.example.com/mophor...
>>> Tim: this is an error, by returning a 200 for the webpage, it is  
>>> identified, so these are two different things. http:// 
>>> www.example.com/mophor#me would be ok
>>> James: but what about fragment identifiers?
>>> Tim: no problem, since the client strips off fragment  
>>> identifiers, so accessing the web page http://www.example.com/ 
>>> mophor#me would identify http://www.example.com/mophor as a  
>>> webpage by returning a 200 (this is my interpretation of what is  
>>> said)
>> Woa.  Stop. No.   You can't access < http://www.example.com/ 
>> mophor#me> as it isn't a web page.
>> The function 'access web page' takes a URI with no hash.
>> The fact that the id http://www.example.com/mophor#me is used at  
>> all indicates that  "http://www.example.com/mophor" identifies a  
>> document, before you even think of access it.
>> Because the "foo#bar" means   "the thing identified by the local  
>> id bar within foo" in the web architecture.
>> You can look up < http://www.example.com/mophor#me> which means,  
>> on the CLIENT, stripping off the "#me"
Received on Thursday, 7 June 2007 15:09:51 UTC
