Re: FragIds in semantic web (ACTION-543) from Tim Berners-Lee on 2011-04-27 (www-tag@w3.org from April 2011)

From: Tim Berners-Lee <timbl@w3.org>
Date: Wed, 27 Apr 2011 13:50:36 -0400
To: Jeni Tennison <jeni.tennison@googlemail.com>
Cc: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org List" <www-tag@w3.org>
Message-Id: <9FBC0426-A8EA-42EF-BDA2-6AF937B073F1@w3.org>
On 2011-04-06, at 18:18, Jeni Tennison wrote:

> 
>  Content negotiation becomes extremely difficult when the interpretation 
>  of fragment identifiers depends on the MIME type as there is no 
>  guarantee that the syntax of a fragment identifier that is legal for
>  one MIME type is also legal (or interpreted in an equivalent way) for
>  another MIME type. For example, the common `#identifier` syntax for
>  HTML is not consistent with the XPointer-based syntax defined for XML.

I am glad you brought this up, as many people look at it this way, which
I think is backwards.  There is a backwards argument which says

""" (incorrect argument)
1. An RDF graph in a document says what the URI means by giving statements about it.
2. But the meaning of <foo#bar> is a function of the MIME type.
3. So the URI can't have any meaning until someone has accessed the URI!
4. Which contradicts 1, so the whole system is broken. """

The MIME type system is the flexibility point which allowed the semantic web as
a system to be built on top of what I will call the raw web.  So the fact that a new MIME type could
define a new way of thinking about what foo#bar refers to was crucial.

If you like, the raw web architecture is:

A1. If you get a reference to <foo#bar> you can find out more by looking up foo.
A2. When you look up foo by HTTP, you must mind the content-type.
A3. The spec for that content-type directs you how to parse and understand the contents.
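
As a rough illustration of A1-A3, here is a minimal Python sketch. It is only
a sketch: the handler registry, the two handler functions and the example URI
are hypothetical, and no network access is made (the content-type and body are
passed in by hand).

```python
from urllib.parse import urldefrag

# Hypothetical registry: MIME type -> function(body, fragment) -> description.
HANDLERS = {}

def handler(mime_type):
    """Register a function as the interpreter for one MIME type."""
    def register(func):
        HANDLERS[mime_type] = func
        return func
    return register

@handler("text/html")
def html_handler(body, fragment):
    return f"anchor '{fragment}' in a hypertext document"

@handler("text/n3")
def n3_handler(body, fragment):
    return f"thing '{fragment}' described by an RDF graph"

def interpret(uri, content_type, body):
    # A1: to learn about <foo#bar>, look up foo (the URI minus its fragment).
    document_uri, fragment = urldefrag(uri)
    # A2: mind the content-type (parameters such as charset are dropped here).
    mime_type = content_type.split(";")[0].strip().lower()
    # A3: the spec for that type says how to understand the contents.
    if mime_type not in HANDLERS:
        raise ValueError(f"no handler registered for {mime_type}")
    return document_uri, HANDLERS[mime_type](body, fragment)
```

The same <foo#bar> is read differently depending on which handler the
content-type selects, which is exactly the flexibility point described above.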

The raw web doesn't give you much by itself.
The semantic web is built on top of it.

But, that said, once you are within the semantic web as a system, that flexibility point is
no more.   Once someone has read the sem web specs, they
understand RDF graphs and so on. The semantic web specifies that:

B1. A foo#bar identifier can be allocated by the owner of foo to any Thing.
B2. The server that responds to a request for foo should give info about foo#bar.
B3. A number of languages are available for doing this, and they have MIME types
  and ways of being parsed into logical statements (generally RDF graphs).
B4. Where two serializations exist for the same data, it is good to provide both using conneg,
  but both languages (e.g. application/rdf+xml and text/n3)
  produce (generally) the same set of statements about <foo#bar>.
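
To make the conneg step in B4 concrete, here is a simplified Python sketch of
a server choosing between two serializations from an Accept header. The
function names are hypothetical, and the matching is deliberately cruder than
real HTTP content negotiation: it compares q-values only and ignores the rule
that more specific media ranges take precedence.

```python
def parse_accept(header):
    """Return a list of (media_type, q) pairs from an Accept header value."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    return prefs

def choose(accept_header, available):
    """Pick the available media type with the highest q, or None."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(accept_header):
        for candidate in available:
            matches = media_type in (candidate, "*/*") or (
                media_type.endswith("/*")
                and candidate.startswith(media_type[:-1]))
            if matches and q > best_q:
                best, best_q = candidate, q
    return best
```

For example, with available = ["application/rdf+xml", "text/n3"], the header
"text/n3;q=0.9, application/rdf+xml;q=0.5" selects text/n3. Whichever one is
chosen, the client should end up with the same statements about <foo#bar>.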

If you like, then, the sem web architecture is built on top
of the web architecture, and both are in play, and that is perfect.
People in the semantic web community sometimes scorn the 
idea of A3, but they shouldn't, because that is the step by which they ascended to
the semantic web from the raw web.

You can of course look at the hypertext architecture as also being built
on A1-3, but differently:

C1. A person can allocate a URI foo#bar to any anchor within document <foo>.
C2. A link in a document <baz> can contain a reference to <foo#bar>.
C3. When the user clicks on that link they will make a hypertext jump to <foo#bar>.
C4. Where two serializations exist for the same document, it is good to provide both using conneg,
  but both languages (e.g. text/html and image/svg+xml) produce (generally)
  the same import to the user.
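
The hypertext reading of <foo#bar> in C1-C3 can be sketched in Python using
the standard html.parser module. This is an illustration under assumptions,
not how any browser does it: the class and function names are hypothetical,
and only id attributes and <a name="..."> anchors are considered.

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

class AnchorFinder(HTMLParser):
    """Record the first element whose id (or <a name>) equals the fragment."""

    def __init__(self, fragment):
        super().__init__()
        self.fragment = fragment
        self.found = None  # tag name of the matching element, if any

    def handle_starttag(self, tag, attrs):
        if self.found is not None:
            return
        attributes = dict(attrs)
        if attributes.get("id") == self.fragment or (
                tag == "a" and attributes.get("name") == self.fragment):
            self.found = tag

def find_anchor(uri, html_text):
    """Return the tag name of the anchor that uri's fragment points at, or None."""
    _, fragment = urldefrag(uri)
    if not fragment:
        return None
    finder = AnchorFinder(fragment)
    finder.feed(html_text)
    finder.close()
    return finder.found
```

Given a page containing <p id="bar">, find_anchor("http://example.org/foo#bar", page)
returns "p"; a fragment with no matching anchor gives None.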

So that is the (simplified) hypertext system.  It works because people stick to hypertext
languages like HTML and SVG for the documents <foo>.  Within hypertext, 
that MIME type flexibility point is no longer the great extension point; it is just
used for conneg between closely equivalent types of document.



>  This is exacerbated in common semantic web practice, which not only 
>  makes heavy use of content negotiation but in which URLs with fragment 
>  identifiers are used to identify real-world Things. In these cases, 
>  the URI as a whole is used to identify the real-world Thing, and the
>  fragment identifier does not address a part of any entity, so 
>  interpreting the fragment identifier based on the MIME type of whatever 
>  entity happens to be returned does not make sense.

There is nothing in A1-3 which says that a <foo#bar> must be a fragment.
"Fragment" was an unfortunate choice of word in "fragment identifier". Sorry.

It makes sense in a sem web system, because even though you have a system
which stores data about <foo#bar>, and may have found a lot of data about <foo#bar> before
it ever looks up <foo>, when it does look up <foo> it must still use the content-type
to work out how to parse the bits coming back.  If it gets an HTML page, and the id
is that of an anchor, then this is a mixing of two systems which has not been defined to work.

So I think the TAG needs to explain the above, perhaps going back over the AWWW
to make the separation between A and C clearer, and then introducing B.

If people want to mix the systems (and they do, with RDFa) then we have to explain how
that works.   

(I think it works by having a document which is both HTML and RDF and participates in both
systems and defines some RDF things and some anchors, and I think these things should not be confused
by having the same URI for both a thing and an anchor. We need to say that.)

We need to stop people letting conneg go to seed, and building
systems where hypertext and data are muddled and not both accessible unless
the user can control the Accept headers on the client.
Conneg should only be used where the data is identical in all cases (or a subset due
 to the limitations of one of the languages).
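
One way to check that rule mechanically is sketched below in Python. It
assumes, purely for illustration, that each conneg variant has already been
parsed into a set of (subject, predicate, object) tuples; the function name
and that representation are hypothetical.

```python
def conneg_is_consistent(variants):
    """Check that every variant says the same thing, or a subset of it.

    variants maps a media type to the set of (subject, predicate, object)
    statements obtained by parsing that serialization.
    """
    if not variants:
        return True
    # Take the largest variant as the reference; the others must not say
    # anything that it does not.
    fullest = max(variants.values(), key=len)
    return all(statements <= fullest for statements in variants.values())
```

Two variants with identical statement sets pass, as does one that omits a
statement its language cannot express; a variant that adds a statement missing
from the fullest one fails.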

Tim
Received on Wednesday, 27 April 2011 17:50:39 UTC