Re: FragIds in semantic web (ACTION-543)

Tim Berners-Lee wrote:
 > We need to stop people letting conneg go to seed, and building
 > systems where hypertext and data are muddled and not both accessible unless
 > on the client the user can control the accept headers.
 > Conneg should only be used where the data is identical in all cases (or a
 > subset due to the limitations of one of the languages).

I'm probably missing something here, but in the context of the preceding text, 
esp. the distinction between cases B and C, this seems to discourage the 
common(?) practice of publishing both HTML and RDF at the same URI, selectable 
by conneg?

(My take has been that in such cases the hypertext and the RDF should 
generally describe the same things, for human and machine consumption 
respectively.)

#g
--


> On 2011-04-06, at 18:18, Jeni Tennison wrote:
> 
>>  Content negotiation becomes extremely difficult when the interpretation 
>>  of fragment identifiers depends on the MIME type as there is no 
>>  guarantee that the syntax of a fragment identifier that is legal for
>>  one MIME type is also legal (or interpreted in an equivalent way) for
>>  another MIME type. For example, the common `#identifier` syntax for
>>  HTML is not consistent with the XPointer-based syntax defined for XML.
> 
> I am glad you brought this up, as many people look at it this way, which
> I think is sort of backwards.  The backwards argument says
> 
> """ (incorrect argument)
> 1. An RDF graph in a document says what the URI means by giving stuff about it.
> 2. But the meaning of <foo#bar> is a function of the MIME type.
> 3. So the URI can't have any meaning until someone has accessed the URI!
> 4. Which contradicts 1, so the whole system is broken. """
> 
> The MIME type system is the flexibility point which allowed the semantic web as
> a system to be built on top of what I will call the raw web.  So, the fact that a new MIME type could
> define a new way of thinking about what the foo#bar refers to was crucial.
> 
> If you like, the raw web architecture is:
> 
> A1. If you get a reference to <foo#bar> you can find out more by looking up foo.
> A2. When you look up foo by HTTP, you must mind the content-type.
> A3. The content-type spec directs you to parse and understand the contents.
> 
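
[To make A1-A3 concrete, here is a minimal Python sketch, not anything from the specs themselves. The function names and the FRAGMENT_RULES table are hypothetical, and the table lists only the media types mentioned in this thread. It splits a reference into the part you look up and the part whose meaning the content-type decides.]

```python
from urllib.parse import urldefrag

# Hypothetical table: how each media type's spec treats the fragment.
FRAGMENT_RULES = {
    "text/html": "anchor: the element whose id matches the fragment",
    "application/rdf+xml": "thing: whatever the RDF graph says about the full URI",
    "text/n3": "thing: whatever the RDF graph says about the full URI",
}

def plan_lookup(reference):
    """A1: split <foo#bar> into the document to look up and the fragment."""
    document, fragment = urldefrag(reference)
    return document, fragment

def interpret(content_type_header, fragment):
    """A2/A3: mind the content-type, then let its spec decide what the fragment means."""
    # Drop parameters such as "; charset=utf-8" before the table lookup.
    media_type = content_type_header.split(";")[0].strip().lower()
    rule = FRAGMENT_RULES.get(media_type)
    if rule is None:
        return f"no rule known for {media_type}; #{fragment} is uninterpreted"
    return f"#{fragment} -> {rule}"
```

[The point of the split is that nothing in plan_lookup knows what the fragment means; only the step that has seen the content-type does.]
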
> The raw web doesn't give you much by itself.
> The semantic web is built on top of it.
> 
> But, that said, once you are within the semantic web as a system, that flexibility point is
> no more.   Once the user above has read the sem web specs, they 
> understand RDF graphs and stuff. The semantic web specifies that 
> 
> B1. A foo#bar identifier can be allocated by the owner of foo to any Thing.
> B2. The server that responds to requests for foo should give info about foo#bar.
> B3. A number of languages are available for doing this, and they have MIME types
>   and ways of being parsed to logical statements (generally RDF graphs).
> B4. Where two serializations exist for the same data, it is good to provide both using conneg,
>   but both languages (e.g. application/rdf+xml and text/n3)
>   produce (generally) the same set of statements about <foo#bar>.
> 
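
[For B4, a rough Python sketch of server-side conneg between two serializations of the same data. This is a simplification under stated assumptions: parse_accept, matches and choose are hypothetical names, q-values are honoured, but the finer precedence rules of HTTP (more specific ranges overriding wildcards) are ignored.]

```python
def parse_accept(header):
    """Return (media_range, q) pairs from an Accept header, highest q first."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((parts[0].lower(), q))
    # sorted() is stable, so ranges with equal q keep their header order.
    return sorted(ranges, key=lambda r: r[1], reverse=True)

def matches(media_range, media_type):
    """True if a range such as text/* or */* covers the concrete media type."""
    if media_range in ("*/*", media_type):
        return True
    main, _, sub = media_range.partition("/")
    return sub == "*" and media_type.startswith(main + "/")

def choose(accept_header, available):
    """Pick the serialization to send, or None if nothing is acceptable."""
    for media_range, q in parse_accept(accept_header):
        if q <= 0:
            continue
        for media_type in available:
            if matches(media_range, media_type):
                return media_type
    return None
```

[With available = ["application/rdf+xml", "text/n3"], either choice is safe under B4 because both parse to generally the same statements about <foo#bar>.]
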
> If you like, then the sem web architecture is built on top 
> of the web architecture, and both are in play, and that is perfect.
> People in the semantic web community sometimes scorn the 
> idea of A3, but they shouldn't, because that is the step by which they ascended to
> the semantic web from the raw web.
> 
> You can of course look at the hypertext architecture as being also built
> on A1-3, but differently:
> 
> C1. A person can allocate a URI foo#bar to any anchor within document <foo>.
> C2. A link in a document <baz> can contain a reference to <foo#bar>.
> C3. When the user clicks on that link they will make a hypertext jump to <foo#bar>.
> C4. Where two serializations exist for the same document, it is good to provide both using conneg,
>   but both languages (e.g. text/html and image/svg+xml) produce (generally)
>   the same import to the user.
> 
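
[The hypertext reading in C1-C3 can be sketched the same way. This is an illustrative Python fragment only: AnchorFinder and resolve_anchor are hypothetical names, and the lookup is much cruder than what a browser does when it scrolls to a fragment.]

```python
from html.parser import HTMLParser
from urllib.parse import urldefrag

class AnchorFinder(HTMLParser):
    """Records the first element that serves as the anchor for target_id."""

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found_tag = None

    def handle_starttag(self, tag, attrs):
        if self.found_tag is not None:
            return
        attributes = dict(attrs)
        is_id_anchor = attributes.get("id") == self.target_id
        # Legacy HTML also allows <a name="..."> as an anchor.
        is_named_anchor = tag == "a" and attributes.get("name") == self.target_id
        if is_id_anchor or is_named_anchor:
            self.found_tag = tag

def resolve_anchor(link, html_text):
    """C3: return the tag name of the element a jump to <foo#bar> lands on, or None."""
    _, fragment = urldefrag(link)
    finder = AnchorFinder(fragment)
    finder.feed(html_text)
    return finder.found_tag
```

[Here the fragment really does name a part of the document, which is exactly what the semantic web reading in B1 does not require.]
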
> So that is the (simplified) hypertext system.  It works because people stick to hypertext
> languages like HTML and SVG for the documents <foo>.  Within hypertext, 
> that MIME type flexibility point is no longer the great extension point, it is just
> used for conneg between very equivalent types of document.
> 
> 
> 
>>  This is exacerbated in common semantic web practice, which not only 
>>  makes heavy use of content negotiation but in which URLs with fragment 
>>  identifiers are used to identify real-world Things. In these cases, 
>>  the URI as a whole is used to identify the real-world Thing, and the
>>  fragment identifier does not address a part of any entity, so 
>>  interpreting the fragment identifier based on the MIME type of whatever 
>>  entity happens to be returned does not make sense.
> 
> There is nothing in A1-3 which says that a <foo#bar> must be a fragment.
> "Fragment" was an unfortunate choice of word in "fragment identifier". Sorry.
> 
> It makes sense in a sem web system, because even though you have a system
> which stores data about <foo#bar> and may have found a lot of data about <foo#bar> before
> it ever looks up <foo>, when it looks up <foo> it must still use the content-type
> to work out how to parse the bits coming back.  If it gets an HTML page, and the id
> is of an anchor, then this is a mixing of two systems which has not been defined to work.
> 
> So I think the TAG needs to explain the above, perhaps going back over the AWWW
> to make the separation between A and C clearer, and then introducing B.
> 
> If people want to mix the systems (and they do, with RDFa) then we have to explain how
> that works.   
> 
> (I think it works by having a document which is both HTML and RDF and participates in both
> systems and defines some RDF things and some anchors, and I think these things should not be confused
> by having the same URI for both a thing and an anchor. We need to say that.)
> 
> We need to stop people letting conneg go to seed, and building
> systems where hypertext and data are muddled and not both accessible unless
> on the client the user can control the accept headers.
> Conneg should only be used where the data is identical in all cases (or a subset due
>  to the limitations of one of the languages).
> 
> Tim

Received on Thursday, 28 April 2011 09:11:59 UTC