
Re: Option 1 - HTTP 303 Re: Towards a TAG consideration of CURIEs

From: Richard Cyganiak <richard@cyganiak.de>
Date: Tue, 10 Apr 2007 17:36:14 +0200
Message-Id: <28A57AE6-879F-4555-9525-5FFC87D4104D@cyganiak.de>
Cc: Tim Berners-Lee <timbl@w3.org>, Misha Wolf <Misha.Wolf@reuters.com>, www-tag@w3.org, Semantic Web <semantic-web@w3.org>, public-xg-mmsem@w3.org, newsml-g2@yahoogroups.com, Jonathan Rees <jar@creativecommons.org>
To: Alan Ruttenberg <alanruttenberg@gmail.com>

Alan,

On 10 Apr 2007, at 15:10, Alan Ruttenberg wrote:
>>> -1 for content negotiation. (very un-semantic web like to have  
>>> non-inspectable communications, and confusion about what a  
>>> name=URI means)
>>
>> Not a fair characterization of conneg. It would be implemented by  
>> having the (required!) 303 redirect point to either an HTML or RDF  
>> document based on client preferences. As Chimezie says, the HTML  
>> and RDF documents should point to each other.
>
> "should" and "do" are not coincident. It would be desirable if the  
> architecture made it easier to do what is correct, and harder to do  
> what is not.

Fair enough.

>> Thus all communications are inspectable, and it's clear what the  
>> names mean -- there's one for the domain object, one for the HTML  
>> document, one for the RDF document.
>
> They are not inspectable as RDF, or to a SPARQL endpoint. I don't  
> count running parsing html headers as "inspectable" from a SW point  
> of view.

Add a triple to the RDF document that connects the domain object with  
the HTML page. (foaf:page or rdfs:isDefinedBy work for me.) An RDF  
client dereferences the domain object's URI, gets 303-redirected to the  
RDF, and learns from the RDF where to find HTML if it is needed. To  
me this seems more inspectable than your alternative suggestion,  
using RDF/XML plus a stylesheet, where there isn't even a URL for the  
resulting human-readable view.
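To make the discovery step concrete, here is a minimal sketch in Python. The URIs are invented for illustration, and the RDF document's triples are represented as plain tuples; a real client would of course use an RDF parser:

```python
# Sketch: an RDF client that has fetched and parsed the RDF document
# (after following the 303 redirect) looks for a triple pointing from
# the domain object to its human-readable page.
# All URIs below are hypothetical.

FOAF_PAGE = "http://xmlns.com/foaf/0.1/page"
RDFS_IS_DEFINED_BY = "http://www.w3.org/2000/01/rdf-schema#isDefinedBy"

def find_html_page(triples, domain_uri):
    """Return the URI of a human-readable page for domain_uri, if any."""
    for s, p, o in triples:
        if s == domain_uri and p in (FOAF_PAGE, RDFS_IS_DEFINED_BY):
            return o
    return None

# Triples as they might appear in the RDF served for the domain object.
triples = [
    ("http://example.org/id/alice", FOAF_PAGE,
     "http://example.org/doc/alice.html"),
]

print(find_html_page(triples, "http://example.org/id/alice"))
# -> http://example.org/doc/alice.html
```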

> Can you send me a pointer to a deployed example of this?

DBpedia
Semantic MediaWiki
D2R Server
Geonames
ECS Southampton website

Just off the top of my head.

> For example I am terminally confused by FOAF. What does the name   
> "http://xmlns.com/foaf/0.1/" refer to?

That's the FOAF vocabulary specification. A document, since GET  
returns 200. It's available in HTML format, and may or may not be  
available in other formats.

> What does this mean (from foaf rdf)?   <rdfs:isDefinedBy  
> rdf:resource="http://xmlns.com/foaf/0.1/"/>

The subject is defined by the FOAF vocabulary specification.  
rdfs:isDefinedBy doesn't constrain the defining document to any  
particular format.

> How does a SW agent get the rdf for http://xmlns.com/foaf/0.1/ 
> Organization ?

It can't get the RDF, since the FOAF folks do neither GRDDL nor a  
<link> header nor content negotiation. FOAF gets away with this  
because everybody has to support FOAF, and so everybody just  
hardcodes the URL to their RDF.

> Here's a random example of rdf content negotiation that I googled  
> for - typical confusion over names.
>
> http://simile.mit.edu/issues/browse/PIGGYBANK-9
>
> wget http://crschmidt.net/julie/doap --header='Accept: text/html'
> --08:57:35--  http://crschmidt.net/julie/doap
>            => `doap.2'
> Resolving crschmidt.net... 64.92.170.181
> Connecting to crschmidt.net|64.92.170.181|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 3,160 (3.1K) [text/html]
>
> wget http://crschmidt.net/julie/doap --header='Accept: application/ 
> rdf+xml'
> --08:56:21--  http://crschmidt.net/julie/doap
>            => `doap'
> Resolving crschmidt.net... 64.92.170.181
> Connecting to crschmidt.net|64.92.170.181|:80... connected.
> HTTP request sent, awaiting response... 200 OK
> Length: 1,540 (1.5K) [application/rdf+xml]

A document that is available in two formats -- where's the confusion?  
The server sends along correct Content-Location headers that tell the  
client that the actual content is to be found at doap.xsl.html and  
doap.rdf, respectively. Inspectable (at the HTTP level), and clear  
names for the multi-format and HTML-format and RDF-format documents.
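The pattern in the crschmidt.net example above can be sketched in a few lines of Python, using the standard library's HTTP server. The paths and bodies are invented; the point is the 200 response whose Content-Location header names the variant that was actually served:

```python
# Sketch (hypothetical URLs): a server content-negotiating between HTML
# and RDF/XML, naming the chosen variant in a Content-Location header.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class ConnegHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if "application/rdf+xml" in self.headers.get("Accept", ""):
            ctype, loc, body = "application/rdf+xml", "/doap.rdf", b"<rdf:RDF/>"
        else:
            ctype, loc, body = "text/html", "/doap.html", b"<html></html>"
        self.send_response(200)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Location", loc)   # names the chosen variant
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):                   # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), ConnegHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
base = "http://127.0.0.1:%d/doap" % server.server_port

def variant_for(accept):
    req = urllib.request.Request(base, headers={"Accept": accept})
    with urllib.request.urlopen(req) as resp:
        return resp.headers["Content-Location"]

rdf_loc = variant_for("application/rdf+xml")
html_loc = variant_for("text/html")
server.shutdown()
print(rdf_loc, html_loc)   # -> /doap.rdf /doap.html
```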

I'll try to summarize a bit. If you want to serve appropriate content  
to both humans and machines, you have these options (ignoring hash  
URIs for the moment):

1. use content negotiation and 303-redirect from the domain object's  
URI to HTML or RDF documents based on the Accept header

2. 303-redirect from the domain object's URI to an HTML document and  
have a <link> header pointing to the RDF document

3. 303-redirect from the domain object's URI to an HTML document and  
use GRDDL to extract RDF

4. 303-redirect from the domain object's URI to an RDF document and  
have a CSS or XSLT stylesheet for a human-readable view

My preference is exactly in this order.

1 is more work on the server side, but serves both kinds of clients  
perfectly.
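A minimal sketch of option 1, again in Python with invented URLs: the domain object's URI answers 303 See Other, and the Location is chosen from the client's Accept header. Note that ordinary HTTP clients follow the redirect automatically:

```python
# Sketch of option 1: 303-redirect from the domain object's URI to an
# HTML or RDF document depending on the Accept header. URLs are made up.
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class SeeOtherHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/id/alice":              # the domain object's URI
            wants_rdf = "application/rdf+xml" in self.headers.get("Accept", "")
            self.send_response(303)               # See Other: a document about it
            self.send_header("Location",
                             "/data/alice" if wants_rdf else "/page/alice")
            self.end_headers()
        else:                                     # the two documents
            body = b"<rdf:RDF/>" if self.path == "/data/alice" else b"<html></html>"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass

server = HTTPServer(("127.0.0.1", 0), SeeOtherHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
uri = "http://127.0.0.1:%d/id/alice" % server.server_port

req = urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})
with urllib.request.urlopen(req) as resp:         # urllib follows the 303
    final_url = resp.geturl()
server.shutdown()
print(final_url)   # ends at .../data/alice
```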

2 is conceptually simple and easy to implement for publishers and  
clients, but involves sending the HTML to SW clients even though just  
one line of it is needed. I don't think there's anything wrong with  
SW clients reading <link> headers out of HTML pages. After all, SW  
clients are *Web* clients first. I'm quite certain that *all* of them  
will eventually be able to extract and follow <link> headers from  
HTML. Heck, it's about three extra lines of code.
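Those "three extra lines" look roughly like this with the standard library's HTML parser (the page and href are invented; the rel/type convention is the usual RDF autodiscovery pattern):

```python
# Sketch: pulling the RDF alternate out of an HTML page's <link>
# elements. The example page below is hypothetical.
from html.parser import HTMLParser

class LinkFinder(HTMLParser):
    def __init__(self):
        super().__init__()
        self.rdf_url = None

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if (tag == "link" and a.get("rel") == "alternate"
                and a.get("type") == "application/rdf+xml"):
            self.rdf_url = a.get("href")

html = """<html><head>
<link rel="alternate" type="application/rdf+xml" href="/data/alice.rdf">
</head><body>Alice</body></html>"""

finder = LinkFinder()
finder.feed(html)
print(finder.rdf_url)   # -> /data/alice.rdf
```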

3 is fine if all required information is present in the HTML.

4 serves SW clients perfectly, but does not address the needs of  
traditional search engines, and producing high-quality pages from RDF  
with just CSS or XSLT is very hard.

If you use hash URIs, you can do away with the 303 redirects, which  
makes 2 and 3 even more appealing IMO.

Cheers,
Richard



>
> -Alan
>
Received on Tuesday, 10 April 2007 15:36:39 GMT
