Re: Change Proposal for HttpRange-14 from Hugh Glaser on 2012-03-24 (public-lod@w3.org from March 2012)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sat, 24 Mar 2012 11:39:14 +0000
To: Jeni Tennison <jeni@jenitennison.com>
CC: public-lod community <public-lod@w3.org>
Message-ID: <EMEW3|c6a1dbcc90044c46c5009a2e20c57631o2NBdd02hg|ecs.soton.ac.uk|3C6FCE34-FCD4->

Many thanks.
I'm pleased I already put my name on the list then :-)
And for me some useful fleshing out of how things would/will work.

No further comments inline.
Best
Hugh

On 24 Mar 2012, at 11:17, Jeni Tennison wrote:

> Hi Hugh,
> 
> On 24 Mar 2012, at 10:02, Hugh Glaser wrote:
>> Please can you clarify something for me?
>> (I am not very good at reading these formal documents - a bear of little brain, perhaps.)
> 
> I will try my best.
> 
>> Am I right in thinking that, under your Change Proposal, the following sort of thing becomes possible (I hope I am getting it right).
>> Taking a site such as myexperiment.org (but it could very easily be the eprints software, BBC, or even dbpedia.)
>> See http://www.myexperiment.org/workflows/16
>> A huge barrier to adoption of LD for them was that their users would be exposed to the intricacies of the different URIs, and in particular that if myexperiment.org moved over to using LD URIs completely, users would not be able to cut and paste them from the address bar etc.
>> Great confusion would ensue, especially as their workflows already offered XML in addition to the HTML.
> 
> Right.
> 
>> This was a Bad Thing for them - their users were only just coming to terms with all this online workflow stuff, and could easily get spooked.
>> They nearly didn't do it, but because many of their technology providers were Linked Data people, it went ahead (a few years ago now).
>> The current outcome is what you see at the bottom of the workflow page - a panel offering the different URIs, with a link to a page describing the Linked Data world (to Chemists), which they are expected to understand.
>> (Hash URIs might have been a bit better, but introduced a different mechanism from the XML.)
> 
> Yep.
> 
>> As a result of your Change Proposal, it would have been acceptable (*if they wanted*), to simply add RDF as a Content Negotiation option, and deliver an RDF document with 200, in response to -H Accept:application/rdf+xml http://www.myexperiment.org/workflows/16, just as they did for XML, I think.
>> And this would enable them to use http://www.myexperiment.org/workflows/16 as the anchor throughout the site (as they do) and have the same URI in the address bar, and in fact have http://www.myexperiment.org/workflows/16 as the only thing users see.
>> Is that right?
> 
> Yes. They could have used http://www.myexperiment.org/workflows/16 throughout the site, had it respond with a 200 based on conneg with either HTML or RDF as required. It wouldn't have taken a linked data expert to figure out that if they wanted to refer to the workflow they had to copy and paste from the box at the bottom of the HTML page rather than the location bar at the top from which you usually copy and paste URIs.
> 
> They could also (as they are doing) have had separate URIs for the individual formats like:
> 
>  http://www.myexperiment.org/workflows/16.html
>  http://www.myexperiment.org/workflows/16.rdf
>  http://www.myexperiment.org/workflows/16.xml
> 
> They could have included within the RDF that you got from http://www.myexperiment.org/workflows/16 statements of the form:
> 
>  <http://www.myexperiment.org/workflows/16> 
>    wdrs:describedby <http://www.myexperiment.org/workflows/16.html> ;
>    wdrs:describedby <http://www.myexperiment.org/workflows/16.rdf> ;
>    wdrs:describedby <http://www.myexperiment.org/workflows/16.xml> ;
>    .
> 
> This would have enabled them to make separate statements about the licensing and provenance of the information held in those documents. If they didn't want to make those kinds of statements or enable those formats to be individually addressable, they could have just supported the http://www.myexperiment.org/workflows/16 URL and used conneg.
> 
>> Apropos Doing It Wrong:
>> It is interesting to note that I see myexperiment.org have made the practical decision to 303 to the RDF from 
>> curl -i -L -H Accept:application/rdf+xml http://www.myexperiment.org/workflows/16.html
>> which suggests that they are already subverting things to get round some sort of problem.
> 
> It looks as though it's:
> 
>  http://www.myexperiment.org/workflows/16.html
>  -> 301 -> http://www.myexperiment.org/workflows/16
>  -> 303 -> http://www.myexperiment.org/workflows/16.rdf
>  -> 200
> 
> Technically I think, per http://www.w3.org/2001/tag/doc/uddp/#idp439264 this should mean that you can infer http://www.myexperiment.org/workflows/16.html sameAs http://www.myexperiment.org/workflows/16 but I'm not 100% sure what's intended (I think this needs spelling out).
> 
>> Few sites I can find (apart from dbpedia) actually return 406 when you ask the HTML URI for RDF: they usually return the HTML.
>> It is a foolish agent that relies on RDF coming back from a 200 OK when it has asked for application/rdf+xml.
> 
> Yes.
> 
>> Apropos Risk.
>> You say there is no risk.
>> Is this a risk?:
>> There may be a serious increase in the number of URIs for current sites.
>> 
>> Taking Freebase as another example.
>> (In fact any of these sites that have worked hard to conform to the current regime will have a decision to make.)
>> Presently, if I
>> curl -i -L -H Accept:application/rdf+xml http://www.freebase.com/view/en/engelbert_humperdinck
>> it gives me back HTML.
>> What will it do in future?
>> I know this Change Proposal is not proposing that they need to change, but will they?
>> They already have http://rdf.freebase.com/ns/en.engelbert_humperdinck (and http://rdf.freebase.com/ns/m.047vj6 and another longer one).
>> Effectively http://www.freebase.com/view/en/engelbert_humperdinck becomes yet another URI that people can use, since it would return RDF (as myexperiment).
>> Obviously I am viewing this a bit from the sameAs.org viewpoint.
>> I know that the resource in the RDF document will (should) never be the HTML URI, but people can and possibly will start passing around the HTML URI as if it was the "proper" URI, and so a sensible sameAs service would have it as a way of looking up the "proper" URIs.
>> In fact I have sometimes toyed with the idea of allowing look up by HTML URL on sameAs.org (giving back only the "real" Linked Data URIs) - it is what a user expects from such a query, after all.
>> (I hope all that makes sense.)
> 
> I guess I don't quite see the distinction that you're making between "HTML URIs" and "proper" URIs. Perhaps that's because I've become too embedded in the world where RDF data is embedded within HTML pages. I think that where Jonathan's document says [1]:
> 
>  A "URI documentation carrier" for a URI is a representation that carries 
>  URI documentation that bears on the meaning of that URI. Applying the 
>  adjective "nominal" is a technicality that signifies that being a URI 
>  documentation carrier for the URI is expected according to this 
>  specification, but that it might not actually be one (for example, the 
>  representation might be empty, or it might contain information, but not 
>  information that helps to document the URI, perhaps as the result of a 
>  mistake).
> 
> what he's trying to tease out is the fact that you might not get any data back about a particular URI when you request that URI, but what you do get back is still its (empty) documentation. The URI doesn't become meaningless just because you get nothing back; it doesn't mean others can't make statements about it.
> 
> So in my view we're already living in a world where those "HTML URLs" exist and are meaningful and a sameAs service could be making statements about them.
> 
> Sorry, I'm probably missing something.
> 
> Where well-behaved sites will have to make a decision is whether to continue to use a 303 or switch to using a 200 and including a 'describedby' relationship. For example, we at legislation.gov.uk might be seriously tempted to switch to returning 200s from /id/ URIs. Currently, anyone requesting an /id/ places a load on our origin server because the CDN can't cache the 303 response, so we try to avoid using them in links on our site even where we could (and really should). Consequently people referring to legislation don't use the /id/ URIs when what they are referring to is the legislation item, not a particular version of it. If we switched to a 200, we wouldn't have to avoid those URIs, which would in turn help us embed RDFa in our pages, because instead of having a reference in a footnote contain something like:
> 
>  <a rel="leg:references" 
>     resource="/id/ukpga/1985/67/section/6"
>     href="/ukpga/1985/67/section/6">1985 c. 67 s. 6</a>
> 
> we could just use:
> 
>  <a rel="leg:references" 
>     href="/id/ukpga/1985/67/section/6">1985 c. 67 s. 6</a>
> 
> but none of this increases the number of URIs that we're using, it just makes us switch to referring to legislation items using the URIs that we'd designed to be used to refer to legislation items.
> 
> Cheers,
> 
> Jeni
> 
> [1] http://www.w3.org/2001/tag/doc/uddp/#carriage
> -- 
> Jeni Tennison
> http://www.jenitennison.com
> 

-- 
Hugh Glaser,  
             Web and Internet Science
             Electronics and Computer Science,
             University of Southampton,
             Southampton SO17 1BJ
Work: +44 23 8059 3670, Fax: +44 23 8059 3045
Mobile: +44 75 9533 4155 , Home: +44 23 8061 5652
http://www.ecs.soton.ac.uk/~hg/
Received on Saturday, 24 March 2012 11:44:19 UTC
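
Two points from the thread above lend themselves to a small illustration: a server answering 200 with either HTML or RDF depending on the Accept header, and a client checking the Content-Type of a 200 response instead of assuming it received the RDF it asked for. The following is only a minimal sketch in Python. The function names and the RDF_MEDIA_TYPES set are assumptions made for illustration and are not defined by the Change Proposal or by any of the sites mentioned; wildcard media ranges such as */* are deliberately not handled and simply fall back to the default.

```python
from typing import Mapping, Optional, Sequence

# Assumption: the media types this sketch treats as "RDF".
RDF_MEDIA_TYPES = {"application/rdf+xml", "text/turtle"}


def media_type(content_type: Optional[str]) -> str:
    """Strip parameters such as charset and normalise case."""
    if not content_type:
        return ""
    return content_type.split(";", 1)[0].strip().lower()


def choose_representation(accept: str, available: Sequence[str], default: str) -> str:
    """Server side: pick the available type with the highest q-value.

    Simplified on purpose: wildcards are ignored, so "*/*" yields the default.
    """
    best, best_q = default, 0.0
    for item in accept.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0  # q defaults to 1 when the parameter is absent
        for param in parts[1:]:
            if param.startswith("q="):
                try:
                    q = float(param[2:])
                except ValueError:
                    q = 0.0
        if parts[0].lower() in available and q > best_q:
            best, best_q = parts[0].lower(), q
    return best


def is_rdf_response(status: int, headers: Mapping[str, str]) -> bool:
    """Client side: trust the body as RDF only if it is a 200 labelled as RDF."""
    lowered = {k.lower(): v for k, v in headers.items()}
    return status == 200 and media_type(lowered.get("content-type")) in RDF_MEDIA_TYPES
```

With these definitions, choose_representation("application/rdf+xml", ["text/html", "application/rdf+xml"], "text/html") selects the RDF representation, while is_rdf_response(200, {"Content-Type": "text/html"}) is False, which is the case of a site returning HTML with 200 OK to a request for application/rdf+xml.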