Re: Address Bar URI from Michael Smethurst on 2011-10-18 (public-lod@w3.org from October 2011)

From: Michael Smethurst <michael.smethurst@bbc.co.uk>
Date: Tue, 18 Oct 2011 10:57:39 +0100
To: Bernard Vatant <bernard.vatant@mondeca.com>
CC: Linking Open Data <public-lod@w3.org>
Message-ID: <CAC30C23.29189%michael.smethurst@bbc.co.uk>
Hi Bernard

Glad to hear I'm finally making sense to someone... :-/

What you said. Only additions would be:

> The first URI is used in RDF descriptions of the thing, that I get for example
> at http://example.org/resource/foo.rdf

For completeness: and / or in rdfa at http://example.org/resource/foo.html
:-)

> The second URI is not used in the RDF descriptions whatsoever. It's a webby
> trick enabling easy copy-paste, caching, display in address bar, whatever deal
> with Web conversation only interested in information resources. It's my IR proxy
> to 1.

It could be used in the rdf, just not to talk about 'things'; rather for
something like seeAlso, to point to more information (which might happen to
be available as rdf). It's kind of a trick but it's a common trick many
publishers are already doing regardless of whether they publish linked data
(so links there can be shared)

> The conneg for 1 is a systematic 303 to 2, whatever the query.

I guess you need to check that nir is a "thing" known to your system before
303ing but yes. Tho I wonder if "conneg" is the right label for that?!?
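
As a rough illustration of that check, here is a minimal Python sketch of the 303 step on its own. The /thing/ and /resource/ paths follow Bernard's example URIs; KNOWN_THINGS and respond_to_thing are made-up names, and a real system would look the slug up in whatever store it has.

```python
# Hypothetical set of slugs the system knows about (an assumption for
# illustration; a real system would query its own data store).
KNOWN_THINGS = {"foo"}


def respond_to_thing(path):
    """Return (status, headers) for a request to a /thing/<slug> URI."""
    prefix = "/thing/"
    if not path.startswith(prefix):
        return 404, {}
    slug = path[len(prefix):]
    if slug not in KNOWN_THINGS:
        # Unknown thing: there is nothing to point at, so no 303.
        return 404, {}
    # Known thing: it can't be sent down the wires, so point at the
    # generic information resource. The Accept header is never consulted.
    return 303, {"Location": "/resource/" + slug}
```

Note that nothing here looks at the accept header, which is why "conneg" seems the wrong word for this step.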

> The conneg for 2 indirects to the desired type of representation.

Yes, and the representation URblah (.html, .rdf, .json) is only exposed in
content location headers (unless forced by the user into the address bar).
Which is what you meant by "indirects", but just to be clear :-)

===

A couple more thoughts to save me the trouble of writing a blog post:

I think (and I might be wrong) that some linked data people see conneg (in
the accept header sense) as being a peculiarity particular to linked data.
But it's no more peculiar to linked data than HTTP itself is

Because it's seen as a peculiarity it tends to get lumped in with the usual
linked data talking points around http-range-14 and 303s. And because it
gets talked about as one thing it tends to get implemented as one thing

But http-range-14 / nir and conneg are doing completely different jobs. The
first one is just about saying, "this thing I've been talking about can't be
sent down the wires but here's some information." And the second is about
sending back a representation that's appropriate to the needs of the user
(as specified in their accept headers). Or saying, "Sorry, I don't have /
can't generate a representation that suits your needs" (406). (Again, in our
case, with some messy device detection to cope with feature phones and smart
phones and twonkPads and laptops and possibly TV set top boxes). There's a
real separation of concerns that a lot of linked data publishers aren't
acknowledging. Which imo is just storing up trouble for the future
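
To show the second job in isolation, a minimal Python sketch of conneg at the generic information resource, with no redirect involved. The AVAILABLE table, parse_accept and negotiate are illustrative names, the media types are assumptions matched to the .html / .rdf / .json suffixes, and the Accept parsing is deliberately simplified (no type/* wildcards, no device detection).

```python
# Representations assumed to exist for the generic resource.
AVAILABLE = {
    "text/html": ".html",
    "application/rdf+xml": ".rdf",
    "application/json": ".json",
}


def parse_accept(header):
    """Return [(media_type, q)] sorted by descending q (simplified parsing)."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        if not fields[0]:
            continue
        q = 1.0
        for f in fields[1:]:
            if f.startswith("q="):
                try:
                    q = float(f[2:])
                except ValueError:
                    q = 0.0
        prefs.append((fields[0], q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda p: -p[1])


def negotiate(path, accept):
    """Return (status, headers) for a request to the generic resource."""
    for media_type, q in parse_accept(accept):
        if q <= 0:
            continue
        if media_type == "*/*":
            media_type = "text/html"  # assumed default representation
        if media_type in AVAILABLE:
            return 200, {
                "Content-Type": media_type,
                # The specific representation URL only shows up here.
                "Content-Location": path + AVAILABLE[media_type],
                "Vary": "Accept",
            }
    # Nothing suitable: say so rather than guess.
    return 406, {"Vary": "Accept"}
```

The requested URI (/resource/foo, say) never changes; only the Content-Location header reveals which serialisation came back.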

All of the problems mentioned in this thread could be solved with the
addition of a *generic* information resource URI that does the conneg
separately from the 303. Target the *generic* information resource in your
links and expose that in the address bar, keep the details of the specific
representation URL tucked away in content location headers and just use the
non-information resource as something to talk about. So you don't split the
URIs you expose to the web and don't bounce every request through a 303 and
don't need to use replaceState to replace the representation URL with
something more sharable

In the absence of a generic information resource URI you've only got two
choices about what ends up in the address bar: the NIR URI or the specific
representation URL. IMO it should be neither. The latter breaks
sharing and the former doesn't make sense

Also to note that the dbpedia publishing pattern is problematic for
consumers as well as publishers [1]. NOTE: it's not the 303 that's actually
harmful here; it's the lack of a *generic* information resource URI that
leads to being constantly and unnecessarily bounced through a 303 for every
request
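
A toy Python simulation of that cost, under the assumption that every link a consumer follows targets either the thing URI or the generic information resource. The serve function stands in for a server and the /thing/ and /resource/ paths are illustrative; no real site is being modelled.

```python
def serve(uri):
    """Toy server: thing URIs 303 to the generic resource, which answers 200."""
    prefix = "/thing/"
    if uri.startswith(prefix):
        return 303, "/resource/" + uri[len(prefix):]
    return 200, None


def requests_for(uri):
    """Follow redirects from uri and return how many requests it took."""
    count = 0
    while True:
        status, location = serve(uri)
        count += 1
        if status != 303:
            return count
        uri = location


def total_requests(links):
    """Total requests needed to fetch every link in the list."""
    return sum(requests_for(link) for link in links)


links_to_things = ["/thing/a", "/thing/b", "/thing/c"]
links_to_generic = ["/resource/a", "/resource/b", "/resource/c"]
print(total_requests(links_to_things), total_requests(links_to_generic))
```

Linking to the thing URIs doubles the request count; linking to the generic resource leaves the 303 for the rarer case where someone actually dereferences the thing.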

Have to say that if we had implemented linked data following the dbpedia
pattern and exposed a URL per serialisation / language in the address bar /
to the web AND made our content unshareable AND inadvertently caused a 303
hit for every request to bbc.co.uk... we'd probably have lost our jobs by
now. And I tend to consider anything that loses me my job an anti-pattern
:-/




ps. Talking about dbpedia URIs I should probably also bring up the more
harmful problem. Basing dbpedia URI slugs on wikipedia URI slugs which are
in turn based on wikipedia page titles means URIs change every time someone
changes the wikipedia page title. Which is definitely *the* major problem
when working with dbpedia. Every time I see the LOD cloud diagram with all
those links pointing to dbpedia I wonder how many of those links will still
work today / tomorrow / etc. Is there any likelihood of dbpedia moving to /
supporting something more like dbpedia lite [2], with URI slugs based on
wikipedia row numbers (which we're told are guaranteed stable)? Probably a
question for another thread...
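
A small Python sketch of the difference, with a made-up page record (the id, titles and path prefixes are all hypothetical and not taken from dbpedia or dbpedia lite):

```python
def title_uri(page):
    """Slug derived from the page title: changes whenever the title does."""
    return "/resource/" + page["title"].replace(" ", "_")


def id_uri(page):
    """Slug derived from a stable row number: survives a rename."""
    return "/things/" + str(page["id"])


# Hypothetical page before and after someone retitles it.
before = {"id": 12345, "title": "Some Page"}
after = {"id": 12345, "title": "Some Page (disambiguated)"}

print(title_uri(before) == title_uri(after))  # False: inbound links break
print(id_uri(before) == id_uri(after))        # True: inbound links survive
```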

[1] http://nevali.net/post/11228142010/303-considered-harmful
[2] http://dbpedialite.org/


On 18/10/2011 09:51, "Bernard Vatant" <bernard.vatant@mondeca.com> wrote:

> Hi Michael
> 
> Let me try to write down your case as I understand it, trying to avoid
> Capitalized Buzzwords ;-)
> Seems a good idea to me, although it introduces yet another level of
> indirection in the picture, but maybe we need it.
> 
> We have three different types of animals to identify by URI
> 
> 1. Something known as 'foo' in the "real" (or not) world :
> http://example.org/thing/foo
> 2. A generic information resource binding the various representations of 'foo'
> on my server(s) : http://example.org/resource/foo
> 3. Representations/renderings of 'foo' in various formats (html, rdf, xml,
> json, ...) / languages etc : http://example.org/resource/foo.html
> 
> The first URI is used in RDF descriptions of the thing, that I get for example
> at http://example.org/resource/foo.rdf
> The second URI is not used in the RDF descriptions whatsoever. It's a webby
> trick enabling easy copy-paste, caching, display in address bar, whatever deal
> with Web conversation only interested in information resources. It's my IR
> proxy to 1.
> 
> The conneg for 1 is a systematic 303 to 2, whatever the query.
> The conneg for 2 indirects to the desired type of representation.
> 
> Using 2 in Web dialogue avoids confusion : the URI in the browser is not
> misleading. You've asked for an IR, here it is, and in the format you've
> asked. 
> 
> Do I get your point correctly?
> 
> Bernard
> 
> 2011/10/18 Michael Smethurst <Michael.Smethurst@bbc.co.uk>
>> Hi Richard
>> 
>> (Again top post courtesy of webmail. sorry)
>> 
>> I'm saying dbpedia is missing the concept of a *generic* information resource
>> URI and it's that URI that should show up in the address bar and be used in
>> link targets. Ignoring the linked data aspect for a moment if you publish
>> your data in various serialisations like:
>> 
>> - /foo.html
>> - /foo.xhtml-mp (mobile profile xhtml for feature (non-smart) phones)
>> - /foo.json
>> - /foo.xml
>> 
>> you want to allow people to copy and paste the address bar into email /
>> twitter etc and for someone clicking the resulting link to get back an
>> appropriate representation (depending on their accept headers + a bit of
>> messy device detection in the case of the html and xhtml-mp)
>> 
>> So you need a generic IR URI that does the conneg / device detection and
>> sends back the appropriate serialisation without a redirect. The generic IR
>> URI (/foo) stays in the address bar and the full location (/foo.json etc) is
>> only exposed in the content location header (not in the address bar)
>> 
>> All links then target the generic IR resource (not the NIR and NOT the
>> specific representation (.html etc))
>> 
>> So link targets are to generic ir uri and the address bar always shows the
>> generic ir uri. Which gives you two benefits:
>> - you only expose one set of uris to crawlers (google etc)
>> - the uri in the address bar becomes universally sharable with copy + paste
>> 
>> It's reasonable / necessary to expect publishers to take a conneg / device
>> detection hit for every request because you want your content shared and the
>> ability to send back an appropriate representation and it's all nicely
>> cachable (even in cdn mode) with Vary headers
>> 
>> It's not reasonable / necessary to expect publishers to take an uncachable
>> 303 hit for every request
>> 
>> When you start writing rdf you just need the ability to talk about something
>> that can't be sent down the wires. So you add in the nir uri. If someone
>> requests the nir then:
>> 
>> nir > 303 > *generic* ir > conneg > ir representation (url only exposed as
>> content location header)
>> 
>> lots of linked data seems to do the 303 and conneg as one step but they're
>> not happening for the same reason. the job of the conneg is to return an
>> appropriate representation from the ir; the job of the 303 is to say "i can't
>> send you that but here's some information that will hopefully be useful".
>> conneg is needed regardless of whether you're doing linked data and linked
>> data only adds in the 303 when the nir is requested. i think the two steps
>> tend to get conflated in linked data publishing patterns and we should
>> attempt to separate them
>> 
>> hth
>> michael
>> 
>> 
> 


Received on Tuesday, 18 October 2011 09:57:53 UTC