Re: Change Proposal for HttpRange-14

On 26/03/2012 17:13, "Tom Heath" <tom.heath@talis.com> wrote:

> Hi Jeni,
> 
> On 26 March 2012 16:47, Jeni Tennison <jeni@jenitennison.com> wrote:
>> Tom,
>> 
>> On 26 Mar 2012, at 16:05, Tom Heath wrote:
>>> On 23 March 2012 15:35, Steve Harris <steve.harris@garlik.com> wrote:
>>>> I'm sure many people are just deeply bored of this discussion.
>>> 
>>> No offense intended to Jeni and others who are working hard on this,
>>> but *amen*, with bells on!
>>> 
>>> One of the things that bothers me most about the many years worth of
>>> httpRange-14 discussions (and the implications that HR14 is
>>> partly/heavily/solely to blame for slowing adoption of Linked Data) is
>>> the almost complete lack of hard data being used to inform the
>>> discussions. For a community populated heavily with scientists I find
>>> that pretty tragic.

No data here I fear; merely anecdote. But anecdote is usually the best form
of data :-)
>> 
>> 
>> What hard data do you think would resolve (or if not resolve, at least move
>> forward) the argument? Some people are contributing their own experience
>> from building systems, but perhaps that's too anecdotal? Would a
>> structured survey be helpful? Or do you think we might be able to pick up
>> trends from the webdatacommons.org (or similar) data?
> 
> A few things come to mind:
> 
> 1) a rigorous assessment of how difficult people *really* find it to
> understand distinctions such as "things vs documents about things".
> I've heard many people claim that they've failed to explain this (or
> similar) successfully to developers/adopters; my personal experience
> is that everyone gets it, it's no big deal (and IRs/NIRs would
> probably never enter into the discussion).

I think it's explainable. I don't think it's self-evident.

And explanation can be tricky because:

a) once you get past the obvious cases (a person and their homepage) there
are further levels of abstraction that make things complicated. A journalist
submits a report to a news agency, a sub-editor tweaks it and puts it on the
wires, a news publisher picks up the report, a journalist shapes an article
around it, another sub-editor tweaks that, the article gets published, the
article gets syndicated. Which document is the RDF making claims (created
by, created at) about? And is that the important / interesting thing? You
quickly head down a FRBR-shaped rabbit hole.

b) The way people make and use websites (outside the whole linked data
thing) has moved on. Many people don't just publish pages; they publish
pages that have a one-to-one correspondence with "real world things". A page
per photo or programme or species or recipe or person. They're already in
the realm of thinking about things before pages, and to them the page and
its URL is a good enough approximation for description.

c) people using the web are already thinking about things, not pages. If you
search Google for Obama your mental model is of the person, not any
resulting pages.

d) we already have the resource / representation split which is quite enough
abstraction for some people

e) the list of things you might want to say about a document is finite; the
list of things you might want to say about the world isn't
> 
> 2) hard data about the 303 redirect penalty, from a consumer and
> publisher side. Lots of claims get made about this but I've never seen
> hard evidence of the cost of this; it may be trivial, we don't know in
> any reliable way. I've been considering writing a paper on this for
> the ISWC2012 Experiments and Evaluation track, but am short on spare
> time. If anyone wants to join me please shout.

I know publishers whose platform is so constrained they can't even edit the
<head> section of their HTML documents. They certainly don't have access at
the server level.

Even where 303s are technically possible they might not be politically
possible. Technically we could easily have created bbc.co.uk/things/:blah
and made it 303, but that would have involved setting up /things, and that's
a *very* difficult conversation with management and ops.

And if it's technically and politically possible, it really depends on how
the 303 is set up. Lots of linked data people seem to conflate the 303 and
content negotiation. So I ask for something that can't be sent, they do the
Accept header stuff and 303 me to the *representation* URL. Rather than: I
ask for something that can't be sent, they 303 me to a generic information
resource which content negotiates to the appropriate representation.

If you do this in two steps (303 then conneg) you can point any HTML links
at the generic document resource URL, so you don't pick up a 303 penalty for
every request.

No sane publisher trying to handle a decent amount of traffic is gonna
follow the dbpedia pattern of doing it in one step (conneg then 303) and
picking up two server hits per request. I've said here before that the dbpedia
publishing pattern is an anti-pattern and shouldn't be encouraged.

Whichever way you do it, it doesn't take away Dave Reynolds' point that:

> I have been in discussions with clients where 303s are not acceptable
> (thanks to CDN behaviour).

Because the problem with CDNs is about varying on Accept, not about the 303.
We have the same problem and we use # URIs.

Not sure how you fix that until someone makes a CDN that's actually
compliant with HTTP, but that's a different story.


> 
> 3) hard data about occurrences of different patterns/anti-patterns; we
> need something more concrete/comprehensive than the list in the change
> proposal document.

Count dbpedia's conflation of 303 and conneg as an anti-pattern for me :-)
> 
> 4) examples of cases where the use of anti-patterns has actually
> caused real problems for people, and I don't mean problems in
> principle; have planes fallen out of the sky, has anyone died? Does it
> really matter from a consumption perspective? The answer to this is
> probably not, which may indicate a larger problem of non-adoption.

From a consumption perspective I can't really comment. From a publisher
perspective things are not always clear, and I'd hate to have to wait for a
plane crash / death before they became clear.

In all of this I'm not sure whether the proposal helps that much. It still
involves minting 2 URLs if you want to make some claims about the thing and
some claims about those claims. Which still means some technical / political
problems for some people. Not having a 303 helps with the technical stuff
for those who can't tinker at the server level. It still needs some
explaining to """stakeholders""" / management / ops though, which is, in my
experience, never an easy conversation.

Personally I quite like the idea of only dealing with one URI, always
returning a 200, including descriptions of the "thing" and descriptions of
those descriptions in the same document, and having a dedicated vocab
constrained to dealing with the claims about the claims, which I *think* was
what Giovanni meant by:

> * Only return 200,
> * As a default, clients know that they're dealing with Non IR
> * if you really have to annotate some IR for very low level purposes
> then you do it anyway with proper attributes/ontologies .. which
> clients will know and act accordingly.
> And we're back into reality, you're compatible with opengraph, schema.org,
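
As a rough sketch of that idea, with everything invented for illustration (the "meta:" prefix stands in for whatever dedicated vocabulary would carry the claims-about-claims; it isn't a real ontology):

```python
# Toy model of "always 200, one URI": the thing's description and the
# claims about those claims live in one document, distinguished only
# by a (made-up) meta vocabulary prefix.

def describe(thing_uri):
    """One URI, one document, status always 200, never a 303."""
    return {
        "status": 200,
        "triples": [
            (thing_uri, "name", "Example Thing"),       # about the thing
            (thing_uri, "meta:createdBy", "editor-1"),  # about the claims
            (thing_uri, "meta:createdAt", "2012-03-27"),
        ],
    }

def claims_about_thing(doc):
    """Clients default to reading statements as being about the
    non-information resource, filtering out the meta-level ones."""
    return [t for t in doc["triples"] if not t[1].startswith("meta:")]
```

The client-side default Giovanni describes is just that filter: assume non-IR unless a statement is explicitly marked as being about the description.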

Cheers
Michael

Ps when I first met the web I liked it because it was forgiving of
misunderstanding and incompetence. I've come to rely on this forgiveness :-)




> 
>> The larger question is how do we get to a state where we *don't* have this
>> permathread running, year in year
>> out. Jonathan and the TAG's aim with the call for change proposals is to get
>> us to that state. The idea is that by
>> getting people who think that the specs should say something different to
>> "put their money where their mouth is" and express what that should be, we
>> have something more solid to work from than reams and reams of
>> opinionated emails.
> 
> This is a really worthy goal, and thank you to you, Jonathan and the
> TAG for taking it on. I long for the situation you describe where the
> permathread is 'permadead' :)
> 
>> But we do all need to work at it if we're going to come to a consensus. I
>> know everyone's tired of this discussion, but I don't think the TAG is
>> going to do this exercise again, so this really is the time to contribute,
>> and preferably
>> in a constructive manner, recognising the larger aim.
> 
> I hear you. And you'll be pleased to know I commented on some aspects
> of the document (constructively I hope). If my previous email was
> anything but constructive, apologies - put it down to httpRange-14
> fatigue :)
> 
> Cheers,
> 
> Tom.



Received on Tuesday, 27 March 2012 15:17:49 UTC