Re: Change Proposal for HttpRange-14 from Michael Smethurst on 2012-04-04 (public-lod@w3.org from April 2012)

From: Michael Smethurst <michael.smethurst@bbc.co.uk>
Date: Wed, 04 Apr 2012 10:48:57 +0100
To: <tom.heath@talis.com>
CC: Jeni Tennison <jeni@jenitennison.com>, public-lod community <public-lod@w3.org>
Message-ID: <CBA1D799.303A6%michael.smethurst@bbc.co.uk>

On 30/03/2012 16:15, "Tom Heath" <tom.heath@talis.com> wrote:

> Hi Michael,
> 
> On 27 March 2012 16:17, Michael Smethurst <michael.smethurst@bbc.co.uk> wrote:
>> 
>> On 26/03/2012 17:13, "Tom Heath" <tom.heath@talis.com> wrote:
>> 
>>> Hi Jeni,
>>> 
>>> On 26 March 2012 16:47, Jeni Tennison <jeni@jenitennison.com> wrote:
>>>> Tom,
>>>> 
>>>> On 26 Mar 2012, at 16:05, Tom Heath wrote:
>>>>> On 23 March 2012 15:35, Steve Harris <steve.harris@garlik.com> wrote:
>>>>>> I'm sure many people are just deeply bored of this discussion.
>>>>> 
>>>>> No offense intended to Jeni and others who are working hard on this,
>>>>> but *amen*, with bells on!
>>>>> 
>>>>> One of the things that bothers me most about the many years worth of
>>>>> httpRange-14 discussions (and the implications that HR14 is
>>>>> partly/heavily/solely to blame for slowing adoption of Linked Data) is
>>>>> the almost complete lack of hard data being used to inform the
>>>>> discussions. For a community populated heavily with scientists I find
>>>>> that pretty tragic.
>> 
>> No data here I fear; merely anecdote. But anecdote is usually the best form
>> of data :-)
> 
> I guess this is where we'll have to differ :)
> 
> Of all people you guys at the BBC have great anecdotes, and clearly
> personally you have heaps of opinions about some of the big thorny
> issues in Linked Data deployment and usage, formed from first hand
> experience.
> 
> I'm not saying I agree or disagree with any of the specifics, I'm just
> making a plea for us to raise the level of analysis to a point where
> we have some more robust evidence from which to draw conclusions. I'll
> do what I can to contribute, but I think we all need to pitch in and
> produce this evidence if the discussion and conclusions are going to
> be credible. Anecdotes and opinion only get us so far.

Hi Tom

A late response before I return to lurking...

As lots of other people have pointed out I think there are 2 quite different
problems here:

- the performance problems (including non-cachability of 303s (as
implemented if not as specced) and cdns etc)

- the organisational / institutional / cultural problems. Basically
explaining and convincing enough people up the management chain to make it
happen. And, every 2 years, when everybody swaps seats, having a whole new
chain of people to convince...

(And using fragment identifiers buys you out of all that pain)

I can see how you'd get data to make a reasonable evaluation of the former.
(and as I said in the earlier email I think at least some of the performance
problems would be solved by separating out 303s from conneg, routing html
links to the generic document resource uri, not channelling every request
thru a 303 and only referring to the "thing that isn't a document" when you
want to make statements about it)
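
To make the two-step arrangement above a bit more concrete, here's a rough sketch in Python. The /things/ and /docs/ paths, the suffixes and the function names are all made up for illustration (they're not what any real site runs), and the Accept handling is deliberately naive (no q-values):

```python
from typing import Dict, Tuple

# Hypothetical uri layout, for illustration only:
#   /things/<id>      the "thing that isn't a document"
#   /docs/<id>        the generic document resource
#   /docs/<id>.html   one representation
#   /docs/<id>.rdf    another representation
REPRESENTATIONS = {"text/html": ".html", "application/rdf+xml": ".rdf"}


def pick_type(accept: str) -> str:
    """Naive conneg: first listed media type we can serve, else html."""
    for part in accept.split(","):
        media_type = part.split(";")[0].strip()
        if media_type in REPRESENTATIONS:
            return media_type
    return "text/html"


def respond(path: str, accept: str = "text/html") -> Tuple[int, Dict[str, str]]:
    """Return (status, headers) for a request path and Accept header."""
    if path.startswith("/things/"):
        # Step 1: 303 to the generic document. Accept is not looked at here.
        ident = path[len("/things/"):]
        return 303, {"Location": "/docs/" + ident}
    if path.startswith("/docs/"):
        ident = path[len("/docs/"):]
        # A representation url was asked for directly: just serve it.
        for media_type, suffix in REPRESENTATIONS.items():
            if ident.endswith(suffix):
                return 200, {"Content-Type": media_type}
        # Step 2: conneg on the generic document, answered with a 200.
        media_type = pick_type(accept)
        return 200, {
            "Content-Type": media_type,
            "Content-Location": "/docs/" + ident + REPRESENTATIONS[media_type],
            "Vary": "Accept",
        }
    return 404, {}
```

With this shape, html links can point straight at /docs/<id> and get a 200; only a client that dereferences /things/<id> pays for the extra round trip.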

But I have no idea how you get data to help analyse the latter. In which
case you're left with anecdote...

===

And for some cases I admit I find it difficult to explain to myself. Most of
the example explanations start with physical things (people, cats,
buildings, trains, bridges etc) and the explanation is easy. For some set of
metaphysical things (organisations, football clubs (not teams / squads), tv
series, species...) it's also (relatively) easy. But for some set of stuff
(the definition of which I can't quite put my finger on), it's really not
that easy

Over recent days this list seems to have settled on something like: if you
can get a "reasonable" representation it's content; if you can't it's
description. For some definition of reasonable

Taking 2 uris from dbpedia:
http://dbpedia.org/resource/Fox_News_Channel is an organisation /
corporation / tv channel. It's easyish to argue you can't get a reasonable
response that isn't just a description

http://dbpedia.org/resource/Fox_News_Channel_controversies is (in wikipedia
terms) an overspill article. It could be a skos type concept I guess but
it's more of a compound concept (a sentence). No matter what http evolves
into, I can't think of a more reasonable response to that than a list of
controversies involving fox news. What's the 303 doing in that case?

It's made more confusing because the statements you get back from
Fox_News_Channel_controversies are more or less identical to the statements
you get back from Fox_News_Channel because the infoboxes on both wikipedia
pages are more or less the same. So dbpedia says fox news controversies is
an entity of type broadcaster and has a broadcastArea, a firstAirDate, a
headquarter, an owningCompany, a pictureFormat etc

Yours (in confusion)
Michael
 
> 
> Cheers,
> 
> Tom.
> 
> 
>>>> What hard data do you think would resolve (or if not resolve, at least move
>>>> forward) the argument? Some people are contributing their own experience
>>>> from building systems, but perhaps that's too anecdotal? Would a
>>>> structured survey be helpful? Or do you think we might be able to pick up
>>>> trends from the webdatacommons.org (or similar) data?
>>> 
>>> A few things come to mind:
>>> 
>>> 1) a rigorous assessment of how difficult people *really* find it to
>>> understand distinctions such as "things vs documents about things".
>>> I've heard many people claim that they've failed to explain this (or
>>> similar) successfully to developers/adopters; my personal experience
>>> is that everyone gets it, it's no big deal (and IRs/NIRs would
>>> probably never enter into the discussion).
>> 
>> I think it's explainable. I don't think it's self evident
>> 
>> And explanation can be tricky because:
>> 
>> a) once you get past the obvious cases (a person and their homepage) there
>> are further levels of abstraction that make things complicated. A journalist
>> submits a report to a news agency, a sub-editor tweaks it and puts it on the
>> wires, a news publisher picks up the report, a journalist shapes an article
>> around it, another sub-editor tweaks that, the article gets published, the
>> article gets syndicated. Which document is the rdf making claims (created
>> by, created at) about? And is that the important / interesting thing? You
>> quickly head down a frbr shaped rabbit hole
>> 
>> b) The way people make and use websites (outside the whole linked data
>> thing) has moved on. Many people don't just publish pages; they publish
>> pages that have a one-to-one correspondence with "real world things". A page
>> per photo or programme or species or recipe or person. They're already in
>> the realm of thinking about things before pages and to them the page and
>> its url is a good enough approximation for description
>> 
>> c) people using the web are already thinking about things not pages. If you
>> search google for Obama your mental model is of the person, not any
>> resulting pages
>> 
>> d) we already have the resource / representation split which is quite enough
>> abstraction for some people
>> 
>> e) the list of things you might want to say about a document is finite; the
>> list of things you might want to say about the world isn't
>>> 
>>> 2) hard data about the 303 redirect penalty, from a consumer and
>>> publisher side. Lots of claims get made about this but I've never seen
>>> hard evidence of the cost of this; it may be trivial, we don't know in
>>> any reliable way. I've been considering writing a paper on this for
>>> the ISWC2012 Experiments and Evaluation track, but am short on spare
>>> time. If anyone wants to join me please shout.
>> 
>> I know publishers whose platform is so constrained they can't even edit the
>> <head> section of their html documents. They certainly don't have access at
>> the server level
>> 
>> Even where 303s are technically possible they might not be politically
>> possible. Technically we could have easily created bbc.co.uk/things/:blah
>> and made it 303 but that would have involved setting up /things and that's a
>> *very* difficult conversation with management and ops
>> 
>> And if it's technically and politically possible it really depends on how
>> the 303 is set up. Lots of linked data people seem to conflate the 303 and
>> content negotiation. So I ask for something that can't be sent, they do the
>> accept header stuff and 303 me to the *representation* url. Rather than: I
>> ask for something that can't be sent, they 303 to a generic information
>> resource which content negotiates to the appropriate representation.
>> 
>> If you do this in two steps (303 then conneg) you can point any html links
>> at the generic document resource url so you don't pick up a 303 penalty for
>> every request
>> 
>> No sane publisher trying to handle a decent amount of traffic is gonna
>> follow the dbpedia pattern of doing it in one step (conneg to 303) and
>> picking up 2 server hits per request. I've said here before that the dbpedia
>> publishing pattern is an anti-pattern and shouldn't be encouraged
>> 
>> Whichever way you do it, it doesn't take away Dave Reynolds' point that:
>> 
>>> I have been in discussions with clients where 303s are not acceptable
>>> (thanks to CDN behaviour).
>> 
>> Because the problem with CDNs is about varying on accepts not about the 303.
>> We have the same problem and we use # uris
>> 
>> Not sure how you fix that until someone makes a CDN that's actually
>> compliant with http but that's a different story
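
A rough Python sketch of what honouring Vary means for a cache key, as an illustration of the point above. The function name and the example url are hypothetical, and real caches normalise header values in ways this ignores:

```python
from typing import Dict, Tuple


def cache_key(url: str, request_headers: Dict[str, str], vary: str) -> Tuple[str, ...]:
    """Cache key = url plus the value of each request header named in Vary."""
    names = [n.strip().lower() for n in vary.split(",") if n.strip()]
    lowered = {k.lower(): v for k, v in request_headers.items()}
    return (url,) + tuple(lowered.get(n, "") for n in names)


html_req = {"Accept": "text/html"}
rdf_req = {"Accept": "application/rdf+xml"}

# A cache that honours "Vary: Accept" stores the two representations separately.
honours_vary = (
    cache_key("/docs/fox", html_req, "Accept")
    != cache_key("/docs/fox", rdf_req, "Accept")
)

# A cache that ignores Vary keys on the url alone, so whichever representation
# was cached first gets served to every later client.
ignores_vary = (
    cache_key("/docs/fox", html_req, "")
    == cache_key("/docs/fox", rdf_req, "")
)
```

Note there's no 303 anywhere in this: the collision happens on a plain 200 from a conneg'd url, which is why # uris hit the same problem.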
>> 
>> 
>>> 
>>> 3) hard data about occurrences of different patterns/anti-patterns; we
>>> need something more concrete/comprehensive than the list in the change
>>> proposal document.
>> 
>> Count dbpedia's conflation of 303 and conneg as an anti-pattern for me :-)
>>> 
>>> 4) examples of cases where the use of anti-patterns has actually
>>> caused real problems for people, and I don't mean problems in
>>> principle; have planes fallen out of the sky, has anyone died? Does it
>>> really matter from a consumption perspective? The answer to this is
>>> probably not, which may indicate a larger problem of non-adoption.
>> 
>> From a consumption perspective I can't really comment. From a publisher
>> perspective things are not always clear and I'd hate to have to wait for a
>> plane crash / death before they became clear
>> 
>> In all of this I'm not sure whether the proposal helps that much. It still
>> involves minting 2 urls if you want to make some claims about the thing and
>> some claims about those claims. Which still means some technical / political
>> problems for some people. Not having a 303 helps with the technical stuff
>> for those who can't tinker at the server level. It still needs some
>> explaining to """stakeholders""" / management / ops tho which is, in my
>> experience, never an easy conversation
>> 
>> Personally I quite like the idea of only dealing with one uri, always
>> returning a 200, including descriptions of the "thing" and descriptions of
>> those descriptions in the same document and having a dedicated vocab
>> constrained to dealing with the claims about the claims which I *think* was
>> what Giovanni meant by:
>> 
>>> * Only return 200,
>>> * As a default, clients know that they're dealing with Non IR
>>> * if you really have to annotate some IR for very low level purposes
>>> then you do it anyway with proper attributes/ontologies .. which
>>> clients will know and act accordingly.
>>> And we're back into reality, you're compatible with opengraph, schema.org,
>> 
>> Cheers
>> Michael
>> 
>> Ps when I first met the web I liked it because it was forgiving of
>> misunderstanding and incompetence. I've come to rely on this forgiveness :-)
>> 
>> 
>> 
>> 
>>> 
>>>> The larger question is how do we get to a state where we *don't* have this
>>>> permathread running, year in year out. Jonathan and the TAG's aim with the
>>>> call for change proposals is to get us to that state. The idea is that by
>>>> getting people who think that the specs should say something different to
>>>> "put their money where their mouth is" and express what that should be, we
>>>> have something more solid to work from than reams and reams of
>>>> opinionated emails.
>>> 
>>> This is a really worthy goal, and thank you to you, Jonathan and the
>>> TAG for taking it on. I long for the situation you describe where the
>>> permathread is 'permadead' :)
>>> 
>>>> But we do all need to work at it if we're going to come to a consensus. I
>>>> know everyone's tired of this discussion, but I don't think the TAG is
>>>> going to do this exercise again, so this really is the time to contribute,
>>>> and preferably in a constructive manner, recognising the larger aim.
>>> 
>>> I hear you. And you'll be pleased to know I commented on some aspects
>>> of the document (constructively I hope). If my previous email was
>>> anything but constructive, apologies - put it down to httpRange-14
>>> fatigue :)
>>> 
>>> Cheers,
>>> 
>>> Tom.
>> 
>> 
>> http://www.bbc.co.uk/
>> This e-mail (and any attachments) is confidential and may contain personal
>> views which are not the views of the BBC unless specifically stated.
>> If you have received it in error, please delete it from your system.
>> Do not use, copy or disclose the information in any way nor act in reliance
>> on it and notify the sender immediately.
>> Please note that the BBC monitors e-mails sent or received.
>> Further communication will signify your consent to this.
>> 
> 
> 


Received on Wednesday, 4 April 2012 09:49:35 UTC