Re: Change Proposal for HttpRange-14 from Melvin Carvalho on 2012-03-27 (public-lod@w3.org from March 2012)

From: Melvin Carvalho <melvincarvalho@gmail.com>
Date: Tue, 27 Mar 2012 21:23:21 +0200
To: Jeni Tennison <jeni@jenitennison.com>
Cc: tom.heath@talis.com, public-lod community <public-lod@w3.org>
Message-ID: <CAKaEYh+8KwswFHZMhUbo0uyh9+_x1DDsLHsJyGxTUW38txnufQ@mail.gmail.com>
On 27 March 2012 19:54, Jeni Tennison <jeni@jenitennison.com> wrote:

> Hi Tom,
>
> On 26 Mar 2012, at 17:13, Tom Heath wrote:
> > On 26 March 2012 16:47, Jeni Tennison <jeni@jenitennison.com> wrote:
> >> Tom,
> >>
> >> On 26 Mar 2012, at 16:05, Tom Heath wrote:
> >>> On 23 March 2012 15:35, Steve Harris <steve.harris@garlik.com> wrote:
> >>>> I'm sure many people are just deeply bored of this discussion.
> >>>
> >>> No offense intended to Jeni and others who are working hard on this,
> >>> but *amen*, with bells on!
> >>>
> >>> One of the things that bothers me most about the many years worth of
> >>> httpRange-14 discussions (and the implications that HR14 is
> >>> partly/heavily/solely to blame for slowing adoption of Linked Data) is
> >>> the almost complete lack of hard data being used to inform the
> >>> discussions. For a community populated heavily with scientists I find
> >>> that pretty tragic.
> >>
> >>
> >> What hard data do you think would resolve (or if not resolve, at least
> >> move forward) the argument? Some people are contributing their own
> >> experience from building systems, but perhaps that's too anecdotal? Would a
> >> structured survey be helpful? Or do you think we might be able to pick
> >> up trends from the webdatacommons.org (or similar) data?
> >
> > A few things come to mind:
> >
> > 1) a rigorous assessment of how difficult people *really* find it to
> > understand distinctions such as "things vs documents about things".
> > I've heard many people claim that they've failed to explain this (or
> > similar) successfully to developers/adopters; my personal experience
> > is that everyone gets it, it's no big deal (and IRs/NIRs would
> > probably never enter into the discussion).
>
> How would we assess that though? My experience is in some way similar --
> it's easy enough to explain that you can't get a Road or a Person when you
> ask for them on the web -- but when you move on to then explaining how that
> means you need two URIs for most of the things that you really want to talk
> about, and exactly how you have to support those URIs, it starts getting
> much harder.
>

I'm curious as to why this is difficult to explain, especially since I
also have difficulty explaining the benefits of linked data. Normally,
though, the roadblock I hit is explaining why URIs are important.

Are there perhaps similar paradigms that the majority of developers are
already familiar with?


One that springs to mind is in Java:

You have a file "Hello.java",

but the file contains the actual class, "Hello", which has keys and
values.


Or perhaps JSON, which most people know these days: you have a file like
"hello.json".

The file itself is not that important, but it can contain zero or more
objects, such as
{
  "key1": "value1",
  "key2": "value2",
  "key3": "value3"
}

Would this be a valid analogy?


>
> The biggest indication to me that explaining the distinction is a problem
> is that neither OGP nor schema.org even attempts to go near it when
> explaining to people how to add semantic information to their web
> pages. The URIs that you use in the 'url' properties of those vocabularies
> are explained in terms of 'canonical URLs' for the thing that is being
> talked about. These are the kinds of graphs that millions of developers are
> building on, and those developers do not consider themselves linked data
> adopters and will not be going to linked data experts for training.
>
> > 2) hard data about the 303 redirect penalty, from a consumer and
> > publisher side. Lots of claims get made about this but I've never seen
> > hard evidence of the cost of this; it may be trivial, we don't know in
> > any reliable way. I've been considering writing a paper on this for
> > the ISWC2012 Experiments and Evaluation track, but am short on spare
> > time. If anyone wants to join me please shout.
>
> I could offer you a data point from legislation.gov.uk if you like. When
> someone requests the ToC for an item of legislation, they will usually hit
> our CDN and the result will come back extremely quickly. I just tried:
>
> curl --trace-time -v http://www.legislation.gov.uk/ukpga/1985/67/contents
>
> and it showed the result coming back in 59ms.
>
> When someone uses the identifier URI for the abstract concept of an item
> of legislation, there's no caching so the request goes right back to the
> server. I just tried:
>
> curl --trace-time -v http://www.legislation.gov.uk/id/ukpga/1985/67
>
> and it showed the result coming back in 838ms. Of course, the redirection
> goes to the ToC above, so in total it takes around 900ms to get back the
> data.
>
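To make the extra round trip concrete, here is a minimal Python sketch (standard library only) that stands up a throwaway local server mimicking the pattern above: a hypothetical identifier path answers 303 See Other pointing at a document path, which answers 200. The paths only echo the legislation.gov.uk examples for readability; the handler is an assumption and not how that site is implemented. It shows that dereferencing the identifier costs two requests, while linking the document directly costs one.

```python
import threading
import urllib.parse
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical paths that only echo the legislation.gov.uk examples:
# an identifier for the abstract item and a document URI for its ToC.
ID_PATH = "/id/ukpga/1985/67"
DOC_PATH = "/ukpga/1985/67/contents"


class Handler(BaseHTTPRequestHandler):
    hits = []  # every path the server is asked for, in order

    def do_GET(self):
        Handler.hits.append(self.path)
        if self.path == ID_PATH:
            # The identifier names a thing, not a document: 303 See Other.
            self.send_response(303)
            self.send_header("Location", DOC_PATH)
            self.send_header("Content-Length", "0")
            self.end_headers()
        elif self.path == DOC_PATH:
            body = b"<html><body>Table of contents</body></html>"
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

    def log_message(self, *args):
        pass  # silence per-request logging


def fetch(path):
    """Dereference `path` on a throwaway local server, following redirects.

    Returns the final path reached and the list of requests the server saw."""
    Handler.hits = []
    server = HTTPServer(("127.0.0.1", 0), Handler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler keeps environment proxy settings out of the way;
    # the default redirect handler still follows the 303.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        url = f"http://127.0.0.1:{server.server_port}{path}"
        with opener.open(url) as resp:
            resp.read()
            final_path = urllib.parse.urlsplit(resp.geturl()).path
    finally:
        server.shutdown()
        server.server_close()
    return final_path, list(Handler.hits)


print(fetch(ID_PATH))   # two requests: the 303, then the document
print(fetch(DOC_PATH))  # one request: the document itself
```

Both calls end up at the same document; the difference is the extra hop, which on a real site is also the request that may bypass the CDN.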
> So every time that we refer to an item of legislation through its generic
> identifier rather than a direct link to its ToC we are making the site seem
> about 15 times slower. What's more, it puts load on our servers which
> doesn't happen when the data is cached; the more load, the slower the
> responses to other important things that are hard to cache, such as
> free-text searching.
>
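The "about 15 times slower" figure follows from the two timings above if one assumes the follow-up request for the ToC is served from the CDN as quickly as a direct request. A tiny Python check of that arithmetic (the function name is illustrative):

```python
def slowdown(direct_ms: float, redirect_ms: float) -> float:
    """How many times longer it takes to reach the document via the identifier.

    Assumes the 303 hop costs `redirect_ms` and the follow-up fetch of the
    (cached) document costs the same `direct_ms` as linking to it directly.
    """
    via_identifier_ms = redirect_ms + direct_ms
    return via_identifier_ms / direct_ms


# Timings quoted above: 59 ms for the cached ToC, 838 ms for the 303 response.
total_ms = 838 + 59
print(f"{total_ms} ms via the identifier, {slowdown(59, 838):.1f}x the direct fetch")
# prints "897 ms via the identifier, 15.2x the direct fetch"
```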
> The consequence of course is that for practical reasons we design the site
> not to use generic identifiers for items of legislation unless we really
> can't avoid it and add redirections where we should technically be using
> 404s. The impracticality of 303s has meant that we've had to compromise in
> other areas of the structure of the site.
>
> This is just one data point of course, and it's possible that if we'd
> fudged the handling of the generic identifiers (eg by not worrying about
> when they should return 404s or 300s and just always doing a regex mapping
> to a guess of an equivalent document URI) we would have better performance
> from them, but that would also have been a design compromise forced on us
> because of the impracticality of 303s. (In fact we made this precise design
> compromise for the data.gov.uk linked data.)
>
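The "regex mapping" shortcut mentioned above might look something like the following Python sketch. The URI layout is inferred only from the two legislation.gov.uk examples in this message; the rule itself is hypothetical and is not the actual data.gov.uk or legislation.gov.uk code.

```python
import re
from typing import Optional

# Hypothetical rewrite rule inferred only from the two example URIs:
#   <base>/id/<type>/<year>/<number>  ->  <base>/<type>/<year>/<number>/contents
ID_PATTERN = re.compile(
    r"^(?P<base>https?://[^/]+)/id/(?P<type>[a-z]+)/(?P<year>\d{4})/(?P<number>\d+)$"
)


def guess_document_uri(identifier: str) -> Optional[str]:
    """Map an identifier URI to a guessed document URI by string rewriting.

    Nothing is looked up, so an identifier for an item that does not exist
    still gets a redirect target, instead of the identifier returning 404
    (or 300 where it is ambiguous).
    """
    match = ID_PATTERN.match(identifier)
    if match is None:
        return None  # not an identifier URI in the assumed layout
    return "{base}/{type}/{year}/{number}/contents".format(**match.groupdict())


print(guess_document_uri("http://www.legislation.gov.uk/id/ukpga/1985/67"))
# prints http://www.legislation.gov.uk/ukpga/1985/67/contents
```

The trade-off is visible in the docstring: the mapping is cheap and needs no database hit, but it gives up the correct 404/300 behaviour for identifiers that do not correspond to a real item.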
> > 3) hard data about occurrences of different patterns/anti-patterns; we
> > need something more concrete/comprehensive than the list in the change
> > proposal document.
>
> Yes, it would be good for someone to spend a long time on the entire
> webdatacommons.org corpus in a rigorous way, rather than a couple of
> hours in an evening testing URIs based on sifting through a couple of
> the files by eye.
>
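As a starting point for the kind of corpus study discussed here, a minimal Python sketch of the bookkeeping: given (URI, status code) observations, for instance gathered by probing subject URIs from a webdatacommons.org dump without following redirects, it counts how many fall into each pattern. The probing step is not shown, and the category names are assumptions; in particular, treating a 200 on a slash URI as "thing and document share a URI" is a simplistic rule chosen for illustration, and is exactly the kind of interpretation people would argue over.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple
from urllib.parse import urlsplit


def classify(uri: str, status: Optional[int]) -> str:
    """Bucket one observed subject URI by the pattern it appears to follow.

    `status` is the HTTP status seen when dereferencing the URI without
    following redirects, or None if it was never fetched.
    """
    if urlsplit(uri).fragment:
        return "hash URI"
    if status == 303:
        return "303 redirect"
    if status == 200:
        return "200 direct (thing and document share a URI)"
    if status is None:
        return "not dereferenced"
    return f"other ({status})"


def tally(observations: Iterable[Tuple[str, Optional[int]]]) -> Counter:
    """Count observations per pattern."""
    return Counter(classify(uri, status) for uri, status in observations)
```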
> > 4) examples of cases where the use of anti-patterns has actually
> > caused real problems for people, and I don't mean problems in
> > principle; have planes fallen out of the sky, has anyone died? Does it
> > really matter from a consumption perspective? The answer to this is
> > probably not, which may indicate a larger problem of non-adoption.
>
> I don't personally have any examples of this; that doesn't mean they don't
> exist.
>
>
> Anyway, back to process. The TAG could try to pull the larger community
> into contributing evidence around these issues. There's already the AWWWSW
> Wiki at
>
>  http://www.w3.org/wiki/AwwswHome
>
> which gathers together lots and lots and lots of writing on the topic, a
> portion of which is based on experience and studies of existing sites,
> which we could structure around the questions above and add to.
>
> Having thought about it though, I'm not sure whether any of these would
> actually succeed in furthering the argument, particularly as much of the
> data is likely to be equivocal and not lead to a "given the evidence, we
> should do X" realisation.
>
> Taking the third one above for example, studying webdatacommons.org, I
> think people who back the status quo will point at the large amount of data
> that is being produced through blogging frameworks and the like and claim
> that this shows that publishers are generally getting it right. People who
> want to see a change will argue that the data that's there could have been
> published much more easily, and wonder how much more there would be, if
> httpRange-14 weren't so hard to comprehend.
>
> Basically, I fear that we're just likely to end up arguing over the
> interpretation of any evidence we collect which would leave us no further
> on. I don't mean to nix the idea, I'm just a little pessimistic.
>
> Cheers,
>
> Jeni
> --
> Jeni Tennison
> http://www.jenitennison.com
Received on Tuesday, 27 March 2012 19:23:50 UTC