Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14 from Jonathan A Rees on 2012-03-28 (public-lod@w3.org from March 2012)

From: Jonathan A Rees <rees@mumble.net>
Date: Wed, 28 Mar 2012 15:03:28 -0400
To: Norman Gray <norman@astro.gla.ac.uk>
Cc: Michael Brunnbauer <brunni@netestate.de>, Tim Berners-Lee <timbl@w3.org>, public-lod community <public-lod@w3.org>
Message-ID: <CAGnGFMLaE6n9Lxgekou3wHZEeT1YZdwsw2e+vQ4bUoWYybKRQw@mail.gmail.com>

On Wed, Mar 28, 2012 at 1:59 PM, Norman Gray <norman@astro.gla.ac.uk> wrote:
>
> Greetings.
>
> [This is a late response, because I dithered about sending it, because this whole thing seems simple enough that I've got to be missing stuff]
>
> On 2012 Mar 27, at 14:02, Jonathan A Rees wrote:
>
>> On Tue, Mar 27, 2012 at 7:52 AM, Michael Brunnbauer <brunni@netestate.de> wrote:
>>>
>>> Hello Tim,
>>>
>>> On Mon, Mar 26, 2012 at 04:59:42PM -0400, Tim Berners-Lee wrote:
>>>> 12) Still people say "well, to know whether I use 200 or 303 I need to know if this sucker is an IR or NIR" when instead they should be saying "Well, am I going to serve the content of this sucker or information about it?".
>>>
>>> I think the question should be "does the response contain the content of it"
>>> because I can serve both at once (<foaf:PersonalProfileDocument rdf:about="">).
>>
>> Yes, this is the question - is the retrieved representation content (I
>> used the word "instance" but it's not catching on), or description. It
>> can be both.
>
> Fine -- that seems the key question.  In some ideal world, everything on the web would come with RDF which explained what it was; but expecting that ever to happen would be mad.
>
> The HR14 resolution gives one answer to this, by doing _two_ things.
>
> Step 1. HR14 declares the existence of a subset of resources named 'IR'.  You can gloss this set as 'information resource', or 'document', note that the set is vague, or deny that the set is important, but that doesn't matter.
>
> Step 2. HR14 gives a partial algorithm for deciding whether a URI X names a resource in IR:  If you get a 200 when you dereference X, the resource is conclusively in IR.  End of story.
>
> (you can all suck eggs, now, yes?)
>
> Why does the set IR matter? (and pace Tim and various weary voices in this metathread, I think it does matter).  Because saying 'X names a resource in IR' tells you that the URI and the associated resource have a Particularly Simple Relationship -- the content of the HTTP retrieval is the 'content' of the resource (in some way which probably doesn't have to be precise, but which asserts that resource is something, unlike a Macaw, that can come through a network).  In this way -- crucially -- it answers Tim's question (12) above: retrieving X with a 200 status obtains the content of the sucker.  So the concept of 'IR' does do some work because it gives the client information about the object.
>
> Right?

Wrong. Just knowing that it is an IR is not sufficient. You made a
logical leap, unjustified by anything written down anywhere, that it
was an IR *that had that content*. The Flickr and Jamendo examples are
perfectly consistent with the URI naming an IR, but the content you
get is not content of the IR described by the RDF therein, so they
name a different IR.

But let's grant this, as it can easily be fixed with a small
clarification, and move on. It does not really bear on your proposal
anyhow.

> BUT, we (obviously) also want to talk about things where there's a slightly more complicated relationship between the URI and some resource (eg a URI which names a bird).  In this case, the extra information (that the URI and the resource have a Particularly Simple Relationship) would be false.  The cost of a particularly simple step 2 above, is the (in retrospect variously costly) indirection of the 303-dance.
>
> So the whole discussion seems to be about whether and how to relax step 2.  Jeni Tennison's proposal says it should be relaxed in the presence of a 'describedby' link, David Booth's that it should be relaxed with a new definedby link, or a (self-)reference with rdfs:isDefinedBy.  My 'proposal' was that it could be relaxed even more minimally, by saying that placing the resource in IR (step 2 above) could be done by the client only if this didn't contradict any RDF in the content of the resource (because the RDF said that X named a person, say), however conveyed (and of course these two proposals achieve that).

You are asking the right question, and I applaud the effort. I think
many people would like a solution similar to this one. But IMO looking
for a contradiction is not actionable, and for me that's a recipe for
disaster, since it forces human judgment to intervene in each case.
Human judgment is both expensive and unreliable.

First, contradictions are impossible to test by machine. The consistency of
statements such as dc:creator or rdfs:comment with what the content is
lies outside what machines can do. So you put humans in the path of
deciding whether there is a contradiction, and therefore what the URI
mode is. This doesn't sound good to me.

Second, we know OWL Full consistency (i.e. contradiction detection) is
undecidable, and OWL DL can be pretty hard. How did deciding the URI
mode come to depend on what logic is being used, and become so
complicated?

Third, the RDF could be accidentally consistent with what the content
is, when the intent was for the URI to refer to something that did not
have that content. Perhaps this intent could only be discerned by
checking some third source of information. You could just rule this a
mistake, but it seems fragile to me.

Fourth, the set of possible formats for RDF is open ended. Will those
agents desiring to make the distinction be required to parse
Manchester syntax and every new serialization that comes along? And
what makes RDF so special, anyhow?
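
To make the shape of the problem concrete, here is a rough sketch of
Step 2 as Norman states it, next to the relaxed variant. It is only an
illustration under assumptions: the function names and the
`contradicts_ir` predicate are hypothetical, and the predicate is left
to the caller because it is exactly the part the four objections above
are about.

```python
from typing import Callable, Optional


def strict_step2(status: int) -> Optional[bool]:
    """Step 2 as quoted above: a 200 puts the resource in IR.

    Returns True when the resource is conclusively in IR, and None when
    the status code alone settles nothing (the algorithm is partial).
    """
    return True if status == 200 else None


def relaxed_step2(
    status: int,
    body: str,
    contradicts_ir: Callable[[str], bool],
) -> Optional[bool]:
    """Relaxed variant: conclude IR only if the content does not object.

    `contradicts_ir` is a hypothetical predicate supplied by the caller.
    It is meant to answer "does the RDF in this body contradict the URI
    naming something with this content?", which is the step that has no
    agreed, mechanical definition.
    """
    if status != 200:
        return None
    if contradicts_ir(body):
        # No conclusion is drawn from the 200 in this case.
        return None
    return True
```

Two agents that plug in different predicates will reach different
answers for the same URI, which is the interoperability worry.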

> After all this torrent of messages (and I have honestly tried to read a significant fraction of them, and associated documents), I'm still not seeing how this is problematic.  Perhaps I'm slow, or I've read the wrong fraction of messages.
>
>  * Anything that was HR14-compliant will still be compliant with the relaxed Step 2. No change.

This is not obvious - there could easily be URIs out there that are
supposed to refer to things that have the retrieved content, but where
the content happens to contain statements that are inconsistent with
this. I have often talked about a hypothetical "RDF hall of shame" web
site whose pages give bad examples of RDF. I would certainly not want
anyone to take the enclosed RDF seriously in determining what I meant
when I used its URI to refer to the bogus RDF.

But this is contrived, and I don't think I can come up with an
empirical counterexample that I did not construct myself, so I will
grant it. (On the other hand I'd like there to be *some* reliable way
to refer to the bogus RDF, even if it's not the URI.)

>  * Any resource that wasn't in IR before, but whose URI nonetheless produced 200, was formally broken. It was telling lies.  With a relaxed Step 2, it now won't be broken any more.  Some applications (Tabulator?) will have to change to respect that, but they couldn't tell they were being lied to before, so they're merely exchanging one problem for a fixable one.

The telling of lies would be a strange thing to end up in a technical
specification, but I won't object to this.

>  * This is insensitive to the definition of 'information resource', and it doesn't matter if the content is multiple things.  If a resource 200-says that its URI names a Book, then you don't have to worry whether that's an 'information resource' or not, because you know it's a book; end of algorithm; do not go to the end of Step 2; do not add any extra information hacked/derived from protocol details.

This is certainly a virtue, and I'm glad you see this, since so many
people are focusing on the information resource foolishness.

What I would like to see along these lines is a simple, *actionable* rule
for telling the difference between content and description styles. The
rule that it's always content, the rule that it's always description,
Ian Davis's rule that it's content if you can't get an
application/rdf+xml response, and TimBL's rule that you check the HTTP
headers are all actionable. Checking for consistency, or leaving the
answer unspecified or up to the whim of local convention, are not.
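
As a rough sketch of what "actionable" means here, each of the four
rules can be written as a function that needs nothing but the HTTP
response. Everything below is an assumption made for illustration: the
`Response` record, the function names, and in particular the
`x-uri-mode` header, which is a made-up placeholder since the specific
headers in TimBL's rule are not spelled out in this message.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class Mode(Enum):
    CONTENT = "content"          # the representation is content of the thing named
    DESCRIPTION = "description"  # the representation describes the thing named


@dataclass
class Response:
    """Minimal stand-in for an HTTP response (hypothetical type)."""
    status: int
    content_type: str
    headers: Dict[str, str] = field(default_factory=dict)


def always_content(resp: Response) -> Mode:
    return Mode.CONTENT


def always_description(resp: Response) -> Mode:
    return Mode.DESCRIPTION


def rdfxml_rule(resp: Response) -> Mode:
    """Content unless an application/rdf+xml response came back.

    `resp` is assumed to be the reply to a request that asked for
    application/rdf+xml.
    """
    media_type = resp.content_type.split(";")[0].strip().lower()
    if media_type == "application/rdf+xml":
        return Mode.DESCRIPTION
    return Mode.CONTENT


# Placeholder header name, invented for this sketch only.
HYPOTHETICAL_MODE_HEADER = "x-uri-mode"


def header_rule(resp: Response) -> Mode:
    """Decide from a response header alone (header name is hypothetical)."""
    headers = {name.lower(): value for name, value in resp.headers.items()}
    if headers.get(HYPOTHETICAL_MODE_HEADER, "").strip().lower() == "description":
        return Mode.DESCRIPTION
    return Mode.CONTENT
```

None of these requires parsing the body or judging its consistency, so
any two agents applying the same rule to the same response get the same
answer.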

> That seems an inexpensive change which un-breaks a lot of things.

Show me more details of an inexpensive algorithm that all agents will
mostly agree on, and let's talk again.

Best
Jonathan

> All the best (in some puzzlement),
>
> Norman
>
>
> --
> Norman Gray  :  http://nxg.me.uk
> SUPA School of Physics and Astronomy, University of Glasgow, UK
>
>
Received on Wednesday, 28 March 2012 19:03:57 UTC