Re: NIR SIDETRACK Re: Change Proposal for HttpRange-14

Jonathan, hello.

On 2012 Mar 28, at 20:03, Jonathan A Rees wrote:

> On Wed, Mar 28, 2012 at 1:59 PM, Norman Gray <norman@astro.gla.ac.uk> wrote:

>> The HR14 resolution gives one answer to this, by doing _two_ things.
>> 
>> Step 1. HR14 declares the existence of a subset of resources named 'IR'.  You can gloss this set as 'information resource', or 'document', note that the set is vague, or deny that the set is important, but that doesn't matter.
>> 
>> Step 2. HR14 gives a partial algorithm for deciding whether a URI X names a resource in IR:  If you get a 200 when you dereference X, the resource is conclusively in IR.  End of story.
[...]
>>   So the concept of 'IR' does do some work because it gives the client information about the object.
>> 
>> Right?
> 
> Wrong. Just knowing that it is an IR is not sufficient. You made a
> logical leap, unjustified by anything written down anywhere, that it
> was an IR *that had that content*. The Flickr and Jamendo examples are
> perfectly consistent with the URI naming an IR, but the content you
> get is not content of the IR described by the RDF therein, so they
> name a different IR.

But the content doesn't matter, according to HR14.  If I dereference Ian's macaw http://example.org/macaw and get back some random RDF that doesn't mention /macaw, or a picture of my cat, then (the resource identified by) /macaw is in IR, and we can say nothing about the resources mentioned in the RDF.
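
To make that reading concrete, here is a minimal sketch of step 2 as I understand it. The function name and the True/None return convention are my own illustration and appear nowhere in HR14; the point is only that the response body is never consulted.

```python
from typing import Optional


def in_ir(status: int, body: bytes) -> Optional[bool]:
    """Partial rule as read above: True means 'conclusively in IR',
    None means the rule is silent.

    Hypothetical helper for illustration only, not part of HR14.
    """
    # The body is deliberately ignored: on this reading the content
    # (random RDF, a picture of a cat) plays no part in the decision.
    if status == 200:
        return True
    return None


# Whatever comes back with the 200, /macaw ends up in IR.
print(in_ir(200, b"<picture of my cat>"))
print(in_ir(303, b""))
```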

As far as I can tell, it's [1] which calls this set IR into existence, and the _only_ definition of it is a partial, operational definition, again in [1], which says that a resource is in IR if (as opposed to iff) dereferencing a URI which identifies it returns a 200 status.  Now, [1] (as you know) labels this set not as 'IR', or 'foo', but as 'information resource', and this is alternatively glossed elsewhere [2] (again, as you obviously know) as 'document-like entity ("information resource")'.  This long name gestures towards the motivating intuition of the concept 'IR' but, notoriously, doesn't actually add anything.

Thus as it stands, the term 'information resource' in [1] carries no implications (beyond incidentally reiterating that the 200-retrieved content is a (REST) representation of the resource).

However, the point of introducing the term is, I've always taken it, that it licenses the client to jump to some conclusions.  These conclusions aren't spelled out anywhere, but (unless you're being whimsical) they're things like 'this is a document', or 'this is a network thing', or 'this is not a squawking macaw which will squeeze out of the ethernet port and crap on my keyboard'.  What those conclusions materialise as in practice surely _depends on the application_ which is processing the resource.

[1] http://lists.w3.org/Archives/Public/www-tag/2005Jun/0039.html
[2] http://www.w3.org/2001/tag/doc/uddp-20120229/

>> BUT, we (obviously) also want to talk about things where there's a slightly more complicated relationship between the URI and some resource (eg a URI which names a bird).  In this case, the extra information (that the URI and the resource have a Particularly Simple Relationship) would be false.  The cost of a particularly simple step 2 above, is the (in retrospect variously costly) indirection of the 303-dance.
>> 
>> So the whole discussion seems to be about whether and how to relax step 2.  Jeni Tennison's proposal says it should be relaxed in the presence of a 'describedby' link, David Booth's that it should be relaxed with a new definedby link, or a (self-)reference with rdfs:isDefinedBy.  My 'proposal' was that it could be relaxed even more minimally, by saying that placing the resource in IR (step 2 above) could be done by the client only if this didn't contradict any RDF in the content of the resource (because the RDF said that X named a person, say), however conveyed (and of course these two proposals achieve that).
> 
> You are asking the right question, and I applaud the effort. I think
> many people would like a solution similar to this one. But IMO looking
> for a contradiction is not actionable, and for me that's a recipe for
> disaster, since it forces human judgment to intervene in each case.
> Human judgment is both expensive and unreliable.

But it seems to me that 'this resource is in IR' isn't actionable either, since HR14 doesn't actually say what action corresponds to it.  So an application (always) has to _choose_ what this means (perhaps it adds 'X a foaf:Document', for example, or puts X in the graph name slot in a quad-store).  But HR14 does demand that, whatever 'resource is in IR' means locally, the meaning _must_ be applied to (the URI identifying) _any_ resource retrieved with a 200 status.

So what I'm suggesting is, I suppose, simply adding '...unless the application can think of a reason why not' to the end of part (a) of [1].

[ Operationally, that might mean using owl:disjointWith and an OWL Full reasoner (expensive, as you said); or it might mean making foaf:Person rdfs:subClassOf eg:MyNIRClass, and then if it turns out after ingestion that 'X a eg:MyNIRClass' then forbearing to add 'a foaf:Document'; or using rules; or getting a little Python script to scrub your data post-ingestion. ]
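
The last of those options might look something like the following. This is a rough sketch only, assuming the ingested data is held as plain (subject, predicate, object) tuples of prefixed names rather than in any particular RDF library; eg:MyNIRClass is the made-up class from the paragraph above, and the helper names are hypothetical.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"
NIR = "eg:MyNIRClass"    # made-up local class, as in the text above
DOC = "foaf:Document"


def superclasses(cls, triples):
    """Return cls plus every class reachable from it via rdfs:subClassOf."""
    seen, todo = set(), [cls]
    while todo:
        c = todo.pop()
        if c in seen:
            continue
        seen.add(c)
        todo.extend(o for s, p, o in triples if s == c and p == SUBCLASS)
    return seen


def scrub(triples):
    """Drop 'X a foaf:Document' wherever X is also typed, directly or
    through subclassing, as eg:MyNIRClass."""
    triples = set(triples)
    nir_things = {
        s for s, p, o in triples
        if p == RDF_TYPE and NIR in superclasses(o, triples)
    }
    return {
        (s, p, o) for s, p, o in triples
        if not (p == RDF_TYPE and o == DOC and s in nir_things)
    }


data = {
    ("foaf:Person", SUBCLASS, NIR),
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", RDF_TYPE, DOC),   # added blindly because of the 200
    ("ex:page", RDF_TYPE, DOC),
}
cleaned = scrub(data)
```

Here the 'ex:alice a foaf:Document' statement is removed, while 'ex:page a foaf:Document' survives because nothing says ex:page is in the NIR class.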

That's far too vague, of course.  From this point of view the current change proposals are 'simply' proposed-standard ways of indicating that a resource named by a particular URI is _not_ in IR (which is at present impossible), and that the application should _not_ therefore do whatever it normally does to resources in IR.  Another way might be to define a class std:NonIR which things like foaf:Person could be taken to be subclasses of.

So a less informal way of amending HR14 might be to permit a 200-retrieved resource to assert that it is not in IR, giving one or more blessed means of doing this, either as best practices or as stipulations (obviously including one or more of the existing proposals; fewer is doubtless better).
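
As a sketch of what such an amended rule might look like to a client: the following assumes, purely for illustration, that the blessed mechanism is typing the resource as (a direct subclass of) the hypothetical std:NonIR class mentioned above, and that the 200-retrieved content has already been parsed into (subject, predicate, object) tuples. Neither the class nor the function name is part of any existing proposal.

```python
from typing import Iterable, Optional, Tuple

Triple = Tuple[str, str, str]

NON_IR = "std:NonIR"  # hypothetical class; no such standard term exists


def in_ir_bis(status: int, uri: str, triples: Iterable[Triple]) -> Optional[bool]:
    """Amended rule: a 200 puts the resource in IR unless the retrieved
    content itself asserts otherwise.

    Returns True (in IR), False (content opts out), or None (rule silent).
    """
    if status != 200:
        return None
    triples = set(triples)
    types = {o for s, p, o in triples if s == uri and p == "rdf:type"}
    # Only a direct type or a single rdfs:subClassOf hop is checked here;
    # a real client would need to decide how much inference to do.
    opts_out = NON_IR in types or any(
        (t, "rdfs:subClassOf", NON_IR) in triples for t in types
    )
    return not opts_out


macaw = "http://example.org/macaw"
content = [
    (macaw, "rdf:type", "eg:Bird"),
    ("eg:Bird", "rdfs:subClassOf", NON_IR),
]
print(in_ir_bis(200, macaw, content))
print(in_ir_bis(200, macaw, []))
```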

>>  * This is insensitive to the definition of 'information resource', and it doesn't matter if the content is multiple things.  If a resource 200-says that its URI names a Book, then you don't have to worry whether that's an 'information resource' or not, because you know it's a book; end of algorithm; do not go to the end of Step 2; do not add any extra information hacked/derived from protocol details.
> 
> This is certainly a virtue, and I'm glad you see this, since so many
> people are focusing on the information resource foolishness.
> 
> What I like to see along these lines is a simple, *actionable* rule
> for telling the difference between content and description styles. The
> rule that it's always content, the rule that it's always description,
> Ian Davis's rule that it's content if you can't get an
> application/rdf+xml response, and TimBL's rule that you check the HTTP
> headers are all actionable. Checking for consistency, or leaving the
> answer unspecified or up to the whim of local convention, are not.

'The whim of local convention' sounds bad.  But since HR14 doesn't say what an application is supposed to _do_ with things in IR, this was always down to 'local convention'.  Of course, a good application would avoid whimsy here, but this is the Wild Wild Web, and perhaps that's all we can ask.

So this, I suppose, is why I've styled this suggestion a 'proposal' rather than a proposal, since it's more a way of viewing the one thing (?) that the other proposals are all doing in different ways.  HR14 has seemed to have a rather 'constitutional' air to it; HR14-bis could retain that, and leave the standard or recommended mechanics of 'not in IR' to more quotidian legislative fiat.

Best wishes,

Norman


-- 
Norman Gray  :  http://nxg.me.uk
SUPA School of Physics and Astronomy, University of Glasgow, UK

Received on Thursday, 29 March 2012 00:38:38 UTC