Re: Example for consideration: Resource versus Representation from Jonathan Rees on 2008-02-01 (public-awwsw@w3.org from February 2008)

From: Jonathan Rees <jar@creativecommons.org>
Date: Fri, 1 Feb 2008 18:52:43 -0500
To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
Cc: Pat Hayes <phayes@ihmc.us>, Alan Ruttenberg <alanruttenberg@gmail.com>, "public-awwsw@w3.org" <public-awwsw@w3.org>
Message-Id: <1E1524AF-B56E-4288-A004-6B8DC8332156@creativecommons.org>
On Jan 25, 2008, at 10:11 AM, Williams, Stuart (HP Labs, Bristol) wrote:
>
>> Assigning a URI incurs a sort of moral obligation to resolve
>> it somehow, but lack of resolution doesn't make the
>> assignment invalid. (We all agree on this, right?)
>
> Yes, if we are speaking of http scheme URI.
>
> For URNs (i.e. URN namespaces) the intentions are not clear. ...

This is an interesting discussion, but let's put it off, if that's  
OK, as it doesn't really belong to AWWSW, which is only about HTTP.  
I'll make a note of it and maybe we can come back to it later. When I  
spoke of "moral obligation" I was referring to the AWWW "Available  
representation" principle [1], which implicitly says you shouldn't  
use any URI scheme that lacks resolution.

Oddly, the principle is stated to apply to all resources, while  
elsewhere it is intimated that only IRs have representations. So  
there is no way to obey this principle for non-IRs.

>> In order to write meaningful RDF, you have to have subjects
>> and objects, and verbs (= predicates = properties). A
>> fundamental assumption - speak up now if you don't believe
>> this - is that to be clear and useful a property [a terrible
>> word but we're stuck with it] must have a specified domain
>> and range -- classes to which the subject and object must
>> belong in order for statements using the property to be
>> acceptable in discourse.
>
> Hmmm...(speaking up) I think that we need to think about that. In  
> some of the communities in which I work the practice seems  
> increasingly to leave property domains as open as possible to  
> encourage their re-use.

There are clearly many different ways to use RDF, and we may have a  
cultural clash here, as until now, in my provincialism, I have not  
encountered anyone who argues against domain and range assertions. I  
would love to hear experience with other engineering approaches.  
Again it's an interesting conversation that's a bit wide of the AWWSW  
project, so I'll make a note of it. Let's try to hedge the issue for  
now and if you think we're running into trouble as a result please  
speak up.

> Ok... though wrt "inverse functional" what I think I'd really  
> like is to be able to catalog distinguishing characteristics, i.e. the  
> value of a single property is not necessarily of itself a  
> distinguishing characteristic - but the combined values of some  
> collection of properties may be distinguishing between individuals  
> (composite keys).

I think this comes up in OWL discussions from time to time, and I  
sure wish that composite keys were in the logic. For now constraints  
like this (if x,y belong to class C and x!p = y!p and x!q = y!q then  
x=y) will have to be expressed either in prose or in assertions whose  
intent is not understood by the reasoner.
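
A minimal sketch of that constraint, checked outside the reasoner. The function name, the dictionary layout, and the sample individuals are all hypothetical; p and q stand for the two properties in the rule above, and every individual is assumed to belong to class C already.

```python
from collections import defaultdict

def merge_by_composite_key(individuals, key_props):
    """Group individuals whose values agree on every key property.

    individuals: dict mapping a name to a dict of property values
                 (all assumed to be members of the same class C).
    key_props:   tuple of property names that jointly act as a key.
    Returns a list of sets; each set holds names inferred to be equal.
    """
    groups = defaultdict(set)
    for name, props in individuals.items():
        # An individual missing any key property licenses no inference.
        if all(p in props for p in key_props):
            groups[tuple(props[p] for p in key_props)].add(name)
    return [names for names in groups.values() if len(names) > 1]

# Hypothetical data: ex:a and ex:b agree on both p and q; ex:c differs on q.
members_of_c = {
    "ex:a": {"p": "1970-01-01", "q": "Smith"},
    "ex:b": {"p": "1970-01-01", "q": "Smith"},
    "ex:c": {"p": "1970-01-01", "q": "Jones"},
}
same = merge_by_composite_key(members_of_c, ("p", "q"))
```

Here ex:a and ex:b end up in one group while ex:c stays separate, since agreement on p alone is not distinguishing.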

> Ok... though I think that there is a premise in that which is  
> perhaps again not universally held. Roughly, one accumulates  
> statements/assertions about things of interest by retrieval  
> operations over the web. Individual representations may say quite  
> contradictory things - eg. in the http://sw-app.org/mic.xhtml  
> example that we considered on the call: IMO the contradiction is  
> quite evident from an understanding of the representation's  
> media-type and its content - rather than from any fine detail of  
> the HTTP interaction - and their aggregation is quite another thing.

Agreed. But there are other sources of statements than  
representations. Agents make assertions about what they observe or  
infer or conjecture all the time, then render their wisdom as RDF  
that finds its way into HTTP responses. Tabulator, for example,  
observes the HTTP interactions that it initiates, records the basic  
facts of the matter as RDF, makes a few simple inferences, and renders  
what it knows as RDF. This may be odd, but it's not extraordinary and  
in fact is exactly the kind of thing RDF is for.
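
As a rough illustration of "recording the basic facts of the matter", here is a sketch that turns one observed HTTP exchange into triples. It is not Tabulator's code or vocabulary: the ex: property names, the blank node label, and the function are made up for the example, and the Content-Type value shown is an assumption.

```python
REDIRECT_CODES = {301, 302, 303, 307}

def describe_interaction(request_uri, status, headers):
    """Return (subject, predicate, object) triples for one HTTP exchange.

    Only what was observed is recorded; nothing here says what the
    response "represents".
    """
    interaction = "_:i1"  # hypothetical blank node for this one exchange
    triples = [
        (interaction, "ex:requestURI", request_uri),
        (interaction, "ex:statusCode", status),
    ]
    content_type = headers.get("Content-Type")
    if content_type is not None:
        triples.append((interaction, "ex:contentType", content_type))
    location = headers.get("Location")
    if status in REDIRECT_CODES and location is not None:
        triples.append((interaction, "ex:redirectedTo", location))
    return triples

observed = describe_interaction(
    "http://sw-app.org/mic.xhtml",
    200,
    {"Content-Type": "application/xhtml+xml"},  # assumed header value
)
for triple in observed:
    print(triple)
```

Everything emitted is plain observation of the exchange itself, which is the part that seems uncontroversial.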

> It seems evident to me that Pat, in messages such as [4] (that one  
> in particular I greatly appreciate) and other related messages,  
> urges us to make as few inferences as possible - perhaps ideally,  
> well none - from what I might call the 'fine detail' of an http  
> interaction.

The RDF constructed by Tabulator is mostly simple observation, not  
inference, so I can agree with you. Saying that the entity  
"represents" the resource (should we choose to do so) would be a much  
bigger step, and we need to drill down on this deeper kind of  
relationship. See below.

> In part I think that Pat's advocacy has been roughly along the  
> lines of admit only triples that you get from representations; and  
> even then - you want to be careful about the statements you accept  
> in good faith: malicious behaviour may seek to induce  
> contradiction; and even with the best of intentions mistakes can  
> also lead to contradiction and non-sense. That raises a whole raft  
> of trust/provenance issues that I would have thought far more  
> significant than what can be inferred from the 'fine detail' of an  
> http interaction. Eg. in David's analysis [5] that we reviewed on  
> the call for example, there is a step that blindly 'asserts' things  
> that are found to be logical formula - which may be fine in a  
> relatively closed and high-trust environment. I suspect more  
> 'caution' is required on the wild, open, semantic web. How one (or  
> one's agent) decides to trust a set of statements in a  
> representation I think is a big issue and somewhat dwarfs what we  
> can infer from a 200, 302 or a 303.

Absolutely - RDF-harvesting agents have to have a degree of suspicion  
compatible with the application to which they're being put. I take  
this as a given. But this doesn't mean we shouldn't think about what  
can be inferred *given* a certain trust policy. Tabulator is quite  
credulous because it's a kind of debugger or browser and the  
consequences of being wrong are not a lot worse than the consequences  
of the browser rendering wikipedia entries that are wrong.  
Neurocommons on the other hand is quite skeptical because it is  
SPARQL- and OWL-heavy and doesn't want to give scientists wrong  
answers. It only loads hand-picked sources.
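
The two stances can be sketched as one filter with a different policy argument. This is an illustration only, not how Tabulator or Neurocommons is implemented; the statement layout (a triple plus the URI it was retrieved from), the function name, and the example.org sources are assumptions.

```python
def admit(statements, trusted_sources=None):
    """Return the statements the trust policy lets through.

    statements:      iterable of (subject, predicate, object, source_uri).
    trusted_sources: None for a credulous agent that accepts everything,
                     or a set of hand-picked source URIs for a skeptical one.
    """
    if trusted_sources is None:
        return list(statements)
    return [s for s in statements if s[3] in trusted_sources]

# Hypothetical harvested statements with their provenance.
harvested = [
    ("ex:a", "ex:p", "ex:b", "http://example.org/curated.rdf"),
    ("ex:a", "ex:p", "ex:c", "http://example.org/unknown.rdf"),
]

credulous = admit(harvested)
skeptical = admit(harvested, trusted_sources={"http://example.org/curated.rdf"})
```

Whatever inference rules we settle on would then run over the admitted statements only, so the question of what can be inferred stays separate from the question of whom to trust.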

> All that said, I may also have misunderstood Pat's advocacy - but I  
> think that it's close to, infer nothing from the response codes.

Certainly, conservatively one should infer nothing.  But conservatism  
would be a choice. If I can pretend to be Tim, the point is not to  
conservatively say "we can't trust it so we can't infer anything" but  
rather to say "what should the architecture be so that among agents  
adhering to it we start to have interesting conversations".

> In part that's why I have been seeking to have the inferences  
> driven from the other end - what inferences do you want to be able  
> to justify - which ought to lead us to 'proof-steps' that depend on  
> inferences that can only be made on the basis of interaction detail  
> - or perhaps not - maybe we will find that the content of  
> representations is in general sufficient.

I'll just mention information that I (on behalf of Science Commons)  
care about, since it's difficult for me to get past it right now.  
Most of it is beyond anything represented in the HTTP interaction  
and therefore probably beyond what the AWWSW task can do, but I'll  
state it as a sort of pie-in-the-sky list. Who wrote this resource? How stable  
is the state of the resource - can I depend on it remaining the same  
for a while? To refer to what I see now, can I link to this URI or do  
I have to copy the content? What are the available representations?  
If an archival copy exists, where is it? Is a mirror, or copy, that I  
create as good as the resource I'm trying to mirror? And for non-IRs:  
Where can I find descriptions of the thing? How is the URI intended  
to be used? Is the description accidental and time- or hypothesis- 
bound, or is it essential to what it means for the URI to denote what  
it does (i.e. for one to be playing its language-game)?

The ability to communicate this kind of information is as important  
as the ability to discover it.

I'm hoping that we can lay a rudimentary basis for further work on  
issues like these.

Best
Jonathan

[1] http://www.w3.org/TR/webarch/Overview.html#representation-management
Received on Friday, 1 February 2008 23:53:13 UTC