Re: Example for consideration: Resource versus Representation from Jonathan Rees on 2008-02-04 (public-awwsw@w3.org from February 2008)

From: Jonathan Rees <jar@creativecommons.org>
Date: Mon, 4 Feb 2008 12:59:12 -0500
To: "Williams, Stuart (HP Labs, Bristol)" <skw@hp.com>
Cc: Pat Hayes <phayes@ihmc.us>, Alan Ruttenberg <alanruttenberg@gmail.com>, "public-awwsw@w3.org" <public-awwsw@w3.org>
Message-Id: <5ADDCDDE-8F09-43BD-87EB-8D5CF0B46953@creativecommons.org>

I appreciate your careful reply.

On Feb 4, 2008, at 8:24 AM, Williams, Stuart (HP Labs, Bristol) wrote:

>> -----Original Message-----
>> From: Jonathan Rees [mailto:jar@creativecommons.org]
>> Sent: 01 February 2008 23:53
>> To: Williams, Stuart (HP Labs, Bristol)
>> Cc: Pat Hayes; Alan Ruttenberg; public-awwsw@w3.org
>> Subject: Re: Example for consideration: Resource versus  
>> Representation
>>
>> On Jan 25, 2008, at 10:11 AM, Williams, Stuart (HP Labs,
>> Bristol) wrote:
>>>
>>>> Assigning a URI incurs a sort of moral obligation to resolve it
>>>> somehow, but lack of resolution doesn't make the assignment invalid.
>>>> (We all agree on this, right?)
>>>
>>> Yes, if we are speaking of http scheme URI.
>>>
>>> For URNs (i.e. URN namespaces) the intentions are not clear. ...
>>
>> This is an interesting discussion, but let's put it off, if
>> that's OK, as it doesn't really belong to AWWSW, which is
>> only about HTTP.
>
> Hmmm.... I guess I'm willing to set that on the side for now.  
> However, in the long run I don't believe that AWWSW should be so  
> narrowly scoped - if the first WW is to have any meaning... plus  
> the core technologies place no restrictions on the 'spellings' of  
> URIs beyond that defined in "Uniform Resource Identifier (URI):  
> Generic Syntax" [1] and "Internationalized Resource Identifiers  
> (IRIs)" [2].
>
> [1] http://www.ietf.org/rfc/rfc3986.txt
> [2] http://www.ietf.org/rfc/rfc3987.txt

Let me tell you what I have in mind regarding deciding what to focus  
on. I vaguely remember that we have an informal 6-month charter from  
the TAG, although I can find no evidence for this. We've used 3 of  
the 6. In any case, I'm trying to start small, and to this end I am  
trying to push away or postpone as many potential issues as I can.

You are right that other URIs are of interest in the long run, and of  
course the HTTP Request-URI needn't be relative, so yes, the scope of  
AWWSW is not limited to http: URIs.

>> I'll make a note of it and maybe we can come back to it
>> later. When I spoke of "moral obligation" I was referring to
>> the AWWW "Available representation" principle [1], which
>> implicitly says you shouldn't use any URI scheme that lacks
>> resolution.
>>
>> Oddly, the principle is stated to apply to all resources,
>> while elsewhere it is intimated that only IRs have
>> representations. So there is no way to obey this principle
>> for non-IRs.
>
> Bear in mind that such ink as dried in AWWW, dried prior to the  
> TAG's resolution of httpRange-14 - ie. it was left open whether or  
> not representations could be retrieved for things other than IRs.

OK, so this is yet one more reason to revise AWWW. I guess I knew all  
this, that "available representation" really refers only to IRs...  
but given this I think the URI scheme is irrelevant, only the  
protocol matters, for now.

The task force is only charged with understanding HTTP semantics. The  
name "AWWSW" was chosen in haste and partially in jest, and is  
grander than what was intended.

>>>> In order to write meaningful RDF, you have to have subjects and
>>>> objects, and verbs (= predicates = properties). A fundamental
>>>> assumption - speak up now if you don't believe this - is that to be
>>>> clear and useful a property [a terrible word but we're stuck with it]
>>>> must have a specified domain and range -- classes to which the
>>>> subject and object must belong in order for statements using the
>>>> property to be acceptable in discourse.
>>>
>>> Hmmm...(speaking up) I think that we need to think about that. In some
>>> of the communities in which I work the practice seems increasingly to be
>>> to leave property domains as open as possible to encourage their re-use.
>>
>> There are clearly many different ways to use RDF, and we may
>> have a cultural clash here, as until now, in my provincialism
>> I have not encountered anyone who argues against domain and
>> range assertions. I would love to hear experience with other
>> engineering approaches.
>> Again it's an interesting conversation that's a bit wide of
>> the AWWSW project, so I'll make a note of it. Let's try to
>> hedge the issue for now and if you think we're running into
>> trouble as a result please speak up.
>
> I guess I beg to differ in as much that *if* we are to have  
> something as grandly titled as an AWWSW, then it should cover the  
> common ground across a wide spectrum of use. Something more  
> narrowly scoped may indeed be an architecture... but it is less  
> clear to me that it could claim to be the AWWSW.

That's fine. I'm sure I will learn something from hearing more about  
other sets of requirements. I've said that there are many important  
issues that are probably out of scope for AWWSW. I figure, though,  
that if I (or anyone) pursues a question here and gets any clarity at  
all, even if it's "form a new working group", then progress will have  
been made. Right now from where I stand most semantic web  
architecture issues look like a big muddle.

We could change the name of the group, if that would help us to  
remember to constrain the scope.

> <snip>shared OWL Composite key wish list</snip>
>
>>> Ok... though I think that there is a premise in that which is perhaps
>>> again not universally held. Roughly, one accumulates
>>> statements/assertions about things of interest by retrieval operations
>>> over the web. Individual representations may say quite contradictory
>>> things - eg. in the http://sw-app.org/mic.xhtml example that we
>>> considered on the call: IMO the contradiction is quite evident from an
>>> understanding of the representation's media type and its content -
>>> rather than from any fine detail of the HTTP interaction - and their
>>> aggregation is quite another thing.
>>
>> Agreed.
>
> Ok... then I guess that we need a better concrete example to  
> explore in order to understand the kind of inferences that we would  
> like to be able to justify.

The first step, I think, is creating a sufficient vocabulary for  
*expressing* the kinds of assertions we'd like to be able to infer.  
This requires verbs, and I think we're starting to talk about this.  
Once we know how to say what we'd like to say, we can start talking  
about the circumstances under which those things *should* be said.

>> But there are other sources of statements than
>> representations. Agents make assertions about what they
>> observe or infer or conjecture all the time, then render
>> their wisdom as RDF that finds its way into HTTP responses.
>
> Ok... does that amount to some 'provenance' information that gives  
> an account of the derivation of some collection of RDF statements?

OK, sorry, ignore what I wrote. I was not talking about provenance,  
just content. I was probably misunderstanding what you wrote.

<snip/>

>> Who wrote this resource?
>
> Author, creator, owner, maintainer... I guess that there are a few  
> agents that may have a relation with the resource that you would be  
> interested in. Representations may carry some self-describing  
> information with respect to some of those. I'm not aware of any HTTP headers  
> that would carry such information... that's not to say there aren't  
> or couldn't be any, just that at present I'm not aware of any.

There is no good way for cooperating agents to communicate this  
information; consider the case of a spreadsheet rendered as
text/plain. There's just no place in the representation to put any
metadata. Sure, you can choose a different representation, but I
wouldn't call that a "good" way to communicate. The issue of
out-of-band metadata - in the GET response, or linked from the response, or
as the response to a different request - has been discussed recently  
on semantic-web and/or www-tag, I think. I would say this is an  
architectural deficiency, and if it's not up to the TAG to fix it, it  
should be up to some other pro-HTTP group, as this deficiency (I  
believe) has been a factor in pushing many communities away from HTTP.

So this is not an AWWSW thing, and I didn't mean to say it was... but  
if AWWSW lays a good foundation then some other effort will be on a  
better footing.

>> How stable is the state of the resource - can
>> I depend on it remaining the same for a while?
>
> Cache-control headers may be helpful in that they can convey an  
> expiry date for cacheable representations. I guess that you could  
> set them with seconds to years of stability - and obviously, you're  
> at least saying it's ok to use a representation up to its expiry  
> date, I don't think you're necessarily committing that the  
> available representation(s) or the resource state is invariant over  
> that period - though that might be a reasonable claim. I read HTTP  
> caching as trading speed of response for currency of  
> representation. Etags are probably of interest as well.

While technically this is correct I'm not sure it's practical (how  
often are publishers in a position to control the cache-control  
headers?) or that it carries the correct intent (it talks about  
server behavior, not the nature of the resource - if we can say these  
coincide then we've made progress, but I know of nothing so far that  
would imply that they do). But I'd be happy to explore cache control  
headers as one way to communicate this kind of information.
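
As a rough illustration of what the headers can and cannot say, here is a
minimal Python sketch that reads a max-age directive and computes how long a
cache may reuse a representation. The function names are invented for this
example, and the sketch ignores the Age, Date and Expires headers that a real
cache would also consult.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def parse_max_age(cache_control: str) -> Optional[int]:
    """Return the max-age value in seconds, or None if absent or malformed."""
    for directive in cache_control.split(","):
        name, _, value = directive.strip().partition("=")
        if name.lower() == "max-age":
            value = value.strip().strip('"')
            return int(value) if value.isdigit() else None
    return None


def fresh_until(retrieved_at: datetime, cache_control: str) -> Optional[datetime]:
    """Time until which the server permits reuse of the representation.

    This describes permitted cache behavior only; it does not assert that
    the state of the resource is unchanged over that period.
    """
    max_age = parse_max_age(cache_control)
    if max_age is None:
        return None
    return retrieved_at + timedelta(seconds=max_age)


retrieved = datetime(2008, 2, 4, 17, 59, 36, tzinfo=timezone.utc)
deadline = fresh_until(retrieved, "public, max-age=3600")
```

The docstring carries the point made above: the value is about server and
cache behavior, and whether it also says something about the nature of the
resource is exactly the open question.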

>> To refer to what I see now, can I link to this URI or do I have to  
>> copy the content?
>
> IMO... in general it is not possible to link to "what I see now".  
> The link is a reference to a resource not its representation. It  
> would be possible to create resources whose sole purpose is to  
> provide an enduring snapshot of a related resource. eg. documents  
> on the W3C TR page use a convention that achieves something like  
> this - but that is a site-specific convention.

Sorry, let me rephrase:
1. Will the representation I retrieve now also be a representation of  
the resource tomorrow (even if it's not a representation the server  
still serves)?  (If we can say why that's an ill-formed  
parenthetical, we will have made progress.)
2. Will the representation I retrieve now also be the representation  
that the same request will retrieve tomorrow?
Depending on the application, a "no" answer to one or the other might  
mean that the application will want to save a copy (instead of just  
saving a link).

Site specific conventions are wonderful, and W3C's are a valuable  
example. How can they be communicated so that automated clients can  
exploit them?

>> What are the available representations?
>
> I don't know of any way of reliably determining that. There are  
> probably some heuristics that may work in certain circumstances -  
> but I suspect it would rely on repeated trial and error varying  
> acceptable content types in a request's Accept header (if present).

Suppose a server wanted to communicate the answer. Wouldn't it be  
wonderful if that could be done using RDF?
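
For comparison, the trial-and-error heuristic mentioned in the quoted text
might look roughly like the Python sketch below. The `fetch` callable and the
`fake_fetch` stand-in are hypothetical, introduced only so the example runs
without a network; a real client would issue one GET per candidate media type.

```python
from typing import Callable, Iterable, Optional, Set

# Hypothetical signature: fetch(uri, accept) returns the Content-Type of a
# 200 response, or None for any other outcome (e.g. 406 Not Acceptable).
Fetch = Callable[[str, str], Optional[str]]


def probe_representations(uri: str, candidates: Iterable[str], fetch: Fetch) -> Set[str]:
    """Guess the available media types by sending one request per candidate."""
    found = set()
    for media_type in candidates:
        content_type = fetch(uri, media_type)
        if content_type is None:
            continue
        # Drop parameters such as "; charset=utf-8" before comparing.
        base = content_type.split(";")[0].strip().lower()
        # A server may ignore Accept and send its default, so count only
        # responses whose type matches what was asked for.
        if base == media_type.lower():
            found.add(base)
    return found


def fake_fetch(uri: str, accept: str) -> Optional[str]:
    """Stand-in for a server that offers exactly two representations."""
    available = {
        "text/html": "text/html; charset=utf-8",
        "application/rdf+xml": "application/rdf+xml",
    }
    return available.get(accept)


types = probe_representations(
    "http://example.org/doc",
    ["text/html", "text/plain", "application/rdf+xml"],
    fake_fetch,
)
```

The result is only as complete as the list of candidates, which is why a
server-side statement of the answer would be preferable to guessing.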

>> If an archival copy exists, where is it?
>
> I don't think you could determine that from HTTP headers. It may be  
> regarded as self-descriptive information in some representations in  
> the sense of declaring a relation with some other resource.

Again, I agree that we don't now have protocols that help with this.  
Wouldn't it be nice if we did - at the very least, a vocabulary that  
allowed us to talk about properties of servers? Maybe an AWWSW  
vocabulary would form some subset of such a vocabulary.

Well, really at the very least would be a standard place to put  
information like this, even if we didn't standardize on the vocabulary.

> Some of the work going on in the library and bibliographic  
> communities is probably relevant - though may stray into the domain  
> of info: doi: and URNs more generally.

Wouldn't it be nice if the library community could layer their  
resources on top of the web, instead of going off and building a pile  
of incompatible formats, languages, naming schemes, and protocols?  
The goals of the two communities are very similar. If we want to say  
http: is broadly applicable, can't we make a case that it's good  
enough for libraries? info: and DOIs are a failure of web  
architecture (not sure whether technical or marketing), I think, but  
it may not be too late to repair this failure.

>> And for non-IRs:
>> Where can I find descriptions of the thing?
>
> Well, for # URIs, the first port of call is at least straightforward.
> For non-# URIs, then the TAG's 303 advice provides a roughly  
> equivalent mechanism.
>
> The expectation is that folks deploying URIs for such non-IRs will  
> *want* you to be able to find out about them (ie. find some form of  
> description) and it is in their best interest to deploy something  
> useful by either of these means. Of course, as things stand, there  
> are no guarantees with either approach that a retrieved  
> representation will in fact have anything to say about the resource  
> you were initially interested in.

Exactly. I think that if we articulated the conditions under which  
the follow-your-nose heuristics are not heuristics - even if only to  
give a name or phrase to such conditions - that would be of great value.
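
To make the follow-your-nose heuristics concrete, here is a minimal Python
sketch of the two cases from the quoted text: a # URI, and a non-# URI
answered with a 303. The function name and its arguments are assumptions made
for this example, and the sketch says nothing about whether the retrieved
document actually describes the thing.

```python
from typing import Optional
from urllib.parse import urldefrag


def description_location(uri: str, status: int, location: Optional[str] = None) -> Optional[str]:
    """Where to look for a description of the thing `uri` names, if anywhere.

    `status` and `location` are the status code and Location header obtained
    by a GET on the URI with any fragment removed.
    """
    base, fragment = urldefrag(uri)
    if fragment:
        # Hash URI: the first port of call is the document without the fragment.
        return base if status == 200 else None
    if status == 303 and location:
        # The TAG's 303 advice: See Other points at a description.
        return location
    # Otherwise the protocol indicates no separate description.
    return None


hash_case = description_location("http://example.org/vocab#Thermometer", 200)
see_other = description_location(
    "http://example.org/id/t1", 303, "http://example.org/doc/t1"
)
```

Both branches return a place to look, not a guarantee; naming the conditions
under which the returned document can be relied upon is the part that remains
unspecified.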

>> How is the URI intended to be used?
>
> Is that the same as "What the URI is intended to denote?" eg. that  
> a URI denotes an rdf:Property 'probably' indicates that the URI is  
> intended to be used in the 'predicate' position of (most) RDF  
> triples in which it occurs.

Well, I tend to say "how x is used" instead of "what x denotes" in  
order to admit more use cases for RDF (sorry, I'm poking fun, please  
don't be offended) and to talk about aspects of use other than  
denotation, such as expiration date or examples... but that doesn't  
matter; assume what I mean is "what x denotes".

I think this is related to the question of stability. One might like  
to use a URI in a persistent context - e.g. repeat something one has  
learned about the referent in an hour or a month. Some of the  
statements you learn about it may be true in a month, while others  
may not be. E.g. if now we know that U rdf:type Thermometer and U  
foo:has-temperature-Celsius "22", will the URI U still denote a  
thermometer one hour from now? Obviously it won't read the same  
temperature - but how did we know we weren't supposed to cache the  
temperature (cache control maybe)? If there were a notion of  
distinguishing definition from use, both of which currently occur  
inside the same descriptive document, we'd be in better shape.
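
A small Python sketch of the thermometer example, assuming (hypothetically)
that some convention told us which predicates carry definitional statements.
No such convention is claimed to exist; the `DEFINITIONAL` set is invented
purely to show what the distinction would buy a client.

```python
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

# Assumption: an out-of-band convention marks these predicates as
# definitional, i.e. safe to repeat in a persistent context.
DEFINITIONAL: Set[str] = {"rdf:type"}


def split_statements(
    triples: Iterable[Triple], definitional: Set[str] = DEFINITIONAL
) -> Tuple[List[Triple], List[Triple]]:
    """Separate statements safe to repeat later from ones that may go stale."""
    stable: List[Triple] = []
    volatile: List[Triple] = []
    for triple in triples:
        predicate = triple[1]
        (stable if predicate in definitional else volatile).append(triple)
    return stable, volatile


statements = [
    ("U", "rdf:type", "Thermometer"),
    ("U", "foo:has-temperature-Celsius", "22"),
]
stable, volatile = split_statements(statements)
```

With the two kinds of statement separated, a client could keep the type
assertion for a month and re-fetch the temperature; today nothing in the
descriptive document tells it which is which.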

I'm referring here to the issue David Booth has raised in the form of  
"URI declarations": what is so true of the thing that if it weren't  
you'd have a different thing, as opposed to accidentally true, so  
that if it weren't you'd think you'd made an error of fact?

I don't expect AWWSW or the TAG to solve this, but right now this is  
a hopelessly confused subject. Local solutions are easy, but the  
semantic web isn't supposed to be local, so I think some standards  
body ought to take up these issues.

Best

Jonathan
Received on Monday, 4 February 2008 17:59:36 UTC