Re: [pedantic-web] Re: The OWL Ontology URI from Pat Hayes on 2010-05-14 (public-awwsw@w3.org from May 2010)

From: Pat Hayes <phayes@ihmc.us>
Date: Fri, 14 May 2010 16:17:32 -0500
To: Jonathan Rees <jar@creativecommons.org>
Cc: Dan Brickley <danbri@danbri.org>, AWWSW TF <public-awwsw@w3.org>
Message-Id: <39B3686E-25EB-4231-BD9C-9E7BC3D4A369@ihmc.us>

On May 14, 2010, at 8:31 AM, Jonathan Rees wrote:

> Thanks for contributing this example, Dan.
>
> I got sucked into this problem in 2006 when I needed URIs for things
> like journal articles and images, as opposed to web pages that happen
> to carry journal articles or web resources from which I could obtain
> encodings of an image. I wanted to know whether a non-# 2xx http: URI
> could/should refer to a journal article or digital image. I still
> don't have an answer.
>
> Pat says no - using a 2xx URI to refer to an article or image
> independent of encoding violates his interpretation of the
> httpRange-14 rule.

Well, not exactly. My view of httpRange-14 is that a 200 response
indicates that the URI denotes whatever it "identifies". If that is a
generic resource, then fine (insofar as one can make clear exactly
what such a thing is; I'm happy with it being the abstraction that
stands immediately behind some content negotiation).
But my argument was that something like an RDF/XML encoding of an RDF  
graph is a fully-fledged information resource in its own right, not  
merely one component of a content-negotiable bundle which can be  
conceptualized as a generic resource, like DanB's example of the  
various image file formats. And if so, then *it* is what is denoted by  
a URI that returns it.

Now, I guess that one could have the very same 'thing' be both a  
resource in its own right and also be part of a content-negotiated  
bundle, with a different URI which accesses the content negotiation  
process. I have a worry with this, however, as it seems to drive a
truck through the purpose of httpRange-14, since it means there is then
no way, given a URI which yields a 200-level response, to figure out
what it denotes, which I thought was the whole point. (Maybe my
interpretation of that ruling is too narrow, though if so, I really
cannot see what the point of the ruling is.) So if someone urges this
idea, let me ask them: what specifies that one URI A denotes the
concrete resource (e.g. the RDF/XML document) and that the other URI B
denotes a more abstract entity which can provide different  
representations based on content negotiation, when both of these URIs  
send back identical 200-coded responses to a GET? How is the 'naming'  
done?

However, quite independently of content negotiation, I don't think  
that RDF graphs (or indeed literary works) *are* generic resources in  
the same way that a bundle of image files is. Content negotiation is  
never going to get me a first edition of Moby Dick, or a rendering of  
an RDF graph as a drawing on a piece of paper. Going all the way up to  
that level of abstraction seems like too much of a reach for content  
negotiation.


> TimBL says yes (see "generic resources" memo) as
> do others (e.g. Roy, AWWW, DanB here). My instinct is that if we don't
> say yes we'll raise the semweb barrier to entry still further, and
> discourage the creation of metadata (the latter being my current
> crusade), so I am inclined to agree with Tim.

I fail to see why this would be the case. It would mean only that  
things that can be retrieved by HTTP will be digital entities rather  
than abstractions; but we knew that already. There is no problem with  
having a URI denote the work Moby Dick. It's just that you can't
retrieve that work directly using HTTP. (The URI might do a 303  
redirect to something that provides you with a content-negotiated  
version of the text, for example.)  But if you think about it for a  
second, surely that too is obvious. You never could pick up THE WORK,  
even in a very good bookshop.
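
As a rough illustration of the reading above, here is a minimal shell
sketch of what a client could conclude from a status code alone under
the httpRange-14 resolution (2xx: information resource; 303: could be
any resource; 4xx: nature unknown). The function name
httprange14_reading is made up for this example, and treating every
other code as "no conclusion" is an assumption, since the resolution
does not cover those cases.

```shell
# httprange14_reading CODE: print what the status code alone lets a
# client conclude about the thing the URI denotes.
# Hypothetical helper, not part of any tool.
httprange14_reading() {
  case "$1" in
    2[0-9][0-9]) echo "information resource" ;;  # e.g. 200 OK
    303)         echo "any resource" ;;          # See Other: no constraint on the referent
    4[0-9][0-9]) echo "unknown" ;;               # client error: nature of resource unknown
    *)           echo "no conclusion" ;;         # assumption: not covered by the resolution
  esac
}
```

On this reading, a URI for the work Moby Dick would answer 303, and
only the URI it redirects to would answer 200.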

>
> Alan R says that generic resources such as articles and images *risk*
> violating the rule, with its latent fear and uncertainty, and that's
> enough to force a # URI (or non-2xx). Pat and Alan have different
> reasoning but the same conclusion: interpret 2xx URIs only as
> resources that are isoontic to some single REST-representation.
>
> FWIW Ian Hickson also rejects the idea of generic resource, and I
> think Larry Masinter does too (or I infer this from comments he's made
> during discussions of metadata; but his position may be more nuanced).
>
> I think that for semweb purposes it's good practice for things to be
> defined and distinguished by their properties and class memberships
> whenever possible. In the case of the article or image the
> REST-representation has no "interesting" properties (author, date,
> location, subject matter, etc) that the article or image doesn't also
> have, so IMO there is little point in assuming they're distinct.

? That seems quite obviously false. For a start, they have a file type  
and an encoding. These things are the very stuff of digital processing.

> (I think I'm repeating Dan C?) The article is obtained by "subtracting"
> uninteresting properties from one of its REST-representations. It's
> only if you insist on transferring *all* properties (such as length,
> media type, resolution, encoding) from a REST-representation to its
> resource that you are forced to distinguish the article from the
> resource (and you would then have to give a # or non-2xx URI to the
> article).

Right now, one can own the copyright of an article but a journal  
publisher can own the copyright to the page image of that article  
which was printed in their journal. So it is important to be able to  
clearly distinguish properties which change when a page image is  
altered; and the version which preserves those properties is certainly  
a resource in its own right. Large amounts of money might depend on  
keeping that straight.

>
> But if "too many" properties of the REST-representation hold of the
> resource, you will get inconsistencies, since as Dan B spells out RFC
> 2616 very clearly permits, and many web sites assert, simultaneous
> distinct REST-representations (i.e. having inconsistent properties)
> for the same resource.

Just an aside, but who says this is the same resource? Does anything  
in the pre-semantic Web depend on there being a single resource in  
cases like this? If someone were to insist that there were, in fact,
several resources here, one corresponding to each REST representation,
and that the URI was ambiguous between them, what would change? This
would be consistent with RFC 2616 (which singularly fails to provide  
even the beginning of identity criteria for 'resources'.)

> You cannot have a resource whose octet-length
> is both 2273 and 31177, as length is functional, but according to 2616
> you can have a resource with a REST-representation of length 2273 and
> another REST-representation of length 31177. (The resource itself
> would not have a length at all.)
>
> I think Pat says, when attempting to formulate a satisfying
> interpretation, turn a blind eye toward all but one of the resource's
> REST-representations.

No, I'm cool with abstractions like the one you cite, provided one is
willing to accept that they render their concretizations 'invisible'
to reference. But I'm not so sanguine as you appear to be about that. I
think that it is often the most concrete instance which is of most  
interest to applications.

> I'm not keen on the solution because it would
> prevent reasoning (or enable incorrect reasoning) about resources that
> have multiple REST-representations - what the REST-representations
> are, how they compare, how they change over time, when they are
> emitted (by the server), and so on.
>
> This idea of property (or metadata) transfer from REST-representation
> to resource (i.e. inference rules of the form P(REST-rep) ->
> P(resource)) has been the most useful heuristic I've come up with so
> far for explaining the REST/AWWW architecture to myself. It's very
> similar to the way FRBR works. We might have Item = portion of
> physical medium, Manifestation = REST-representation, Expression /
> Work = particular kinds of static "information resources" (not an
> exhaustive covering of IRs or GRs). A manifestation and a work can
> share some of their properties but won't share all of them.
>
> If "generic resources" (those related to more than one
> REST-representation) really are useless as metadata subjects I guess
> we could reject 2616/REST/AWWW, but that would be a chore.
>
> Jonathan
>
> On Fri, May 14, 2010 at 3:16 AM, Dan Brickley <danbri@danbri.org> wrote:
>> On Mon, May 10, 2010 at 10:25 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>> Let me give an intuitive case in support of the Nays here. An RDF graph is a
>>> set, which is not the same as a document, for sure. The *same* graph can be
>>> encoded in a variety of different syntactic forms.
>>
>> Try re-running this to show that digital images aren't information
>> resources?
>>
>> 'The *same* digital image can be encoded in a variety of different
>> syntactic forms (PNG, GIF, JPEG, SVG, ...).'
>>
>> (Or even that books aren't information resources: 'the same book can
>> be encoded in a variety of different forms, both digital (text, HTML,
>> PDF) and various arrangements of atoms'.)
>>
>>> Consider two documents,
>>> one in RDF/XML, the other in NTriples, describing the same graph. If we
>>> identify the document with the graph it describes, then these have to be the
>>> same. But they aren't the same. So even if a graph is an information
>>> resource (and I agree that one can make out a case for that position), it
>>> certainly isn't the same information resource as any document (in RDF/XML or
>>> NTriples or any other notation) that represents it syntactically. So, one
>>> ought to use redirection to refer to it, according to http-range-14. So,
>>> whether it's an information resource or not is kind of moot, since even if it
>>> is, it can't be directly identified by a URI which returns a 200 code.
>>
>> http://www.w3.org/Icons/w3c_home.png
>> http://www.w3.org/Icons/w3c_home.gif
>> http://www.w3.org/Icons/w3c_home
>>
>> One generic digital resource, two specific 'bags of bits' that encode
>> it, three URIs, and here are the gory details from talking to W3C's
>> server:
>>
>> First we make three requests to the server; first time not expressing
>> a preference towards any type of bag of bits, then we say we prefer
>> gif, then we say we prefer png:
>>
>> curl --dump-header h1 -o f1 http://www.w3.org/Icons/w3c_home
>> curl -H 'Accept: image/gif' --dump-header h2 -o f2 http://www.w3.org/Icons/w3c_home
>> curl -H 'Accept: image/png' --dump-header h3 -o f3 http://www.w3.org/Icons/w3c_home
>>
>> This gives us 3 files, two of which are identical in content:
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ ls -l f*
>> -rw-r--r--  1 danbri  staff  1936 14 May 09:07 f1
>> -rw-r--r--  1 danbri  staff  1865 14 May 09:08 f2
>> -rw-r--r--  1 danbri  staff  1936 14 May 09:08 f3
>>
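
Equal sizes in the listing above suggest, but do not prove, that f1
and f3 hold the same bytes. A minimal sketch of a byte-level check,
assuming the files f1, f2 and f3 written by the curl commands are in
the current directory; same_bytes and describe_pair are hypothetical
helper names, and cmp is the standard POSIX utility.

```shell
# same_bytes A B: succeed only if files A and B are byte-for-byte identical.
same_bytes() {
  cmp -s "$1" "$2"
}

# describe_pair A B: print one line saying whether the two files match.
describe_pair() {
  if same_bytes "$1" "$2"; then
    echo "$1 and $2: identical bytes"
  else
    echo "$1 and $2: different bytes"
  fi
}
```

Run it as `describe_pair f1 f3` and `describe_pair f1 f2` to compare
the default response with each of the two negotiated ones.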
>> If we check the headers, we see that 200 was used each time even when
>> the bytestream content varied. I believe you're using the word
>> 'document' at least sometimes to individuate those things. The
>> variants seem to have the exact same last-modified time; this could
>> have been because they were part of the same CVS commit action to
>> w3.org.
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ file f*
>> f1: PNG image, 72 x 48, 8-bit colormap, non-interlaced
>> f2: GIF image data, version 89a, 72 x 48
>> f3: PNG image, 72 x 48, 8-bit colormap, non-interlaced
>> Dan-Brickleys-MacBook-Pro:img danbri$ cat h1
>> HTTP/1.1 200 OK
>> Date: Fri, 14 May 2010 07:07:49 GMT
>> Server: Apache/2
>> Content-Location: w3c_home.png
>> Vary: negotiate,accept
>> TCN: choice
>> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
>> ETag: "790-4195514757840;48498becf6180"
>> Accept-Ranges: bytes
>> Content-Length: 1936
>> Cache-Control: max-age=2592000
>> Expires: Sun, 13 Jun 2010 07:07:49 GMT
>> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
>> Connection: close
>> Content-Type: image/png; qs=0.7
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ cat h2
>> HTTP/1.1 200 OK
>> Date: Fri, 14 May 2010 07:08:09 GMT
>> Server: Apache/2
>> Content-Location: w3c_home.gif
>> Vary: negotiate,accept
>> TCN: choice
>> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
>> ETag: "749-4195514757840;48498becf6180"
>> Accept-Ranges: bytes
>> Content-Length: 1865
>> Cache-Control: max-age=2592000
>> Expires: Sun, 13 Jun 2010 07:08:09 GMT
>> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
>> Connection: close
>> Content-Type: image/gif; qs=0.5
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ cat h3
>> HTTP/1.1 200 OK
>> Date: Fri, 14 May 2010 07:08:22 GMT
>> Server: Apache/2
>> Content-Location: w3c_home.png
>> Vary: negotiate,accept
>> TCN: choice
>> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
>> ETag: "790-4195514757840;48498becf6180"
>> Accept-Ranges: bytes
>> Content-Length: 1936
>> Cache-Control: max-age=2592000
>> Expires: Sun, 13 Jun 2010 07:08:22 GMT
>> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
>> Connection: close
>> Content-Type: image/png; qs=0.7
>>
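
To compare the three responses side by side, the fields that differ
can be pulled out of the header dumps. A minimal sketch, assuming h1,
h2 and h3 are the files written by curl --dump-header above;
header_field and summarize_headers are hypothetical helper names, not
part of curl.

```shell
# header_field FILE NAME: print the value of the first header called NAME
# (case-insensitive) in a header dump, without the trailing CR.
header_field() {
  awk -v name="$2" '
    BEGIN { FS = ": *"; name = tolower(name) }
    tolower($1) == name {
      sub(/\r$/, "")          # drop the carriage return HTTP lines end with
      sub(/^[^:]*: */, "")    # drop the "Name: " prefix, keep the value
      print
      exit
    }
  ' "$1"
}

# summarize_headers FILE...: one tab-separated line per dump with the
# fields that tell the variants apart.
summarize_headers() {
  for f in "$@"; do
    printf '%s\t%s\t%s\t%s\n' "$f" \
      "$(header_field "$f" Content-Location)" \
      "$(header_field "$f" Content-Type)" \
      "$(header_field "$f" Content-Length)"
  done
}
```

Run it as `summarize_headers h1 h2 h3`. In the dumps above the status
line is the same in all three, while Content-Location, Content-Type
and Content-Length tell the PNG and GIF variants apart.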
>>
>> Can you re-tell your story in a way that allows the abstract digital
>> image here to be an information resource? If we substitute RDF/XML for
>> GIF, NTriples for PNG, and 'digital image' for 'graph' you seem to be
>> arguing that W3C's Web server is misconfigured, and that the 200 HTTP
>> answers here are inappropriate. Or am I misreading your point?
>>
>> cheers,
>>
>> Dan
>>
>
>

------------------------------------------------------------
IHMC                                     (850)434 8903 or (650)494 3973
40 South Alcaniz St.           (850)202 4416   office
Pensacola                            (850)202 4440   fax
FL 32502                              (850)291 0667   mobile
phayesAT-SIGNihmc.us       http://www.ihmc.us/users/phayes
Received on Friday, 14 May 2010 21:18:36 UTC