- From: Pat Hayes <phayes@ihmc.us>
- Date: Fri, 14 May 2010 16:17:32 -0500
- To: Jonathan Rees <jar@creativecommons.org>
- Cc: Dan Brickley <danbri@danbri.org>, AWWSW TF <public-awwsw@w3.org>
On May 14, 2010, at 8:31 AM, Jonathan Rees wrote:

> Thanks for contributing this example, Dan.
>
> I got sucked into this problem in 2006 when I needed URIs for things
> like journal articles and images, as opposed to web pages that happen
> to carry journal articles or web resources from which I could obtain
> encodings of an image. I wanted to know whether a non-# 2xx http: URI
> could/should refer to a journal article or digital image. I still
> don't have an answer.
>
> Pat says no - using a 2xx URI to refer to an article or image
> independent of encoding violates his interpretation of the
> httpRange-14 rule.

Well, not exactly. My view of httpRange-14 is that the 200 code response indicates that the URI denotes whatever it "identifies". If that is a generic resource, then fine (insofar as one can make clear exactly what such a thing is, but I'm happy with it being the abstraction that stands immediately behind some content negotiation). But my argument was that something like an RDF/XML encoding of an RDF graph is a fully-fledged information resource in its own right, not merely one component of a content-negotiable bundle which can be conceptualized as a generic resource, like DanB's example of the various image file formats. And if so, then *it* is what is denoted by a URI that returns it.

Now, I guess that one could have the very same 'thing' be both a resource in its own right and also be part of a content-negotiated bundle, with a different URI which accesses the content negotiation process. I have a worry with this, however, as it seems to drive a truck through the purpose of httpRange-14, since it means there is then no way, given a URI which yields a 200-level response, to figure out what it denotes. Which I thought was the whole point. (Maybe my interpretation of that ruling is too narrow. Though if so, I really cannot see what the point of the ruling is.)
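To make the worry concrete, here is a minimal Python sketch of what the 2005 TAG httpRange-14 resolution lets a client infer from a GET: a 2xx means the URI identifies an information resource, a 303 means it could identify any resource, and a 4xx leaves its nature unknown. The function name and return labels are hypothetical, and '#' URIs are treated as outside the rule, as this thread assumes. Note that the function can only ever return a category, never a referent, so URI A and URI B in Pat's question get exactly the same answer.

```python
from urllib.parse import urldefrag


def httprange14_inference(uri: str, status: int) -> str:
    """What the 2005 TAG httpRange-14 resolution lets a client conclude
    about the thing an http URI identifies, given the status of a GET."""
    _, fragment = urldefrag(uri)
    if fragment:
        # '#' URIs are treated as outside the rule, as in this thread:
        # the response for the fragment-less part does not constrain them.
        return "unconstrained (hash URI)"
    if 200 <= status < 300:
        return "information resource"  # but *which* one is not said
    if status == 303:
        return "any resource"
    if 400 <= status < 500:
        return "unknown"
    return "no conclusion"  # the resolution is silent on other codes


# Pat's question: two URIs that both answer 200 are indistinguishable here.
print(httprange14_inference("http://www.w3.org/Icons/w3c_home", 200))
print(httprange14_inference("http://www.w3.org/Icons/w3c_home.png", 200))
```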
So if someone urges this idea, let me ask them: what specifies that one URI A denotes the concrete resource (e.g. the RDF/XML document) and that the other URI B denotes a more abstract entity which can provide different representations based on content negotiation, when both of these URIs send back identical 200-coded responses to a GET? How is the 'naming' done?

However, quite independently of content negotiation, I don't think that RDF graphs (or indeed literary works) *are* generic resources in the same way that a bundle of image files is. Content negotiation is never going to get me a first edition of Moby Dick, or a rendering of an RDF graph as a drawing on a piece of paper. Going all the way up to that level of abstraction seems like too much of a reach for content negotiation.

> TimBL says yes (see "generic resources" memo) as
> do others (e.g. Roy, AWWW, DanB here). My instinct is that if we don't
> say yes we'll raise the semweb barrier to entry still further, and
> discourage the creation of metadata (the latter being my current
> crusade), so I am inclined to agree with Tim.

I fail to see why this would be the case. It would mean only that things that can be retrieved by HTTP will be digital entities rather than abstractions; but we knew that already. There is no problem with having a URI denote the work Moby Dick. It's just that you can't retrieve that work directly using HTTP. (The URI might do a 303 redirect to something that provides you with a content-negotiated version of the text, for example.) But if you think about it for a second, surely that too is obvious. You never could pick up THE WORK, even in a very good bookshop.

>
> Alan R says that generic resources such as articles and images *risk*
> violating the rule, with its latent fear and uncertainty, and that's
> enough to force a # URI (or non-2xx).
> Pat and Alan have different reasoning but the same conclusion:
> interpret 2xx URIs only as resources that are isoontic to some single
> REST-representation.
>
> FWIW Ian Hickson also rejects the idea of generic resource, and I
> think Larry Masinter does too (or I infer this from comments he's made
> during discussions of metadata; but his position may be more nuanced).
>
> I think that for semweb purposes it's good practice for things to be
> defined and distinguished by their properties and class memberships
> whenever possible. In the case of the article or image the
> REST-representation has no "interesting" properties (author, date,
> location, subject matter, etc) that the article or image doesn't also
> have, so IMO there is little point in assuming they're distinct.

That seems quite obviously false. For a start, they have a file type and an encoding. These things are the very stuff of digital processing.

> (I think I'm repeating Dan C?) The article is obtained by "subtracting"
> uninteresting properties from one of its REST-representations. It's
> only if you insist on transferring *all* properties (such as length,
> media type, resolution, encoding) from a REST-representation to its
> resource that you are forced to distinguish the article from the
> resource (and you would then have to give a # or non-2xx URI to the
> article).

Right now, one can own the copyright of an article but a journal publisher can own the copyright to the page image of that article which was printed in their journal. So it is important to be able to clearly distinguish properties which change when a page image is altered; and the version which preserves those properties is certainly a resource in its own right. Large amounts of money might depend on keeping that straight.
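Jonathan's "subtracting" heuristic can be sketched as a transfer rule P(REST-rep) -> P(resource) applied to every property except an excluded set. The Python below is a hypothetical illustration, not anyone's proposed implementation: the property names are assumptions, the excluded set is taken from Jonathan's list, and the values are the dimensions, sizes and media types Dan reports further down for the two W3C logo variants. Every property is treated as single-valued, so transferring too much surfaces as an error; whether media type and length may safely be "subtracted" is exactly the point Pat disputes.

```python
from typing import Any, Iterable

# Hypothetical property names; the excluded set follows Jonathan's list
# (length, media type, resolution, encoding).
NOT_TRANSFERRED = frozenset({"length", "media_type", "resolution", "encoding"})


def transfer(representations: Iterable[dict[str, Any]],
             exclude: frozenset[str] = NOT_TRANSFERRED) -> dict[str, Any]:
    """Apply P(rep) -> P(resource) to every property not in `exclude`.

    Every property is treated as functional (single-valued), so two
    representations disagreeing on a transferred property is an error.
    """
    resource: dict[str, Any] = {}
    for rep in representations:
        for prop, value in rep.items():
            if prop in exclude:
                continue  # "subtracted": holds of the representation only
            if prop in resource and resource[prop] != value:
                raise ValueError(
                    f"inconsistent {prop!r}: {resource[prop]!r} vs {value!r}")
            resource[prop] = value
    return resource


# Values from Dan's transcript (`file` and `ls -l` output for f1 and f2).
png = {"width": 72, "height": 48, "media_type": "image/png", "length": 1936}
gif = {"width": 72, "height": 48, "media_type": "image/gif", "length": 1865}

print(transfer([png, gif]))  # {'width': 72, 'height': 48}
try:
    transfer([png, gif], exclude=frozenset())
except ValueError as err:
    print(err)  # inconsistent 'media_type': 'image/png' vs 'image/gif'
```

With nothing excluded, the two variants clash on media type before length is even reached; this is the same kind of inconsistency Jonathan describes for octet length.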
>
> But if "too many" properties of the REST-representation hold of the
> resource, you will get inconsistencies, since as Dan B spells out RFC
> 2616 very clearly permits, and many web sites assert, simultaneous
> distinct REST-representations (i.e. having inconsistent properties)
> for the same resource.

Just an aside, but who says this is the same resource? Does anything in the pre-semantic Web depend on there being a single resource in cases like this? If someone were to insist that there were, in fact, several resources here, one corresponding to each REST representation, and that the URI was ambiguous between them, what would change? This would be consistent with RFC 2616 (which singularly fails to provide even the beginning of identity criteria for 'resources').

> You cannot have a resource whose octet-length
> is both 2273 and 31177, as length is functional, but according to 2616
> you can have a resource with a REST-representation of length 2273 and
> another REST-representation of length 31177.
> (The resource itself would not have a length at all.)
>
> I think Pat says, when attempting to formulate a satisfying
> interpretation, turn a blind eye toward all but one of the resource's
> REST-representations.

No, I'm cool with abstractions like the one you cite, provided one is willing to accept that they render their concretizations 'invisible' to reference. But I'm not so sanguine as you appear to be about that. I think that it is often the most concrete instance which is of most interest to applications.

> I'm not keen on the solution because it would
> prevent reasoning (or enable incorrect reasoning) about resources that
> have multiple REST-representations - what the REST-representations
> are, how they compare, how they change over time, when they are
> emitted (by the server), and so on.
>
> This idea of property (or metadata) transfer from REST-representation
> to resource (i.e.
> inference rules of the form P(REST-rep) -> P(resource)) has been the
> most useful heuristic I've come up with so far for explaining the
> REST/AWWW architecture to myself. It's very similar to the way FRBR
> works. We might have Item = portion of physical medium, Manifestation =
> REST-representation, Expression / Work = particular kinds of static
> "information resources" (not an exhaustive covering of IRs or GRs). A
> manifestation and a work can share some of their properties but won't
> share all of them.
>
> If "generic resources" (those related to more than one
> REST-representation) really are useless as metadata subjects I guess
> we could reject 2616/REST/AWWW, but that would be a chore.
>
> Jonathan
>
> On Fri, May 14, 2010 at 3:16 AM, Dan Brickley <danbri@danbri.org> wrote:
>> On Mon, May 10, 2010 at 10:25 PM, Pat Hayes <phayes@ihmc.us> wrote:
>>> Let me give an intuitive case in support of the Nays here. An RDF
>>> graph is a set, which is not the same as a document, for sure. The
>>> *same* graph can be encoded in a variety of different syntactic forms.
>>
>> Try re-running this to show that digital images aren't information
>> resources?
>>
>> 'The *same* digital image can be encoded in a variety of different
>> syntactic forms (PNG, GIF, JPEG, SVG, ...).
>>
>> (Or even that books aren't information resources ('the same book can
>> be encoded in a variety of different forms; both digital
>> (text,html,pdf and various arrangements of atoms).)
>>
>>> Consider two documents, one in RDF/XML, the other in NTriples,
>>> describing the same graph. If we identify the document with the graph
>>> it describes, then these have to be the same. But they aren't the
>>> same. So even if a graph is an information resource (and I agree that
>>> one can make out a case for that position), it certainly isn't the
>>> same information resource as any document (In RDF/XML or NTriples or
>>> any other notation) that represents it syntactically.
>>> So, one ought to use redirection to refer to it, according to
>>> http-range-14. So, whether it's an information resource or not is
>>> kind of moot, since even if it is, it can't be directly identified by
>>> a URI which returns a 200 code.
>>
>> http://www.w3.org/Icons/w3c_home.png
>> http://www.w3.org/Icons/w3c_home.gif
>> http://www.w3.org/Icons/w3c_home
>>
>> One generic digital resource, two specific 'bags of bits' that encode
>> it, three URIs, and here are the gory details from talking to W3C's
>> server:
>>
>> First we make three requests to the server; first time not expressing
>> a preference towards any type of bag of bits, then we say we prefer
>> gif, then we say we prefer png:
>>
>> curl --dump-header h1 -o f1 http://www.w3.org/Icons/w3c_home
>> curl -H 'Accept: image/gif' --dump-header h2 -o f2 http://www.w3.org/Icons/w3c_home
>> curl -H 'Accept: image/png' --dump-header h3 -o f3 http://www.w3.org/Icons/w3c_home
>>
>> This gives us 3 files, two of which are identical in content:
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ ls -l f*
>> -rw-r--r-- 1 danbri staff 1936 14 May 09:07 f1
>> -rw-r--r-- 1 danbri staff 1865 14 May 09:08 f2
>> -rw-r--r-- 1 danbri staff 1936 14 May 09:08 f3
>>
>> If we check the headers, we see that 200 was used each time even when
>> the bytestream content varied. I believe you're using the word
>> 'document' at least sometimes to individuate about those things. The
>> variants seem to have the exact same last-modified time; this could
>> have been because they were part of the same CVS commit action to
>> w3.org.
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ file f*
>> f1: PNG image, 72 x 48, 8-bit colormap, non-interlaced
>> f2: GIF image data, version 89a, 72 x 48
>> f3: PNG image, 72 x 48, 8-bit colormap, non-interlaced
>> Dan-Brickleys-MacBook-Pro:img danbri$ cat h1
>> HTTP/1.1 200 OK
>> Date: Fri, 14 May 2010 07:07:49 GMT
>> Server: Apache/2
>> Content-Location: w3c_home.png
>> Vary: negotiate,accept
>> TCN: choice
>> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
>> ETag: "790-4195514757840;48498becf6180"
>> Accept-Ranges: bytes
>> Content-Length: 1936
>> Cache-Control: max-age=2592000
>> Expires: Sun, 13 Jun 2010 07:07:49 GMT
>> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
>> Connection: close
>> Content-Type: image/png; qs=0.7
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ cat h2
>> HTTP/1.1 200 OK
>> Date: Fri, 14 May 2010 07:08:09 GMT
>> Server: Apache/2
>> Content-Location: w3c_home.gif
>> Vary: negotiate,accept
>> TCN: choice
>> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
>> ETag: "749-4195514757840;48498becf6180"
>> Accept-Ranges: bytes
>> Content-Length: 1865
>> Cache-Control: max-age=2592000
>> Expires: Sun, 13 Jun 2010 07:08:09 GMT
>> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
>> Connection: close
>> Content-Type: image/gif; qs=0.5
>>
>> Dan-Brickleys-MacBook-Pro:img danbri$ cat h3
>> HTTP/1.1 200 OK
>> Date: Fri, 14 May 2010 07:08:22 GMT
>> Server: Apache/2
>> Content-Location: w3c_home.png
>> Vary: negotiate,accept
>> TCN: choice
>> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
>> ETag: "790-4195514757840;48498becf6180"
>> Accept-Ranges: bytes
>> Content-Length: 1936
>> Cache-Control: max-age=2592000
>> Expires: Sun, 13 Jun 2010 07:08:22 GMT
>> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
>> Connection: close
>> Content-Type: image/png; qs=0.7
>>
>>
>> Can you re-tell your story in a way that allows the abstract digital
>> image here to be an information resource?
>> If we substitute RDF/XML for GIF, NTriples for PNG, and 'digital
>> image' for 'graph' you seem to be arguing that W3C's Web server is
>> misconfigured, and that the 200 HTTP answers here are inappropriate.
>> Or am I misreading your point?
>>
>> cheers,
>>
>> Dan

------------------------------------------------------------
IHMC                    (850)434 8903 or (650)494 3973
40 South Alcaniz St.    (850)202 4416   office
Pensacola               (850)202 4440   fax
FL 32502                (850)291 0667   mobile
phayesAT-SIGNihmc.us    http://www.ihmc.us/users/phayes
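The outcome of Dan's three curl requests can be approximated with a simplified model of server-driven content negotiation. The Python sketch below assumes the server scores each variant as the client's q for its media type multiplied by the variant's qs (the 0.7 and 0.5 values in the Content-Type headers of h1 to h3) and answers with the best variant's Content-Location. Apache's mod_negotiation applies further tie-breaking rules and treats wildcards specially, so this is an illustration rather than its algorithm. It also assumes that curl's default `Accept: */*` stands in for "not expressing a preference". The variant table and function names are hypothetical.

```python
from typing import Optional

# Variants and qs values as reported in Dan's h1/h2/h3 headers.
VARIANTS = [
    {"location": "w3c_home.png", "type": "image/png", "qs": 0.7},
    {"location": "w3c_home.gif", "type": "image/gif", "qs": 0.5},
]


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media range, q) pairs; q defaults to 1."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_range, q = parts[0].lower(), 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                q = float(value)
        if media_range:
            ranges.append((media_range, q))
    return ranges


def q_for(media_type: str, ranges: list[tuple[str, float]]) -> float:
    """q of the most specific matching range: exact > type/* > */*; 0 if none."""
    main = media_type.split("/")[0]
    best_specificity, best_q = -1, 0.0
    for media_range, q in ranges:
        if media_range == media_type:
            specificity = 2
        elif media_range == main + "/*":
            specificity = 1
        elif media_range == "*/*":
            specificity = 0
        else:
            continue
        if specificity > best_specificity:
            best_specificity, best_q = specificity, q
    return best_q


def choose_variant(accept: str = "*/*") -> Optional[str]:
    """Content-Location of the variant maximising q * qs, or None."""
    ranges = parse_accept(accept)
    score, location = max(
        (q_for(v["type"], ranges) * v["qs"], v["location"]) for v in VARIANTS)
    return location if score > 0 else None
```

Under these assumptions `choose_variant()` and `choose_variant("image/png")` both give `w3c_home.png` and `choose_variant("image/gif")` gives `w3c_home.gif`, matching the Content-Location headers in h1, h3 and h2. In every case the request goes to the one URI and the status is 200, which is the situation Dan asks Pat to account for.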
Received on Friday, 14 May 2010 21:18:36 UTC