- From: Jonathan Rees <jar@creativecommons.org>
- Date: Fri, 14 May 2010 09:31:29 -0400
- To: Dan Brickley <danbri@danbri.org>
- Cc: Pat Hayes <phayes@ihmc.us>, AWWSW TF <public-awwsw@w3.org>
Thanks for contributing this example, Dan.

I got sucked into this problem in 2006 when I needed URIs for things like journal articles and images, as opposed to web pages that happen to carry journal articles, or web resources from which I could obtain encodings of an image. I wanted to know whether a non-# 2xx http: URI could or should refer to a journal article or digital image. I still don't have an answer.

Pat says no: using a 2xx URI to refer to an article or image independent of encoding violates his interpretation of the httpRange-14 rule. TimBL says yes (see his "generic resources" memo), as do others (e.g. Roy, AWWW, DanB here). My instinct is that if we don't say yes we'll raise the semweb barrier to entry still further and discourage the creation of metadata (the latter being my current crusade), so I am inclined to agree with Tim. Alan R says that generic resources such as articles and images *risk* violating the rule, with its latent fear and uncertainty, and that this risk is enough to force a # URI (or a non-2xx one). Pat and Alan reason differently but reach the same conclusion: interpret 2xx URIs only as resources that are isoontic to some single REST-representation. FWIW, Ian Hickson also rejects the idea of a generic resource, and I think Larry Masinter does too (or so I infer from comments he's made during discussions of metadata; his position may be more nuanced).

I think that for semweb purposes it's good practice for things to be defined and distinguished by their properties and class memberships whenever possible. In the case of the article or image, the REST-representation has no "interesting" properties (author, date, location, subject matter, etc.) that the article or image doesn't also have, so IMO there is little point in assuming they're distinct. (I think I'm repeating Dan C?) The article is obtained by "subtracting" uninteresting properties from one of its REST-representations.
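The "subtraction" idea can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a proposed vocabulary: the property names (`media_type`, `octet_length`, `width`, `height`) and the choice of which of them count as encoding-specific are hypothetical; the values are the ones Dan reports for the two w3c_home variants (a 1936-byte PNG and an 1865-byte GIF, both 72 x 48).

```python
# Sketch: derive a generic resource's description by "subtracting"
# encoding-specific properties from each of its REST-representations.

# Assumption: these hypothetical property names describe only a particular
# encoding (a specific bag of bits), not the image itself.
ENCODING_PROPERTIES = {"media_type", "octet_length"}

# Values taken from the w3c_home headers and `file` output in the quoted message.
png_rep = {"media_type": "image/png", "octet_length": 1936, "width": 72, "height": 48}
gif_rep = {"media_type": "image/gif", "octet_length": 1865, "width": 72, "height": 48}


def subtract(rep: dict) -> dict:
    """Return the representation's properties minus the encoding-specific ones."""
    return {k: v for k, v in rep.items() if k not in ENCODING_PROPERTIES}


print(subtract(png_rep))                       # {'width': 72, 'height': 48}
print(subtract(png_rep) == subtract(gif_rep))  # True
```

Both variants reduce to the same description, which is one way of reading the claim that the image and its representations differ only in "uninteresting" properties. Deciding which properties are uninteresting is the substantive choice; the code merely applies it.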
It's only if you insist on transferring *all* properties (such as length, media type, resolution, encoding) from a REST-representation to its resource that you are forced to distinguish the article from the resource (and you would then have to give the article a # or non-2xx URI). But if "too many" properties of the REST-representation hold of the resource, you will get inconsistencies, since, as Dan B spells out, RFC 2616 very clearly permits (and many web sites assert) simultaneous distinct REST-representations, i.e. ones with inconsistent properties, for the same resource. You cannot have a resource whose octet-length is both 2273 and 31177, as length is functional, but according to 2616 you can have a resource with one REST-representation of length 2273 and another of length 31177. (The resource itself would not have a length at all.)

I think Pat says that, when attempting to formulate a satisfying interpretation, one should turn a blind eye toward all but one of the resource's REST-representations. I'm not keen on this solution because it would prevent reasoning (or enable incorrect reasoning) about resources that have multiple REST-representations: what the REST-representations are, how they compare, how they change over time, when they are emitted (by the server), and so on.

This idea of property (or metadata) transfer from REST-representation to resource (i.e. inference rules of the form P(REST-rep) -> P(resource)) has been the most useful heuristic I've come up with so far for explaining the REST/AWWW architecture to myself. It's very similar to the way FRBR works. We might have Item = portion of physical medium, Manifestation = REST-representation, and Expression / Work = particular kinds of static "information resources" (not an exhaustive covering of IRs or GRs). A manifestation and a work can share some of their properties but won't share all of them.
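The inconsistency argument can be made concrete with a small Python sketch of the P(REST-rep) -> P(resource) rule. Everything here is hypothetical except the two lengths (2273 and 31177) from the text: the property names, the `title` value, and the assumption that `octet_length` and `media_type` are functional are illustrative choices. Transferring every property from both representations gives a functional property two values; transferring only a shared, "interesting" property does not.

```python
from collections import defaultdict

# Assumption: these properties are functional (at most one value per subject).
FUNCTIONAL = {"octet_length", "media_type"}

# Two hypothetical REST-representations of the same resource.
REPS = [
    {"octet_length": 2273, "media_type": "text/html", "title": "Example article"},
    {"octet_length": 31177, "media_type": "application/pdf", "title": "Example article"},
]


def transfer(reps, properties):
    """Apply P(REST-rep) -> P(resource) for each listed property.

    Returns a mapping from property name to the set of values
    that the rule asserts of the resource.
    """
    facts = defaultdict(set)
    for rep in reps:
        for prop in properties:
            if prop in rep:
                facts[prop].add(rep[prop])
    return dict(facts)


def conflicts(facts):
    """Functional properties that ended up with more than one value."""
    return sorted(p for p, values in facts.items() if p in FUNCTIONAL and len(values) > 1)


naive = transfer(REPS, {"octet_length", "media_type", "title"})
selective = transfer(REPS, {"title"})

print(conflicts(naive))      # ['media_type', 'octet_length']
print(conflicts(selective))  # []
print(selective)             # {'title': {'Example article'}}
```

The "blind eye" approach described above corresponds to calling `transfer` with a one-element list of representations: it avoids the conflict, but only by discarding the other representation, which is the loss of reasoning objected to here.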
If "generic resources" (those related to more than one REST-representation) really are useless as metadata subjects, I guess we could reject 2616/REST/AWWW, but that would be a chore.

Jonathan

On Fri, May 14, 2010 at 3:16 AM, Dan Brickley <danbri@danbri.org> wrote:
> On Mon, May 10, 2010 at 10:25 PM, Pat Hayes <phayes@ihmc.us> wrote:
>> Let me give an intuitive case in support of the Nays here. An RDF graph is a
>> set, which is not the same as a document, for sure. The *same* graph can be
>> encoded in a variety of different syntactic forms.
>
> Try re-running this to show that digital images aren't information resources?
>
> 'The *same* digital image can be encoded in a variety of different
> syntactic forms (PNG, GIF, JPEG, SVG, ...).
>
> (Or even that books aren't information resources ('the same book can
> be encoded in a variety of different forms; both digital
> (text,html,pdf and various arrangements of atoms).)
>
>> Consider two documents,
>> one in RDF/XML, the other in NTriples, describing the same graph. If we
>> identify the document with the graph it describes, then these have to be the
>> same. But they aren't the same. So even if a graph is an information
>> resource (and I agree that one can make out a case for that position), it
>> certainly isn't the same information resource as any document (In RDF/XML or
>> NTriples or any other notation) that represents it syntactically. So, one
>> ought to use redirection to refer to it, according to http-range-14. So,
>> whether its an information resource or not is kind of moot, since even if it
>> is, it can't be directly identified by a URI which returns a 200 code.
>
> http://www.w3.org/Icons/w3c_home.png
> http://www.w3.org/Icons/w3c_home.gif
> http://www.w3.org/Icons/w3c_home
>
> One generic digital resource, two specific 'bags of bits' that encode
> it, three URIs and here's the gory details from talking to W3C's
> server:
>
> First we make three requests to the server; first time not expressing
> a preference towards any type of bag of bits, then we say we prefer
> gif, then we say we prefer png:
>
> curl --dump-header h1 -o f1 http://www.w3.org/Icons/w3c_home
> curl -H 'Accept: image/gif' --dump-header h2 -o f2 http://www.w3.org/Icons/w3c_home
> curl -H 'Accept: image/png' --dump-header h3 -o f3 http://www.w3.org/Icons/w3c_home
>
> This gives us 3 files, two of which are identical in content:
>
> Dan-Brickleys-MacBook-Pro:img danbri$ ls -l f*
> -rw-r--r-- 1 danbri staff 1936 14 May 09:07 f1
> -rw-r--r-- 1 danbri staff 1865 14 May 09:08 f2
> -rw-r--r-- 1 danbri staff 1936 14 May 09:08 f3
>
> If we check the headers, we see that 200 was used each time even when
> the bytestream content varied. I believe you're using the word
> 'document' at least sometimes to individuate about those things. The
> variants seem to have the exact same last-modified time; this could
> have been because they were part of the same CVS commit action to
> w3.org.
>
> Dan-Brickleys-MacBook-Pro:img danbri$ file f*
> f1: PNG image, 72 x 48, 8-bit colormap, non-interlaced
> f2: GIF image data, version 89a, 72 x 48
> f3: PNG image, 72 x 48, 8-bit colormap, non-interlaced
> Dan-Brickleys-MacBook-Pro:img danbri$ cat h1
> HTTP/1.1 200 OK
> Date: Fri, 14 May 2010 07:07:49 GMT
> Server: Apache/2
> Content-Location: w3c_home.png
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
> ETag: "790-4195514757840;48498becf6180"
> Accept-Ranges: bytes
> Content-Length: 1936
> Cache-Control: max-age=2592000
> Expires: Sun, 13 Jun 2010 07:07:49 GMT
> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
> Connection: close
> Content-Type: image/png; qs=0.7
>
> Dan-Brickleys-MacBook-Pro:img danbri$ cat h2
> HTTP/1.1 200 OK
> Date: Fri, 14 May 2010 07:08:09 GMT
> Server: Apache/2
> Content-Location: w3c_home.gif
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
> ETag: "749-4195514757840;48498becf6180"
> Accept-Ranges: bytes
> Content-Length: 1865
> Cache-Control: max-age=2592000
> Expires: Sun, 13 Jun 2010 07:08:09 GMT
> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
> Connection: close
> Content-Type: image/gif; qs=0.5
>
> Dan-Brickleys-MacBook-Pro:img danbri$ cat h3
> HTTP/1.1 200 OK
> Date: Fri, 14 May 2010 07:08:22 GMT
> Server: Apache/2
> Content-Location: w3c_home.png
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
> ETag: "790-4195514757840;48498becf6180"
> Accept-Ranges: bytes
> Content-Length: 1936
> Cache-Control: max-age=2592000
> Expires: Sun, 13 Jun 2010 07:08:22 GMT
> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
> Connection: close
> Content-Type: image/png; qs=0.7
>
>
> Can you re-tell your story in a way that allows the abstract digital
> image here to be an information resource? If we substitute RDF/XML for
> GIF, NTriples for PNG, and 'digital image' for 'graph' you seem to be
> arguing that W3C's Web server is misconfigured, and that the 200 HTTP
> answers here are inappropriate. Or am I misreading your point?
>
> cheers,
>
> Dan
>
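Why one URI yields two different 200 responses can be illustrated with a deliberately simplified model of server-driven negotiation in Python. This is not Apache's mod_negotiation algorithm, which also weighs language, charset and encoding and applies further tie-breaking rules; the sketch only scores each variant as the client's Accept q-value times the variant's source quality (qs), using the two qs values visible in the quoted Content-Type headers. The function names, and treating a missing Accept header as "accept anything", are assumptions. Note that curl sends `Accept: */*` by default, so the first request above was not entirely preference-free.

```python
# Simplified sketch of server-driven content negotiation for the two
# w3c_home variants. Not Apache's full algorithm: only q * qs is scored.

# Variant -> (media type, source quality), from the quoted Content-Type headers.
VARIANTS = {
    "w3c_home.png": ("image/png", 0.7),
    "w3c_home.gif": ("image/gif", 0.5),
}


def parse_accept(header):
    """Map media range -> q from an Accept header; None means no header sent."""
    if header is None:
        return None
    prefs = {}
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip() == "q":
                q = float(value)
        prefs[parts[0]] = q
    return prefs


def client_q(prefs, media_type):
    """q the client assigns to media_type, checking exact, type/*, then */*."""
    if prefs is None:
        return 1.0  # assumption: no Accept header means every type is acceptable
    major = media_type.split("/")[0]
    for key in (media_type, major + "/*", "*/*"):
        if key in prefs:
            return prefs[key]
    return 0.0


def choose(accept_header):
    """Return the Content-Location of the best variant, or None if none is acceptable."""
    prefs = parse_accept(accept_header)
    best, best_score = None, 0.0
    for location, (media_type, qs) in VARIANTS.items():
        score = client_q(prefs, media_type) * qs
        if score > best_score:
            best, best_score = location, score
    return best


# Mirrors the three curl requests in the quoted message.
print(choose("*/*"))        # w3c_home.png
print(choose("image/gif"))  # w3c_home.gif
print(choose("image/png"))  # w3c_home.png
```

All three selections come back from the same URI with status 200, differing only in Content-Location, Content-Type and Content-Length, which matches what the quoted headers record.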
Received on Friday, 14 May 2010 13:32:05 UTC