Re: [pedantic-web] Re: The OWL Ontology URI from Jonathan Rees on 2010-05-14 (public-awwsw@w3.org from May 2010)

From: Jonathan Rees <jar@creativecommons.org>
Date: Fri, 14 May 2010 09:31:29 -0400
To: Dan Brickley <danbri@danbri.org>
Cc: Pat Hayes <phayes@ihmc.us>, AWWSW TF <public-awwsw@w3.org>
Message-ID: <AANLkTimp4cQrzh-zu9BjpTyyuKZnBrLpNm4AWT13MgK_@mail.gmail.com>

Thanks for contributing this example, Dan.

I got sucked into this problem in 2006 when I needed URIs for things
like journal articles and images, as opposed to web pages that happen
to carry journal articles or web resources from which I could obtain
encodings of an image. I wanted to know whether a non-# 2xx http: URI
could/should refer to a journal article or digital image. I still
don't have an answer.

Pat says no - using a 2xx URI to refer to an article or image
independent of encoding violates his interpretation of the
httpRange-14 rule.  TimBL says yes (see "generic resources" memo) as
do others (e.g. Roy, AWWW, DanB here). My instinct is that if we don't
say yes we'll raise the semweb barrier to entry still further, and
discourage the creation of metadata (the latter being my current
crusade), so I am inclined to agree with Tim.

Alan R says that generic resources such as articles and images *risk*
violating the rule, with its latent fear and uncertainty, and that's
enough to force a # URI (or non-2xx). Pat and Alan have different
reasoning but the same conclusion: interpret 2xx URIs only as
resources that are isoontic to some single REST-representation.

FWIW Ian Hickson also rejects the idea of generic resource, and I
think Larry Masinter does too (or I infer this from comments he's made
during discussions of metadata; but his position may be more nuanced).

I think that for semweb purposes it's good practice for things to be
defined and distinguished by their properties and class memberships
whenever possible. In the case of the article or image the
REST-representation has no "interesting" properties (author, date,
location, subject matter, etc) that the article or image doesn't also
have, so IMO there is little point in assuming they're distinct. (I
think I'm repeating Dan C?) The article is obtained by "subtracting"
uninteresting properties from one of its REST-representations. It's
only if you insist on transferring *all* properties (such as length,
media type, resolution, encoding) from a REST-representation to its
resource that you are forced to distinguish the article from the
resource (and you would then have to give a # or non-2xx URI to the
article).

But if "too many" properties of the REST-representation hold of the
resource, you will get inconsistencies, since, as Dan B spells out, RFC
2616 very clearly permits, and many web sites assert, simultaneous
distinct REST-representations (i.e. having inconsistent properties)
for the same resource. You cannot have a resource whose octet-length
is both 2273 and 31177, as length is functional, but according to 2616
you can have a resource with a REST-representation of length 2273 and
another REST-representation of length 31177. (The resource itself
would not have a length at all.)
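
To make the clash concrete, here is a minimal Python sketch. The property
names, the FUNCTIONAL set, the author value, and both helper functions are
hypothetical, invented only for illustration; the two lengths are the ones
mentioned above. It transfers *every* property of two REST-representations
to one resource and then reports the functional properties that ended up
with more than one value.

```python
# Hypothetical model: each REST-representation is a dict of properties.
# Assumption: these properties are functional (at most one value each).
FUNCTIONAL = {"octet_length", "media_type"}


def transfer_all(representations):
    """Naively copy every property of every representation onto the resource."""
    resource = {}
    for rep in representations:
        for prop, value in rep.items():
            resource.setdefault(prop, set()).add(value)
    return resource


def functional_violations(resource):
    """Return the functional properties that received more than one value."""
    return sorted(
        prop
        for prop, values in resource.items()
        if prop in FUNCTIONAL and len(values) > 1
    )


# Two simultaneous representations of the same resource (illustrative data).
reps = [
    {"octet_length": 2273, "media_type": "image/gif", "author": "A. N. Author"},
    {"octet_length": 31177, "media_type": "image/png", "author": "A. N. Author"},
]

resource = transfer_all(reps)
print(functional_violations(resource))
```

The author property survives the transfer with a single value, while the
length and media type do not, which is the inconsistency described above.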

I think Pat says that, when attempting to formulate a satisfying
interpretation, one should turn a blind eye toward all but one of the
resource's REST-representations. I'm not keen on that solution because it would
prevent reasoning (or enable incorrect reasoning) about resources that
have multiple REST-representations - what the REST-representations
are, how they compare, how they change over time, when they are
emitted (by the server), and so on.

This idea of property (or metadata) transfer from REST-representation
to resource (i.e. inference rules of the form P(REST-rep) ->
P(resource)) has been the most useful heuristic I've come up with so
far for explaining the REST/AWWW architecture to myself. It's very
similar to the way FRBR works. We might have Item = portion of
physical medium, Manifestation = REST-representation, Expression /
Work = particular kinds of static "information resources" (not an
exhaustive covering of IRs or GRs). A manifestation and a work can
share some of their properties but won't share all of them.
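
As a rough illustration of the transfer heuristic, here is a minimal Python
sketch. Everything in it is an assumption made for the example: the
TRANSFERABLE set, the property names, the sample values, and the extra
condition that a property transfers only when every REST-representation
that states it agrees on its value. It is one possible reading of
P(REST-rep) -> P(resource), not a definitive rule set.

```python
# Assumed split: properties a resource may inherit from its representations.
TRANSFERABLE = {"author", "date", "subject"}


def transfer(representations, transferable=TRANSFERABLE):
    """Apply P(rep) -> P(resource) for the transferable properties only.

    A property is transferred when at least one representation states it
    and all representations that state it agree on the value.
    """
    resource = {}
    for prop in sorted(transferable):
        values = {rep[prop] for rep in representations if prop in rep}
        if len(values) == 1:
            resource[prop] = values.pop()
    return resource


# Two manifestations of one (hypothetical) article.
reps = [
    {"author": "J. Doe", "date": "2006", "media_type": "text/html", "octet_length": 2273},
    {"author": "J. Doe", "date": "2006", "media_type": "application/pdf", "octet_length": 31177},
]

work = transfer(reps)
print(work)
```

Length and media type stay with the representations, so the resource that
results has no length at all, while author and date carry over.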

If "generic resources" (those related to more than one
REST-representation) really are useless as metadata subjects I guess
we could reject 2616/REST/AWWW, but that would be a chore.

Jonathan

On Fri, May 14, 2010 at 3:16 AM, Dan Brickley <danbri@danbri.org> wrote:
> On Mon, May 10, 2010 at 10:25 PM, Pat Hayes <phayes@ihmc.us> wrote:
>> Let me give an intuitive case in support of the Nays here. An RDF graph is a
>> set, which is not the same as a document, for sure. The *same* graph can be
>> encoded in a variety of different syntactic forms.
>
> Try re-running this to show that digital images aren't information resources?
>
> 'The *same* digital image can be encoded in a variety of different
> syntactic forms (PNG, GIF, JPEG, SVG, ...).'
>
> (Or even that books aren't information resources: 'the same book can
> be encoded in a variety of different forms, both digital (text, html,
> pdf) and various arrangements of atoms'.)
>
>> Consider two documents,
>> one in RDF/XML, the other in NTriples, describing the same graph. If we
>> identify the document with the graph it describes, then these have to be the
>> same. But they aren't the same. So even if a graph is an information
>> resource (and I agree that one can make out a case for that position), it
>> certainly isn't the same information resource as any document (In RDF/XML or
>> NTriples or any other notation) that represents it syntactically. So, one
>> ought to use redirection to refer to it, according to http-range-14. So,
> whether it's an information resource or not is kind of moot, since even if it
>> is, it can't be directly identified by a URI which returns a 200 code.
>
> http://www.w3.org/Icons/w3c_home.png
> http://www.w3.org/Icons/w3c_home.gif
> http://www.w3.org/Icons/w3c_home
>
> One generic digital resource, two specific 'bags of bits' that encode
> it, three URIs and here's the gory details from talking to W3C's
> server:
>
> First we make three requests to the server; first time not expressing
> a preference towards any type of bag of bits, then we say we prefer
> gif, then we say we prefer png:
>
> curl --dump-header h1 -o f1 http://www.w3.org/Icons/w3c_home
> curl -H 'Accept: image/gif' --dump-header h2 -o f2 http://www.w3.org/Icons/w3c_home
> curl -H 'Accept: image/png' --dump-header h3 -o f3 http://www.w3.org/Icons/w3c_home
>
> This gives us 3 files, two of which are identical in content:
>
> Dan-Brickleys-MacBook-Pro:img danbri$ ls -l f*
> -rw-r--r--  1 danbri  staff  1936 14 May 09:07 f1
> -rw-r--r--  1 danbri  staff  1865 14 May 09:08 f2
> -rw-r--r--  1 danbri  staff  1936 14 May 09:08 f3
>
> If we check the headers, we see that 200 was used each time even when
> the bytestream content varied. I believe you're using the word
> 'document' at least sometimes to individuate those things. The
> variants seem to have the exact same last-modified time; this could
> have been because they were part of the same CVS commit action to
> w3.org.
>
> Dan-Brickleys-MacBook-Pro:img danbri$ file f*
> f1: PNG image, 72 x 48, 8-bit colormap, non-interlaced
> f2: GIF image data, version 89a, 72 x 48
> f3: PNG image, 72 x 48, 8-bit colormap, non-interlaced
> Dan-Brickleys-MacBook-Pro:img danbri$ cat h1
> HTTP/1.1 200 OK
> Date: Fri, 14 May 2010 07:07:49 GMT
> Server: Apache/2
> Content-Location: w3c_home.png
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
> ETag: "790-4195514757840;48498becf6180"
> Accept-Ranges: bytes
> Content-Length: 1936
> Cache-Control: max-age=2592000
> Expires: Sun, 13 Jun 2010 07:07:49 GMT
> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
> Connection: close
> Content-Type: image/png; qs=0.7
>
> Dan-Brickleys-MacBook-Pro:img danbri$ cat h2
> HTTP/1.1 200 OK
> Date: Fri, 14 May 2010 07:08:09 GMT
> Server: Apache/2
> Content-Location: w3c_home.gif
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
> ETag: "749-4195514757840;48498becf6180"
> Accept-Ranges: bytes
> Content-Length: 1865
> Cache-Control: max-age=2592000
> Expires: Sun, 13 Jun 2010 07:08:09 GMT
> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
> Connection: close
> Content-Type: image/gif; qs=0.5
>
> Dan-Brickleys-MacBook-Pro:img danbri$ cat h3
> HTTP/1.1 200 OK
> Date: Fri, 14 May 2010 07:08:22 GMT
> Server: Apache/2
> Content-Location: w3c_home.png
> Vary: negotiate,accept
> TCN: choice
> Last-Modified: Mon, 24 Jul 2006 14:58:33 GMT
> ETag: "790-4195514757840;48498becf6180"
> Accept-Ranges: bytes
> Content-Length: 1936
> Cache-Control: max-age=2592000
> Expires: Sun, 13 Jun 2010 07:08:22 GMT
> P3P: policyref="http://www.w3.org/2001/05/P3P/p3p.xml"
> Connection: close
> Content-Type: image/png; qs=0.7
>
>
> Can you re-tell your story in a way that allows the abstract digital
> image here to be an information resource? If we substitute RDF/XML for
> GIF, NTriples for PNG, and 'digital image' for 'graph' you seem to be
> arguing that W3C's Web server is misconfigured, and that the 200 HTTP
> answers here are inappropriate. Or am I misreading your point?
>
> cheers,
>
> Dan
>
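
The header dumps quoted above can also be compared mechanically. The
following is a minimal Python sketch, not part of Dan's session: the helper
names parse_headers and variants are hypothetical, and H1 and H2 are
abbreviated copies of the quoted h1 and h2 dumps (h3 matches h1 on these
fields). It collects the distinct (Content-Location, Content-Length) pairs
that were served with a 200 status for the one requested URI.

```python
def parse_headers(dump):
    """Parse a curl --dump-header file into (status_code, {lowercase-name: value})."""
    lines = [ln.strip() for ln in dump.strip().splitlines() if ln.strip()]
    status = int(lines[0].split()[1])  # e.g. "HTTP/1.1 200 OK" -> 200
    fields = {}
    for ln in lines[1:]:
        name, _, value = ln.partition(":")
        fields[name.strip().lower()] = value.strip()
    return status, fields


def variants(dumps):
    """Return the distinct (Content-Location, Content-Length) pairs served with 200."""
    seen = set()
    for dump in dumps:
        status, fields = parse_headers(dump)
        if status == 200:
            seen.add((fields["content-location"], int(fields["content-length"])))
    return sorted(seen)


# Abbreviated copies of the h1 and h2 dumps quoted above.
H1 = """HTTP/1.1 200 OK
Content-Location: w3c_home.png
Content-Length: 1936
Content-Type: image/png; qs=0.7
"""
H2 = """HTTP/1.1 200 OK
Content-Location: w3c_home.gif
Content-Length: 1865
Content-Type: image/gif; qs=0.5
"""

# Three responses for one requested URI; the third repeats the first.
print(variants([H1, H2, H1]))
```

Three 200 responses for the same URI reduce to two variants with different
lengths, which is the situation the thread is arguing about.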
Received on Friday, 14 May 2010 13:32:05 UTC