Re: LRDD Update (Resource Descriptor Discovery) and Proposed Changes from Jonathan Rees on 2009-06-29 (www-tag@w3.org from June 2009)

From: Jonathan Rees <jar@creativecommons.org>
Date: Mon, 29 Jun 2009 13:50:30 -0400
To: Xiaoshu Wang <wangxiao@musc.edu>
Cc: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <760bcb2a0906291050y1cfb9ac7i8ad5ee4b3a13d82@mail.gmail.com>

On Mon, Jun 29, 2009 at 11:46 AM, Xiaoshu Wang <wangxiao@musc.edu> wrote:
> I don't even know what the problem is, then
> how can we propose anything to do something about it?  Then, please define
> the "metadata/descriptor" first.

I don't understand what the difficulty is with the definition (which
you've heard many times now). A description resource is simply a
document in the role of describing something. Is it hard for you to
figure out when a document describes something? The relation is
foaf:primaryTopic, or the related powder:describedby, both of which
are widely used.

If the trouble is distinguishing a "document" or "information
resource" from a chair, or "describing" from "being", I can't help
you.

If the trouble is the dual sense of "document" as in
"webarch-representation" (what's transmitted) vs. "information
resource" (something that can be changed, re-expressed, etc.), first
note that "representation" is a term of art in the webarch document
and shouldn't be taken to mean "representation" in common usage (as if
there were a single sense). But I think the overloading is plausible.

IR ("information resource") is a sort of ontological reverse
engineering: If one were to use a URI in an href= attribute to refer
to something, to what would it refer? Fill in the blank in "the source
page links to a ____." It
might refer to a thing that has an author or a subject (chairs don't),
or that carries information digitally (chairs don't). The target is
something that is (or could be) "on the web" in a way that a chair
can't be. If you click on a link, and get to a picture of a chair, you
have gone to the picture, not to the chair.  -- This is simply an
architectural choice, not a matter of fact. There is no point in
arguing against it as if it were wrong. You have to argue on the basis
of utility.

If the trouble is the exact line between "document" or "information
resource" and something like a chair that isn't, I have some sympathy,
and I'm working on addressing this (in my spare time; see previous
paragraph). But I think most parties to the conversation agree that
some things are IRs and some aren't, even if the boundary is unclear.
And for various reasons not everyone agrees that a rigorous definition
is either necessary or possible. Anyhow what are the consequences of
disagreeing over the boundaries of the GET/200 restriction? It's just
advice, and if you don't like it or don't know how to apply it, then
just ignore it! And by all means send in your difficult use cases to
help us figure it out (I already have a collection).

So you asked what problems are there to be solved. The problems here
that *I* would be trying to solve in a web architecture are not
philosophical or even ontological but rather pragmatic, and include
(1) following a hyperlink to get something, and finding garbage,
because conneg did a bait-and-switch; (2) concluding that a person has
an author, or that a person was a book, or that the author of a book
was someone who wasn't, after collecting RDF from two different
locations (the RDF having been separately curated in response to
observing 200 responses.) The current TAG advice is one approach to
addressing these. The only solution to (1) is having all simultaneous
webarch-representations convey the same message, not merely different
messages about the same thing. *Any* solution to (2) will attempt to
get the community to make X vs. about-X reference decisions
consistently, but one design steers everyone towards serving
expressions (wa-representations, translations, etc.), and the other
steers everyone the other way.
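
A minimal sketch of problem (2), assuming a toy triple store made of
plain tuples. The example.org URIs, the choice of properties, and the
helper persons_with_authors are all hypothetical and only illustrate
how two separately curated sources can collide once merged:

```python
# Toy illustration only: triples are plain tuples, and the URIs below
# are made up for the example.
U = "http://example.org/alice"

# Curator A saw a 200 response and took U to denote the page itself.
source_a = {
    (U, "rdf:type", "foaf:Document"),
    (U, "dc:creator", "http://example.org/bob"),
}

# Curator B saw the same 200 response and took U to denote the person
# the page is about.
source_b = {
    (U, "rdf:type", "foaf:Person"),
    (U, "foaf:name", "Alice"),
}


def persons_with_authors(triples):
    """Return subjects typed as a person that also carry an author."""
    persons = {s for (s, p, o) in triples
               if p == "rdf:type" and o == "foaf:Person"}
    authored = {s for (s, p, o) in triples if p == "dc:creator"}
    return persons & authored


# Each source is fine on its own; the merge yields "a person has an author".
merged = source_a | source_b
conflicts = persons_with_authors(merged)
```

Neither source is wrong by its own lights; the odd conclusion only
appears because the two curators made the X vs. about-X decision
differently for the same URI.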

Model T of 200-responding URI U:

. U identifies web page
. U refers to web page ("information resource")
. Response R from U is an expression (restatement, rendering,
representation, translation, reformatting, ...) of the referent of U
. R says what U's referent says (expresses its information)

Model X:

. U identifies web page ...per 2616...?
. U refers to arbitrary thing (supposed to be obvious by looking at R?)
. Response R from U is somehow related (how?) to referent of U
. R says anything it wants to about referent of U (not necessarily the
same as other wa-representations)
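
One way to picture the difference in the last bullet of each model is
a rough sketch that assumes each wa-representation can be boiled down
to a single "message" string. That reduction, the function name
conveys_same_message, and the sample variants are assumptions made for
illustration, not a real conneg check:

```python
# Hypothetical sketch: each representation is reduced to a "message",
# here a normalized string standing in for what the representation says.
def conveys_same_message(representations):
    """True if every media type's representation carries the same message."""
    messages = {message.strip().lower()
                for message in representations.values()}
    return len(messages) <= 1


# Model T style: HTML and plain text are two renderings of one document.
model_t_variants = {
    "text/html": "Magna Carta: full Latin text",
    "text/plain": "magna carta: full latin text",
}

# Model X style: the RDF variant says something else about the same topic.
model_x_variants = {
    "text/html": "Magna Carta: full Latin text",
    "application/rdf+xml": "Magna Carta was sealed in 1215",
}

t_consistent = conveys_same_message(model_t_variants)
x_consistent = conveys_same_message(model_x_variants)
```

Under these assumptions the Model T variants pass the check and the
Model X variants do not, which is the situation that allows conneg to
do a bait-and-switch.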

To compare these, consider what each model predicts about how someone
might use 200-responding URIs to refer. For example, consider the URI
"http://en.wikipedia.org/wiki/Magna_carta". Someone following Model T
would take the URI to refer to the wikipedia article about the Magna
Carta. Someone following Model X would take it to refer to the Magna
Carta.

Then what about "http://www.thelatinlibrary.com/magnacarta.html" ?
Under Model T, the URI refers to the Magna Carta (or maybe that
particular incarnation of it; but the difference may not matter for the
application at hand) - the same thing that the wikipedia URI referred
to under Model X. Under Model X, the URI refers to... what? The rights
of man? By failing to distinguish between a document and a description
of a document, one is deprived of URIs for things that one wants to
refer to - the same things that one wants hyperlink-followers to see
by following links. If I link to wikipedia, I want you to go to the
wikipedia article, darnit, not to the Magna Carta.
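
The comparison can be tabulated in a small sketch. The referent labels
are paraphrased from the two paragraphs above, None stands for the
"refers to... what?" case, and the table and helper names are
hypothetical:

```python
WIKIPEDIA = "http://en.wikipedia.org/wiki/Magna_carta"
LATIN_LIBRARY = "http://www.thelatinlibrary.com/magnacarta.html"

# Referents as read off the discussion; None marks "unclear".
REFERENT = {
    "T": {
        WIKIPEDIA: "the wikipedia article about the Magna Carta",
        LATIN_LIBRARY: "the Magna Carta",
    },
    "X": {
        WIKIPEDIA: "the Magna Carta",
        LATIN_LIBRARY: None,
    },
}


def nameable_things(model):
    """Distinct things that the two URIs let us refer to under a model."""
    return {r for r in REFERENT[model].values() if r is not None}


things_t = nameable_things("T")
things_x = nameable_things("X")
```

In this toy table Model T leaves both the article and the Magna Carta
with a URI, while Model X leaves only the Magna Carta, so the article
has no URI of its own.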

And if I ask to follow a link to the Magna Carta, I similarly *don't*
want to see a description of it, even if my preferred content-type is
RDF! The RDF I get should be an RDF *expression* (translation) of the
Magna Carta, because I read RDF more easily than I read Latin; not
some random pile of other information, no matter how useful.

Exercise: Apply the two models to this URI: http://news.google.com/

I'm not saying the work on this subject is over; I'm just saying that
the X/about-X confusion (which is a kind of use/mention confusion) is
a legitimate problem, and justifies some amount of "obsession". I
think the solution will be a sensible explanation, and you and Michael
are right that at present there is no good consensus document, and
that there should be.

-Jonathan
Received on Monday, 29 June 2009 17:51:07 UTC