Re: LRDD Update (Resource Descriptor Discovery) and Proposed Changes from Xiaoshu Wang on 2009-06-29 (www-tag@w3.org from June 2009)

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Mon, 29 Jun 2009 14:55:25 -0400
To: Jonathan Rees <jar@creativecommons.org>
CC: "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <4A490E1D.4090900@musc.edu>
Jonathan Rees wrote:
> On Mon, Jun 29, 2009 at 11:46 AM, Xiaoshu Wang<wangxiao@musc.edu> wrote:
>   
>>> I don't even know what the problem is, then
>>>       
>> how can we propose anything to do something about it?  Then, please define
>> the "metadata/descriptor" first.
>>     
>
> I don't understand what the difficulty is with the definition (which
> you've heard many times now)? A description resource is simply a
> document in the role of describing something. Is it hard for you to
> figure out when a document describes something? The relation is
> foaf:primaryTopic, or the related powder:describedby, both of which
> are widely used.
>   
O.K. Suppose you are a URI owner. Given a set of information, say
in an RDF document, tell me which sub-graph I should mark as
"descriptive" and which not. If you cannot tell them apart by a
definite criterion, how do you know what to put in the entity body and
what to put in the Link header? (Leave aside the case of internal
links; that is different.)

Now imagine being a client: should you get just the entity, just the
Link target, or both? How do you know? Say I want some information,
for example Dublin Core metadata. How do I know whether it is in the
Link target, in the entity, or in both?

If your answer is "anything goes", then where is the "uniformity"?
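To make the client's side of this concrete, here is a minimal Python sketch. It assumes a simplified form of the HTTP Link header (`<uri>; rel="describedby"`, comma-separated, with no commas or semicolons inside URIs or parameter values), so it is not a complete parser. The function names and example URLs are hypothetical. The point it illustrates is that the entity body and each `describedby` target are separate places to search, and nothing in the response says which one holds, for example, the Dublin Core metadata.

```python
from __future__ import annotations

import re
from typing import Optional

# One link-value of the simplified form: <target>; param="value"; ...
# Assumption: no commas or semicolons inside the URI or parameter values.
LINK_VALUE = re.compile(r'^\s*<([^>]*)>\s*(.*)$')


def describedby_targets(link_header: str) -> list[str]:
    """Return the targets of rel="describedby" links in a Link header value."""
    targets = []
    for link_value in link_header.split(","):
        match = LINK_VALUE.match(link_value)
        if not match:
            continue  # skip entries this simplified parser cannot read
        target, params = match.groups()
        for param in params.split(";"):
            name, _, value = param.strip().partition("=")
            # rel may hold several space-separated relation types
            if name.lower() == "rel" and "describedby" in value.strip('"').split():
                targets.append(target)
    return targets


def places_to_look(link_header: Optional[str], entity_body: str) -> list[str]:
    """List every place a client might have to search for descriptive metadata.

    The response does not say where the metadata lives, so a cautious client
    has to consider the entity body and every describedby target.
    """
    places = ["entity body"] if entity_body else []
    if link_header:
        places.extend(describedby_targets(link_header))
    return places
```

For a header such as `<http://example.org/meta.rdf>; rel="describedby", <http://example.org/page.html>; rel="alternate"` with a non-empty body, `places_to_look` returns `["entity body", "http://example.org/meta.rdf"]`. The client still has to fetch and inspect both.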
> If the trouble is distinguishing a "document" or "information
> resource" from a chair, or "describing" from "being", I can't help
> you.
>   
Please define your word "document", because AWWW sometimes uses it to
mean an "information resource", which I have trouble grasping. So
please explain.
> If the trouble is the dual sense of "document" as in
> "webarch-representation" (what's transmitted) vs. "information
> resource" (something that can be changed, re-expressed, etc.), first
> note that "representation" is a term of art in the webarch document
> and shouldn't be taken to mean "representation" in common usage (as if
> there were a single sense). But I think the overloading is plausible.
>
> IR is a sort of ontological reverse engineering: If one were to use a
> URI in an href= attribute to refer to something, to what would it
> refer? Fill in the blank in "the source page links to a ____." 
Hold on! Define your "source page" first: is it a representation or a
resource?
> It
> might refer to a thing that has an author or a subject (chairs don't),
> or that carries information digitally (chairs don't). 
Why doesn't a chair? Can I say "the page is lying on a chair"?
> The target is
> something that is (or could be) "on the web" in a way that a chair
> can't be. If you click on a link, and get to a picture of a chair, you
> have gone to the picture, not to the chair.  -- This is simply an
> architectural choice, not a matter of fact. There is no point in
> arguing against it as if it were wrong. You have to argue on the basis
> of utility.
>   
Yes, if it is the architectural choice of a particular user, then it
is fine, because it does not affect me. But suppose the TAG intends to
make it an architectural choice for everyone. In that case I have to
complain, because it makes things impossible to work with.

Which choice is the TAG's? Please be clear.
> If the trouble is the exact line between "document" or "information
> resource" and something like a chair that isn't, I have some sympathy,
> and I'm working on addressing this (in my spare time; see previous
> paragraph). But I think most parties to the conversation agree that
> some things are IRs and some aren't, even if the boundary is unclear.
>   
Really? Is an image an IR? I think everyone has a very firm answer to
that, but I really have trouble figuring it out. Once content
negotiation comes into play, I can no longer be sure what a URI
denotes.
> And for various reasons not everyone agrees that a rigorous definition
> is either necessary or possible. Anyhow what are the consequences of
> disagreeing over the boundaries of the GET/200 restriction? It's just
> advice, and if you don't like it or don't know how to apply it, then
> just ignore it! And by all means send in your difficult use cases to
> help us figure it out (I already have a collection).
>
> So you asked what problems are there to be solved. The problems here
> that *I* would be trying to solve in a web architecture are not
> philosophical or even ontological but rather pragmatic, and include
>   
Yes, pragmatics! This is what I want.
> (1) following a hyperlink to get something, and finding garbage,
> because conneg did a bait-and-switch; (2) concluding that a person has
> an author, or that a person was a book, or that the author of a book
> was someone who wasn't, after collecting RDF from two different
> locations (the RDF having been separately curated in response to
> observing 200 responses.) 
Wait. Do you mean relying on 200 responses as your personal choice, or
as the TAG's recommended choice? I have nothing to say about the
former, because a client can filter his/her information any way s/he
wants. But if it is the latter, shouldn't both of us be worried? And
doesn't this go back to the question of how IR is defined?

I want it to be pragmatic in the sense that I gather a bunch of
information, whether by HTTP, FTP, mailto, or even snail mail, and
then read its content to judge whether each claim is true or not.

httpRange-14 says no, you cannot do that: you have to check whether a
URI returns 200 or not. Which one is more pragmatic?
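The check being objected to here can be written as a function of the HTTP status code alone, without reading the content at all. Below is a minimal Python sketch of the three cases in the TAG's 2005 httpRange-14 resolution: a 2xx response to GET means the URI identifies an information resource, a 303 See Other means it could identify any resource, and a 4xx means its nature is unknown. The names `httprange14` and `Inference` are hypothetical. Actually issuing the GET is left out, and the function treats status codes the resolution does not mention as unknown.

```python
from enum import Enum


class Inference(Enum):
    INFORMATION_RESOURCE = "information resource"
    ANY_RESOURCE = "could be any resource"
    UNKNOWN = "nature unknown"


def httprange14(status: int) -> Inference:
    """Classify a URI's referent from the status code of a GET on that URI.

    Covers the three cases of the httpRange-14 resolution. Codes it does not
    mention (e.g. 301/302 redirects, 5xx errors) are treated as unknown here,
    which is an assumption of this sketch.
    """
    if 200 <= status < 300:
        return Inference.INFORMATION_RESOURCE  # 2xx: an information resource
    if status == 303:
        return Inference.ANY_RESOURCE  # 303 See Other: could be anything
    return Inference.UNKNOWN  # 4xx, and everything else in this sketch
```

Note that nothing in the function looks at the response body. That is exactly the contrast with judging claims by reading their content.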

> The current TAG advice is one approach to
> addressing these. The only solution to (1) is having all simultaneous
> webarch-representations convey the same message, not merely different
> messages about the same thing. *Any* solution to (2) will attempt to
> get the community to make X vs. about-X reference decisions
> consistently, but one design steers everyone towards serving
> expressions (wa-representations, translations, etc.), and the other
> steers everyone the other.
>   
Please! Let's define your *equivalence* first, before making any
further assumptions. This is exactly what many people have asked the
TAG to clarify: what is guaranteed under content negotiation.
> Model T of 200-responding URI U:
>
> . U identifies web page
> . U refers to web page ("information resource")
> . Response R from U is an expression (restatement, rendering,
> representation, translation, reformatting, ...) of referent by U
> . R says what U's referent says (expresses its information)
>   
Define "web page". First, do we agree that there are three kinds of
entities on the Web: URI, Representation, and Resource? Then which of
these is a "web page" a subclass of? Or do you have another, more
fundamental concept?
> Model X:
> . U identifies web page ...per 2616...?
>   
Nope. There is no concept of "web page".
> . U refers to arbitrary thing (supposed to be obvious by looking at R?)
>   
Yes.
> . Response R from U is somehow related (how?) to referent of U
>   
Why should the nature of R have anything to do with U? R describes
what U's referent is. As a client, you either accept it or you don't.
It is as simple as that.
> . R says anything it wants to about referent of U (not necessarily the
> same as other wa-representations)
>   
Of course. Do you think the TAG can make it otherwise?
> To compare these, consider what each model predicts about how someone
> might use 200-responding URIs to refer. For example, consider the URI
> "http://en.wikipedia.org/wiki/Magna_carta". Someone following Model T
> would take the URI to refer to the wikipedia article about the Magna
> Carta. Someone following Model X would take it to refer to the Magna
> Carta.
>   
No. Only if there is an explicit statement saying that
"http://en.wikipedia.org/wiki/Magna_carta" is the Magna Carta does it
refer to the Magna Carta. That is up to the URI's owner, not the TAG.
But the URI never denotes the thing that flies into your laptop.

> Then what about "http://www.thelatinlibrary.com/magnacarta.html" ?
> Under Model T, the URI refers to the Magna Carta (or maybe that
> particular incarnation of; but the difference may not matter for the
> application at hand) - the same thing that the wikipedia URI referred
> to under Model T. Under Model X, the URI refers to... what? The rights
> of man? By failing to distinguish between a document and a description
> of a document, one is deprived of URIs for things that one wants to
> refer to - the same things that one wants hyperlink-followers to see
> by following links. If I link to wikipedia, I want you to go to the
> wikipedia article, darnit, not to the Magna Carta.
>   
What about it? A URI's semantics is opaque, so what difference should
an extra ".html" make? I am not sure where you are going with this.
> And if I ask to follow a link to the Magna Carta, I similarly *don't*
> want to see a description of it, even if my preferred content-type is
> RDF! The RDF I get should be an RDF *expression* (translation) of the
> Magna Carta, because I read RDF more easily than I read Latin; not
> some random pile of other information, no matter how useful.
>   
You always see a description. You always get a *description*, never
the real thing. This is not just philosophy but also physics: if you
read something about quantum mechanics, you will see how many
assumptions we have been making. We know things by "interacting" with
them. We see something, such as a particle, by shining light on it and
reading its reflection. You can never know precisely both where the
particle is and what its velocity is. This is the Heisenberg
uncertainty principle.
> Exercise: Apply the two models to this URI: http://news.google.com/
>
> I'm not saying the work on this subject is over; I'm just saying that
> the X/about-X confusion (which is a kind of use/mention confusion) is
> a legitimate problem, and justifies some amount of "obsession". I
> think the solution will be a sensible explanation, and you and Michael
> are right that at present there is no good consensus document, and
> that there should be.
>   
There was never any confusion to start with. A representation
retrieved from a URI is always something about the URI's referent. It
is as simple as that. It is IR/httpRange-14 that wants to define this
about-ness, and it has given rise to all sorts of problems.

Would you agree that if there were a syntactic difference in URIs to
distinguish a Representation from a Resource, it would force people to
be clear about what they are writing in a URI? Don't you think that is
a better way to avoid ambiguity than a definition composed of "all,
essential, can"?

Xiaoshu

Received on Monday, 29 June 2009 18:56:09 UTC