Re: Uniform access to metadata: XRD use case. from Xiaoshu Wang on 2009-02-25 (www-tag@w3.org from February 2009)

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Wed, 25 Feb 2009 15:55:03 +0000
To: Phil Archer <phil@philarcher.org>
CC: Eran Hammer-Lahav <eran@hueniverse.com>, Julian Reschke <julian.reschke@gmx.de>, "Patrick.Stickler@nokia.com" <Patrick.Stickler@nokia.com>, "jar@creativecommons.org" <jar@creativecommons.org>, "connolly@w3.org" <connolly@w3.org>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <49A569D7.9040705@musc.edu>
Phil Archer wrote:
> Xiaoshu Wang wrote:
>   
>> Phil Archer wrote:
>>     
>>> Xiaoshu Wang wrote:
>>>  
>>>       
>>>> The critical flaw of all the proposed approaches is that the definition 
>>>> of "metadata/descriptor" is ambiguous and hence useless in practice.  
>>>> Take the "describedBy" relation for example.  Here I quote from 
>>>> Eran's link.
>>>>
>>>>      The relationship A "describedby" B asserts that resource B
>>>>      provides a description of resource A. There are no constraints on
>>>>      the format or representation of either A or B, neither are there
>>>>      any further constraints on either resource.
>>>>
>>>> As a URI owner, I don't know what kind of stuff I should put in 
>>>> A or B.
>>>>     
>>>>         
>>> Yes you do. You know that B has something to say about A. You don't, 
>>> however, know what format either is in or anything else. Those details 
>>> are handled by other mechanisms, notably the content type. In this link:
>>>
>>> Link: <foo.bar>; rel="describedby"; type="application/thing"
>>>
>>> You would probably only fetch foo.bar if you had a UA that could 
>>> process application/thing. (This is a hint - it may be superseded by 
>>> the more authoritative headers that come back if you dereference foo.bar.)
>>>
>>>> As a URI client, how should I know when I should get A and when 
>>>> B?
>>>>         
>>> Because either:
>>>
>>>   - you're interested in A for any of the reasons you may be 
>>> interested in any resource (you're following a link, it's in a search 
>>> result or whatever). Optionally, you can find out more about A by 
>>> following the link to B.
>>>
>>>   - you're collecting URIs of resources that have particular features. 
>>> Therefore, you'll look for Bs and then use them to find As.
>>>   
>>>       
>> Honestly, do you think that answers any question that I raised?  If B 
>> describes A, and if I am interested in A, I am of course interested in 
>> B.  What particular features would I be looking for that allow me to 
>> choose either A or B but not both?
>>     
>>>> Since I don't know what I might be missing from either A or B, it 
>>>> seems to suggest that I must always get both A and B.
>>>>     
>>>>         
>>> No. As an analogy: if an HTML page links to a stylesheet you can 
>>> choose whether to fetch the stylesheet or not in order to render the 
>>> page.
>>>   
>>>       
>> No.  This is not a reasonable analogy.  When I receive an HTML page (a 
>> representation, btw), there exists a context that defines the semantics 
>> of a stylesheet and that, in turn, helps a UA to decide accordingly.  At 
>> the HTTP level, there is no such context because I know nothing except 
>> that the URI denotes a Resource.  If you take this HTML page as an 
>> analogy, does it mean that I can move the HTML's stylesheet link into 
>> the HTTP layer as an HTTP Link?  Is this a good design?
>>     
>
> Well, Opera and Mozilla seem to think so. Take a look at 
> http://www.fosi.org/archive/httplinktest/ in either of those browsers 
> and you'll see the stylesheet has been imported by following the HTTP 
> link (IE doesn't).
>   
Browsers do a lot of things to support non-standard behavior 
because their purpose is to serve the client.  But that doesn't make the 
design a good one.  What if there are two stylesheets, one in the HTML and 
the other in the HTTP Link?  Which trumps?  And suppose the same URI 
has another representation, say RDF or an image.  What would the 
Link stylesheet be?  (1) It is still present, but means nothing.  Then it 
is a waste of web resources. (2) It is only present with the HTML 
representation.  Then this suggests that the semantics of the stylesheet 
are always associated with an HTML representation.  Hence, serving it in 
both places is both a waste of resources and prone to error. 
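
To make the two-stylesheet question concrete, here is a minimal Python 
sketch of a client that receives stylesheet links from both places.  The 
file names (header.css, page.css) and the helper names are hypothetical, 
and the Link header parsing is a simplified regular expression, not a full 
implementation of the header grammar.  The sketch only collects the 
candidates; it deliberately does not pick a winner, because no precedence 
rule is given anywhere in this thread.

```python
import re
from html.parser import HTMLParser

# Simplified patterns for '<target>; name="value"; name=value' (an assumption,
# not the complete Link header grammar).
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def stylesheets_from_header(link_header):
    """Return targets of rel="stylesheet" links found in an HTTP Link header value."""
    found = []
    for target, params in LINK_RE.findall(link_header):
        rels = []
        for name, quoted, bare in PARAM_RE.findall(params):
            if name.lower() == "rel":
                rels = (quoted or bare).lower().split()
        if "stylesheet" in rels:
            found.append(target)
    return found


class StylesheetCollector(HTMLParser):
    """Collect href values of <link rel="stylesheet"> elements in an HTML document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "stylesheet" in rels and attr.get("href"):
            self.hrefs.append(attr["href"])


def stylesheets_from_html(html_text):
    collector = StylesheetCollector()
    collector.feed(html_text)
    collector.close()
    return collector.hrefs


def candidate_stylesheets(link_header, html_text):
    """Report both sources side by side; choosing between them is left open."""
    return {
        "http_link": stylesheets_from_header(link_header),
        "html_link": stylesheets_from_html(html_text),
    }


# Hypothetical response: one stylesheet in the header, another in the body.
header = '<header.css>; rel="stylesheet"'
body = '<html><head><link rel="stylesheet" href="page.css"></head></html>'
print(candidate_stylesheets(header, body))
```

Whatever a client does with the two lists (header first, body first, or 
both) is a policy it has to invent for itself.
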
> And yes, as a vocal supporter of Mark Nottingham's work to reinstate 
> HTTP Link: I do think it's a good design to be able to link from one 
> thing to another at the HTTP level. I may get one of any number of 
> representations of a resource back from a URI but that's no reason why 
> a single description of those representations should not apply to them 
> all. 
Isn't Conneg doing the same thing?
> This comes back to things like Thematic Consistency [1] and One Web.
>
>
>   
>>>> Thus, I cannot help but wonder why they are not put together at A in 
>>>> the first place.
>>>>     
>>>>         
>>> Because they are often managed by different people, subject to 
>>> different production and editorial control etc. Take a content 
>>> production workflow. Often there is a relatively large number of 
>>> people (journalists, graphic artists etc.) who create the content 
>>> which is then subject to review by an editor(ial team). There are many 
>>> situations where the latter creates the metadata concerning resources 
>>> produced by the former.
>>>   
>>>       
>> Again, you assumed a working context.
>>     
>
> Like machines, I only ever work in a context. I think to strip away all 
> context is so abstract as to be unrealisable.
>   
Nope.  My point is not to strip away all context.  But if you propose 
something at its most generic level, your context becomes generic and 
essentially no context at all.
>> This is no different from the 
>> WebDAV case.  It is invalid as a general mechanism.  Conneg can solve 
>> the problem too.  You define a format/service, preferably with a URI, 
>> say "b", for the content of B and deploy it under A.  If a user wants to 
>> get the B-content, that implicitly suggests that they already know "b".  
>> They then request the "b"-content from A.
>>     
>
> You're assuming that the person providing the description is the same as 
> the person who created the resource or that they at least have the 
> access rights to edit it. I may want to publish something and say "The 
> information on this Web site is certified as being true by 
> musc.edu/professors/#doe."
>   
Sure, define a MIME type for that kind of information.  If someone wants 
to know that, then they can negotiate the content for it.  But there is no 
need to feed it to those clients who are not interested in the information.
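
As a rough illustration of that negotiation, here is a minimal Python 
sketch of server-side selection driven by the Accept header.  The media 
type application/x-certification and the list of representations are 
hypothetical, and the matching is deliberately simplified: it handles 
q-values and the */* wildcard but ignores type/* ranges and specificity 
rules.

```python
def parse_accept(header):
    """Parse an Accept header value into (media_type, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # default quality when no q parameter is given
        for field in fields[1:]:
            name, _, value = field.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their order in the header
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def negotiate(accept_header, available):
    """Return the first available media type the client accepts, or None."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]  # server default for an unconstrained client
    return None


# Hypothetical representations offered for resource A.
AVAILABLE = ["text/html", "application/x-certification"]

# A client that wants the certification information asks for it explicitly.
print(negotiate("application/x-certification", AVAILABLE))
# A browser-like client is never handed it unasked.
print(negotiate("text/html, */*;q=0.1", AVAILABLE))
```

A client that does not know the certification type never sees it, which 
is the point made above.
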
>>> As a little example:
>>>
>>> A is the homepage of a bank. It was last updated 2 hours ago.
>>> B tells you that A is the homepage of a bank. B was last updated 2 
>>> months ago.
>>>
>>> Current financial crisis notwithstanding, both are accurate, both have 
>>> been updated in a time frame that suggests they are actively managed.
>>>   
>>>       
>> I don't get it.  Shouldn't A's representation tell me that A is the 
>> homepage of a bank? 
>>     
>
> Only if you understand the representation - which, presumably a person 
> reading a rendered HTML page would do. A machine may well not.
>   
So, the difference is again format/language? That is exactly what Conneg does.
>> Why do I need B to tell me the same thing?
>
> They don't say the same thing at all. One says "Put your money here" and 
> the other says "that's a bank."
>   
I see. A says: "Here(A) is a bank, put your money here(A)".  B says: 
"There(A) is a bank, put your money there(A)".  They are the same.
>> And why would I be interested in the update of B?
>>     
>
> Because you want to know how up to date the description is as a 
> precursor to deciding whether you trust it or not.
>
>   
>>>> The same goes for MGET: how does a user know when to GET and when to 
>>>> MGET? PROPFIND is different because when people use it, they 
>>>> already know that the resource is defined by WebDAV.   Hence, these 
>>>> kinds of ideas only work when the client already has some knowledge 
>>>> about A.      
>>>>         
>>> I think you're getting into a bit of a tunnel here. How do you know 
>>> about anything on the Web? How do you discover anything? All the 
>>> mechanisms under discussion have their As and Bs (resources and 
>>> descriptions thereof). The current effort is all about trying to find 
>>> some uniformity of approach.
>>>   
>>>       
>> Yes, my assumption is that you don't know anything about a Resource in 
>> the first place.  
>>     
>
> Other than you know you're interested in it, surely? i.e. you have some 
> prior knowledge.
>
>> Thus, given a resource's URI, if I am a specialized 
>> agent, say an RDF agent, I would request something that I can understand, 
>> such as RDF/XML, N3, etc.
>>     
>
> Makes sense, yes.
>
>> On the other hand, if I am a general agent, 
>> such as a human, I would (1) conduct implicit Conneg, by requesting 
>> something that I prefer, such as HTML, or other things like image, 
>> audio, etc.,  
>>     
>
> No, you don't do that because you don't do HTTP - your agent does - 
> which immediately puts you into a context. Your UA has certain abilities 
> (it can render HTML, play video or whatever).
>   
Sure, I instruct my agent to do that, O.K?
>> or (2) conduct transparent Conneg to ask what kind of 
>> services/content-types the resource offers so I can choose.  If MIME 
>> types are URIzed, then a general agent such as a human can follow each of 
>> the MIME-URIs to understand which is the most appropriate for my need so 
>> that I can make my choice accordingly.
>>     
>
> OK.
>
>   
>> This is not, as you put it, "how can you discover anything?".  It is 
>> exactly the opposite: it allows you to discover everything.
>>     
>
> Hmmm... no, what I mean is how do you, as this unusual person with an 
> HTTP module that isn't a user agent, know where to ask for the different 
> representations in the first place? i.e. how do you know the original URI?
>   
I don't know what your point is.  I get a URI either from some search 
agent, or a friend passes me one, or I read a document whose content 
tells me one.  How do you know yours?
>>>> But as a general framework for the Web, it won't 
>>>> work.  At the most fundamental level, we only know three things about 
>>>> the Web -- URI, Representation, Resource.  The concept of metadata is 
>>>> ill-conceived at this level because, as data about data, to say 
>>>> metadata implies that we already know something about the resource we 
>>>> try to access, a piece of knowledge that we don't have.
>>>>     
>>>>         
>>> But even a UA doesn't live in a vacuum. It responds to input, usually 
>>> human, sometimes automated. Either way, it is performing a task and 
>>> will have a variety of parameters. Metadata should make its task easier.
>>>
>>>  
>>>       
>>>> There are a lot of implicit assumptions under the so-called "uniform 
>>>> access to metadata/descriptor" approach.  It either requires the 
>>>> definition of IR or a one-on-one relationship between Resource and 
>>>> Representation.
>>>>     
>>>>         
>>> That depends what the metadata says. If it says "this page is 
>>> generated dynamically to suit a wide variety of devices" that says 
>>> quite the opposite to your conjecture - namely that there are many 
>>> different representations available at the described URI.
>>>   
>>>       
>> If you can describe your scenario without invoking the word "metadata" 
>> or any other similar sort, then you will present a valid case.
>>     
>
> I am unlikely to avoid using the word metadata where it is the correct 
> term.
>
>> This is 
>> the very question that I asked in the first place. Tell me, given a 
>> resource or data A, what is its meta-Resource or its metadata B?  Again, 
>> as I have suggested for the definition of IR, let's use Quine's 
>> "ontological commitment" as a criterion to guard ourselves from 
>> hypostasizing or reifying things for a particular theory.
>>
>> Define Data and Metadata in an ontology so that data and metadata are 
>> disjoint, because only then can everyone (both providers and 
>> consumers) follow it in practice.
>>     
>
> Both data and metadata are data. Both exist independently. 
Yes, as Representation but not Resource.
> The 
> difference is that one tells you something about the other and this 
> relationship is unlikely to be reciprocal. So to give an example, my 
> data can be rendered as an image of a rather dry, reddish landscape. 
> That image exists all by itself. I have a separate bit of data that says 
> there is an image at http://foo which was taken on $date from an 
> altitude of $metres above Mars, by the $orbiter and so on.
>
> Is such an obvious distinction really problematic?
>   
Again, it is about different formats.  It is exactly what Conneg does, 
hence making HTTP Link a redundant effort.  And we know from software 
engineering that any redundant code is harmful in the long run.

Xiaoshu
> [..]
>
> Phil.
>
> [1] http://www.w3.org/TR/mobile-bp/#OneWeb
>
Received on Wednesday, 25 February 2009 15:55:57 UTC