Re: Uniform access to metadata: XRD use case. from Phil Archer on 2009-02-25 (www-tag@w3.org from February 2009)

From: Phil Archer <phil@philarcher.org>
Date: Wed, 25 Feb 2009 15:11:48 +0000
To: wangxiao@musc.edu
CC: Eran Hammer-Lahav <eran@hueniverse.com>, Julian Reschke <julian.reschke@gmx.de>, "Patrick.Stickler@nokia.com" <Patrick.Stickler@nokia.com>, "jar@creativecommons.org" <jar@creativecommons.org>, "connolly@w3.org" <connolly@w3.org>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <49A55FB4.8060908@philarcher.org>
Xiaoshu Wang wrote:
> 
> 
> Phil Archer wrote:
>> Xiaoshu Wang wrote:
>>  
>>> The critical flaw of all the proposed approaches is that the definition 
>>> of "metadata/descriptor" is ambiguous and hence useless in practice.  
>>> Take the "describedBy" relations for example.  Here I quote from 
>>> Eran's link.
>>>
>>>      The relationship A "describedby" B asserts that resource B
>>>      provides a description of resource A. There are no constraints on
>>>      the format or representation of either A or B, neither are there
>>>      any further constraints on either resource.
>>>
>>> As a URI owner, I don't know what kind of stuff I should put in 
>>> A or B.
>>>     
>>
>> Yes you do. You know that B has something to say about A. You don't, 
>> however, know what format either is in or anything else. Those details 
>> are handled by other mechanisms, notably the content type. In this link:
>>
>> Link: <foo.bar>; rel="describedby"; type="application/thing"
>>
>> You would probably only fetch foo.bar if you had a UA that could 
>> process application/thing. (This is a hint; it may be superseded by 
>> the more authoritative headers that come back if you dereference foo.bar.)
>>
>>> As a URI client, how should I know when I should get A and when
>>> B?
>>
>> Because either:
>>
>>   - you're interested in A for any of the reasons you may be 
>> interested in any resource (you're following a link, it's in a search 
>> result or whatever). Optionally, you can find out more about A by 
>> following the link to B.
>>
>>   - you're collecting URIs of resources that have particular features. 
>> Therefore, you'll look for Bs and then use them to find As.
>>   
> Honestly, do you think that answers any question that I raised?  If B 
> describes A, and if I am interested in A, I am of course interested in 
> B.  What particular features am I looking for that would let me choose 
> either A or B rather than both A and B?
>>> Since I don't know what I might be missing from either A or B, it
>>> seems to suggest that I must always get both A and B.
>>>     
>>
>> No. As an analogy: if an HTML page links to a stylesheet you can 
>> choose whether to fetch the stylesheet or not in order to render the 
>> page.
>>   
> No.  This is not a reasonable analogy.  When I receive an HTML page (a 
> representation, btw), there exists a context that defines the semantics 
> of stylesheet and it, in turn, helps a UA to decide accordingly.  At 
> HTTP level, there is no such context because I know nothing except the 
> URI denotes a Resource.  If you take this HTML page as an analogy, it 
> means that I can move the HTML's stylesheet link into the HTTP layer as 
> the HTTP Link?  Is this a good design?

Well, Opera and Mozilla seem to think so. Take a look at 
http://www.fosi.org/archive/httplinktest/ in either of those browsers 
and you'll see the stylesheet has been imported by following the HTTP 
link (IE doesn't).

And yes, as a vocal supporter of Mark Nottingham's work to reinstate 
HTTP Link, I do think it's a good design to be able to link from one 
thing to another at the HTTP level. I may get one of any number of 
representations of a resource back from a URI but that's no reason why 
a single description of those representations should not apply to them 
all. This comes back to things like Thematic Consistency [1] and One Web.
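
As a rough illustration of how a client might act on such a link, here is a minimal Python sketch. It assumes a simplified Link header of the shape quoted earlier (<foo.bar>; rel="describedby"; type="application/thing"); the function names are hypothetical and the parser deliberately ignores corner cases such as commas inside quoted values. The type parameter is treated only as a hint, as discussed above.

```python
import re


def parse_link_header(value):
    """Parse a simplified HTTP Link header value into a list of dicts.

    Handles only the common `<target>; name="value"` shape. This is an
    illustrative sketch, not a complete parser for the header grammar.
    """
    links = []
    # Each link starts with <target>, followed by ;-separated parameters.
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)', value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for part in raw_params.split(";"):
            if "=" in part:
                name, _, val = part.partition("=")
                params[name.strip().lower()] = val.strip().strip('"')
        links.append({"target": target, **params})
    return links


def descriptions_worth_fetching(link_header, supported_types):
    """Return describedby targets whose type hint the agent can process."""
    wanted = []
    for link in parse_link_header(link_header):
        rels = link.get("rel", "").lower().split()
        if "describedby" not in rels:
            continue
        hinted = link.get("type")
        # The type is only a hint, so a missing hint does not rule a link out.
        if hinted is None or hinted in supported_types:
            wanted.append(link["target"])
    return wanted


header = '<foo.bar>; rel="describedby"; type="application/thing"'
print(descriptions_worth_fetching(header, {"application/thing"}))
```

An agent that cannot process application/thing would get an empty list back and could simply skip the description, which is the optional behaviour described above.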


>>> Thus, I cannot help but wonder why they are not put together at A
>>> in the first place.
>>>     
>>
>> Because they are often managed by different people, subject to 
>> different production and editorial control etc. Take a content 
>> production workflow. Often there is a relatively large number of 
>> people (journalists, graphic artists etc.) who create the content 
>> which is then subject to review by an editor(ial team). There are many 
>> situations where the latter creates the metadata concerning resources 
>> produced by the former.
>>   
> Again, you assumed a working context.

Like machines, I only ever work in a context. I think to strip away all 
context is so abstract as to be unrealisable.

> This is no different from the
> WebDAV case.  It is invalid as a general mechanism.  Conneg can solve 
> the problem too.  You define a format/service, preferably with a URI, 
> say "b", for the content of B and deploy it under A.  If a user wants to 
> get B-content, that implicitly suggests that they already know "b".  
> Then they request the "b"-content from A.

You're assuming that the person providing the description is the same as 
the person who created the resource or that they at least have the 
access rights to edit it. I may want to publish something and say "The 
information on this Web site is certified as being true by 
musc.edu/professors/#doe".

>> As a little example:
>>
>> A is the homepage of a bank. It was last updated 2 hours ago.
>> B tells you that A is the homepage of a bank. B was last updated 2 
>> months ago.
>>
>> Current financial crisis notwithstanding, both are accurate, both have 
>> been updated in a time frame that suggests they are actively managed.
>>   
> I don't get it.  Shouldn't A's representation tell me that A is the 
> homepage of a bank? 

Only if you understand the representation, which presumably a person 
reading a rendered HTML page would. A machine may well not.

> Why do I need B to tell me the same thing?

They don't say the same thing at all. One says "Put your money here" and 
the other says "that's a bank."

> And why
> would I be interested in the update of B?

Because you want to know how up to date the description is as a 
precursor to deciding whether you trust it or not.

>>> The same goes for MGET: how does a user know when to GET and when to 
>>> MGET? PROPFIND is different because when people use it, they 
>>> already know that the resource is defined by WebDAV.   Hence, these 
>>> kinds of ideas only work when the client already has some knowledge 
>>> about A.      
>>
>> I think you're getting into a bit of a tunnel here. How do you know 
>> about anything on the Web? How do you discover anything? All the 
>> mechanisms under discussion have their As and Bs (resources and 
>> descriptions thereof). The current effort is all about trying to find 
>> some uniformity of approach.
>>   
> Yes, my assumption is that you don't know anything about a Resource in 
> the first place.  

Other than you know you're interested in it, surely? i.e. you have some 
prior knowledge.

> Thus, given a resource's URI, if I am a specialized
> agent, say an RDF agent, I would request something that I can understand, 
> such as RDF/XML, N3, etc.

Makes sense, yes.
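
For concreteness, the request such an agent makes might look like the following Python sketch. The URI, the function name and the q-values are assumptions made purely for illustration, as is the choice of application/rdf+xml and text/n3 as the media types. The request is only built, not sent.

```python
from urllib.request import Request


def build_rdf_request(uri):
    """Build (but do not send) a GET request that prefers RDF serialisations."""
    # Prefer RDF/XML, then N3, and accept anything else only as a last resort.
    accept = "application/rdf+xml, text/n3;q=0.9, */*;q=0.1"
    return Request(uri, headers={"Accept": accept})


# http://example.org/resource is a placeholder URI, not one from this thread.
req = build_rdf_request("http://example.org/resource")
print(req.get_method(), req.full_url)
print("Accept:", req.get_header("Accept"))
```

The server remains free to answer with whatever representation it has; the Accept header only states what this particular agent can make sense of.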

> On the other hand, if I am a general agent,
> such as a human, I would (1) conduct implicit Conneg, by requesting 
> something that I prefer, such as HTML, or other things like image, 
> audio, etc.,  

No, you don't do that because you don't do HTTP - your agent does - 
which immediately puts you into a context. Your UA has certain abilities 
(it can render HTML, play video or whatever).

> or (2) conduct transparent Conneg to ask what kind of
> services/content-types the resource offers so I can choose.  If MIME
> types are URIzed, then a general agent such as a human can follow each of 
> the MIME-URIs to understand which is the most appropriate for my need so 
> that I can make my choice accordingly.

OK.

> 
> This is not, as you said, "how can you discover anything?".  It is 
> exactly the opposite: it allows you to discover everything.

Hmmm... no, what I mean is how do you, as this unusual person with an 
HTTP module that isn't a user agent, know where to ask for the different 
representations in the first place? i.e. how do you know the original URI?

>>> But, to propose it as a general framework for the Web, it won't 
>>> work.  At the most fundamental level, we only know three things about 
>>> the Web -- URI, Representation, Resource.  The concept of metadata is 
>>> ill-conceived at this level because as data about data, to say 
>>> metadata implies that we already know something about the resource we 
>>> try to access, a piece of knowledge that we don't have.
>>>     
>>
>> But even a UA doesn't live in a vacuum. It responds to input, usually 
>> human, sometimes automated. Either way, it is performing a task and 
>> will have a variety of parameters. Metadata should make its task easier.
>>
>>  
>>> There are a lot of implicit assumptions under the so-called "uniform 
>>> access to metadata/descriptor" approach.  It either requires the 
>>> definition of IR or a one-on-one relationship between Resource and 
>>> Representation.
>>>     
>>
>> That depends what the metadata says. If it says "this page is 
>> generated dynamically to suit a wide variety of devices" that says 
>> quite the opposite to your conjecture - namely that there are many 
>> different representations available at the described URI.
>>   
> If you can describe your scenario without invoking the word "metadata" 
> or any other similar sort, then you will present a valid case.

I am unlikely to avoid using the word metadata where it is the correct 
term.

> This is
> the very question that I asked in the very first place. Tell me, given a 
> resource or data A, what is its meta-Resource or its metadata B?  Again, 
> as I have suggested for the definition of IR, let's use Quine's 
> "ontological commitment" as a criterion to guard ourselves from 
> hypostasizing or reifying things for a particular theory.
> 
> Define Data and Metadata in an ontology so that data and metadata are 
> disjoint, because only then can everyone (both providers and 
> consumers) follow it in practice.

Both data and metadata are data. Both exist independently. The 
difference is that one tells you something about the other and this 
relationship is unlikely to be reciprocal. So to give an example, my 
data can be rendered as an image of a rather dry, reddish landscape. 
That image exists all by itself. I have a separate bit of data that says 
there is an image at http://foo which was taken on $date from an 
altitude of $metres above Mars, by the $orbiter and so on.

Is such an obvious distinction really problematic?
[..]

Phil.

[1] http://www.w3.org/TR/mobile-bp/#OneWeb
-- 

Phil Archer
http://philarcher.org/

i-sieve technologies                |      W3C Mobile Web Initiative
Making Sense of the Buzz            |      www.w3.org/Mobile
Received on Wednesday, 25 February 2009 15:12:08 UTC