Re: Uniform access to metadata: XRD use case. from Xiaoshu Wang on 2009-02-25 (www-tag@w3.org from February 2009)

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Wed, 25 Feb 2009 16:30:27 +0000
To: "Patrick.Stickler@nokia.com" <Patrick.Stickler@nokia.com>
CC: "phil@philarcher.org" <phil@philarcher.org>, "eran@hueniverse.com" <eran@hueniverse.com>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "jar@creativecommons.org" <jar@creativecommons.org>, "connolly@w3.org" <connolly@w3.org>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <49A57223.2030307@musc.edu>
Patrick.Stickler@nokia.com wrote:
>
> On 2009-02-25 13:44, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:
>
>   
>>
>> Phil Archer wrote:
>>     
>
> (apologies to Phil for jumping in with a few comments)
>
>   
>>> Xiaoshu Wang wrote:
>>>
>>>       
>>>> The critical flaw of all the proposed approaches is that the definition of
>>>> "metadata/descriptor" is ambiguous and hence useless in practice.  Take
>>>> the "describedBy" relations for example.  Here I quote from Eran's link.
>>>>
>>>>      The relationship A "describedby" B asserts that resource B
>>>>      provides a description of resource A. There are no constraints on
>>>>      the format or representation of either A or B, neither are there
>>>>      any further constraints on either resource.
>>>>
>>>> As a URI owner, I don't know what kind of stuff that I should put in A
>>>> or B.
>>>>
>>>>         
>>> Yes you do. You know that B has something to say about A. You don't,
>>> however, know what format either is in or anything else. Those details
>>> are handled by other mechanisms, notably the content type. In this link:
>>>
>>> Link: <foo.bar>; rel="describedby"; type="application/thing"
>>>
>>> You would probably only fetch foo.bar if you had a UA that could process
>>> application/thing. (This is a hint - it may be superseded by the more
>>> authoritative headers that come back if you dereference foo.bar.)
>>>
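A minimal sketch of how a client might act on the "describedby" hint in the Link header above. It assumes link parameters are separated by semicolons and do not contain commas or semicolons inside quoted values; the function names and the second link in the usage example are hypothetical, and this is not a complete Link header parser.

```python
import re

def parse_link_header(value):
    """Parse a Link header value into a list of (target, params) pairs.

    Simplified: assumes no ',' or ';' inside quoted parameter values.
    """
    links = []
    # Each link-value looks like: <target>; name="value"; name=value
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^;,]*)*)', value):
        target, raw_params = match.group(1), match.group(2)
        params = {}
        for piece in raw_params.split(";"):
            if "=" in piece:
                name, _, val = piece.partition("=")
                params[name.strip().lower()] = val.strip().strip('"')
        links.append((target, params))
    return links

def descriptions_to_fetch(link_header, supported_types):
    """Return describedby targets whose type hint the client can process.

    A link without a type hint is kept, since the hint is optional and the
    response headers of the target are the authoritative source anyway.
    """
    targets = []
    for target, params in parse_link_header(link_header):
        rels = params.get("rel", "").lower().split()
        hinted_type = params.get("type")
        if "describedby" in rels and (hinted_type is None or hinted_type in supported_types):
            targets.append(target)
    return targets

header = '<foo.bar>; rel="describedby"; type="application/thing", <s.css>; rel="stylesheet"'
print(descriptions_to_fetch(header, {"application/thing"}))
```

A client that cannot process application/thing gets an empty list back and so never dereferences foo.bar.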
>>>    As a URI client, how should I know when should I get A and when
>>>
>>>       
>>>> B?
>>>>
>>>>         
>>> Because either:
>>>
>>>   - you're interested in A for any of the reasons you may be interested
>>> in any resource (you're following a link, it's in a search result or
>>> whatever). Optionally, you can find out more about A by following the
>>> link to B.
>>>
>>>   - you're collecting URIs of resources that have particular features.
>>> Therefore, you'll look for Bs and then use them to find As.
>>>
>>>       
>> Honestly, do you think that answers any question that I raise?  If B
>> describes A, and if I am interested in A, I am of course interested in
>> B.  What particular features am I looking for that would let me choose
>> either A or B, but not both A and B?
>>     
>
> Presuming that we are talking about semantic web agents, and not humans...
>
> If an agent wants a description of A it just uses MGET to request it. If
> that description mentions B, and the agent then wants a full description of
> B, it again just uses MGET to request a description of B.
>
> If, based on what the agent has learned about A or B, it wants to access any
> more general representations of either, it can use GET.
>
> And the descriptions of A and B provided by MGET might very well tell the
> agent exactly what representations are available, and, via additional MGET
> requests, the descriptions of those representations can be considered,
> without having to retrieve, parse, and find such descriptions or links to
> descriptions from the representations themselves.
>
> I've made these points repeatedly in the past, but I'll repeat them again:
>
> Even if only submitting HEAD requests, the HTTP link method doubles the
> number of requests needed to achieve the same results, and most critically,
> does not offer a solution for resources which may not have any
> representation, or may not have an accessible representation due to various
> restrictions, and thus is neither as general nor as efficient a protocol
> solution as URIQA.
>
> And the implementational burden for introducing a metadata management and
> publication layer to an existing web publication environment in order to
> define metadata, and or associate links to descriptions with resource URIs,
> and to insert description links into server response headers is equal to or
> greater than implementing support for a solution based on URIQA, and is less
> modular.
>
> The arguments that a linking approach imposes less implementational burden
> or disruption to web sites or content publishers than approaches such as
> URIQA do not bear scrutiny.
>
> Objections to a solution based on additional specialized HTTP methods appear
> to be based primarily on rigid philosophical positions or vested interests
> and not on the demonstrable technical and practical merits of such
> solutions.
>   
"Rigid philosophical positions"? We are perhaps all guilty of such a 
thing.  Any affair in the world is, in fact, a matter of battling over 
philosophical positions.  "Vested interests"?  I don't know about yours; 
mine is always in making the Web as pragmatic, lean, and simple 
as possible.

It is worth noting that I am not disputing whether MGET or HTTP Link can 
work.  Of course they can.  I am simply arguing that they are 
functionally redundant with Conneg.  My principle is DRY.  Thus, if you 
can forcefully demonstrate a use case where Conneg cannot do what MGET 
does, I am all ears and my interest will be vested in yours.
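
To make the comparison concrete, here is a minimal sketch of the two request styles, built with Python's urllib.request but never sent. The URI http://example.org/A and the choice of application/rdf+xml are assumptions for illustration only; whether a server should treat the RDF variant of A as "the description of A" is exactly the point in dispute.

```python
from urllib.request import Request

RDF_XML = "application/rdf+xml"  # assumed serialization an RDF agent understands

def conneg_request(uri):
    """Ordinary GET that asks for an RDF serialization via the Accept header."""
    return Request(uri, headers={"Accept": RDF_XML}, method="GET")

def mget_request(uri):
    """URIQA-style request: a distinct HTTP method asks for the description."""
    return Request(uri, method="MGET")

# Build both requests for the same (hypothetical) resource and show them.
for req in (conneg_request("http://example.org/A"), mget_request("http://example.org/A")):
    print(req.get_method(), req.full_url, dict(req.header_items()))
```

Both requests name the same URI; the only difference is whether the client's intent is carried in the Accept header or in the method name.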

Xiaoshu
>>>   Since I don't know what I might be missing from either A or B, it
>>>
>>>       
>>>> seems to suggest that I must always get both A and B.
>>>>
>>>>         
>>> No. As an analogy: if an HTML page links to a stylesheet you can choose
>>> whether to fetch the stylesheet or not in order to render the page.
>>>
>>>       
>> No.  This is not a reasonable analogy.  When I receive an HTML page (a
>> representation, btw), there exists a context that defines the semantics
>> of stylesheet and it, in turn, helps a UA to decide accordingly.  At
>> HTTP level, there is no such context because I know nothing except the
>> URI denotes a Resource.  If you take this HTML page as an analogy, it
>> means that I can move the HTML's stylesheet link into the HTTP layer as
>> the HTTP Link?  Is this a good design?
>>     
>
> Relying on a link at either layer is a suboptimal design, because both
> depend on the existence of a representation other than its formal
> description in order to access its formal description using web protocols.
>
> Not all resources of interest to the semantic layer will have traditional
> web representations to associate a link to its formal description (unless of
> course, in the case of the link approach, the solution requires it).
>
>
>   
>>>   Thus, I cannot
>>>
>>>       
>>>> help but wonder why they are not put together at A in the first place.
>>>>
>>>>         
>>> Because they are often managed by different people, subject to different
>>> production and editorial control etc. Take a content production
>>> workflow. Often there is a relatively large number of people
>>> (journalists, graphic artists etc.) who create the content which is then
>>> subject to review by an editor(ial team). There are many situations
>>> where the latter creates the metadata concerning resources produced by
>>> the former.
>>>
>>>       
>> Again, you assumed a working context.  This is no different from the
>> WebDAV case.  It is invalid as a general mechanism.  Conneg can solve
>> the problem too.  You define a format/service, preferably with a URI,
>> say "b", for the content of B and deploy it under A.  If a user wants
>> the B-content, that implicitly suggests they already know "b", so
>> they request the "b"-content from A.
>>     
>>> As a little example:
>>>
>>> A is the homepage of a bank. It was last updated 2 hours ago.
>>> B tells you that A is the homepage of a bank. B was last updated 2
>>> months ago.
>>>
>>> Current financial crisis notwithstanding, both are accurate, both have
>>> been updated in a time frame that suggests they are actively managed.
>>>
>>>       
>> I don't get it.  Shouldn't A's representation tell me that A is the
>> homepage of a bank?
>>     
>
> Not necessarily. And even if you, as a human being, might be able to deduce
> from the representation of A that A is the homepage of a bank, a semantic
> web agent probably won't (and without explicit RDF statements to that
> effect, you couldn't be sure that your deductions about A based on
> examination of the representation are correct).
>
>   
>> Why do I need B to tell me the same thing?  And why
>> would I be interested in the update of B?
>>     
>
> Because A may not tell anything about the resource in a manner that is
> meaningful to a semantic web agent, and B is (presumably) a formal
> description of A which corresponds to an RDF graph which is meaningful to a
> semantic web agent.
>
> If B is not such a formal description, then probably neither A nor B is
> useful at the semantic web layer.
>
>
>   
>>>> The same goes for MGET: how does a user know when to GET and when to MGET?
>>>> PROPFIND is different because when people use it, they already
>>>> know that the resource is defined by WebDAV.   Hence, these kinds of
>>>> ideas only work when the client already has some knowledge about A.
>>>>
>>>>         
>>> I think you're getting into a bit of a tunnel here. How do you know
>>> about anything on the Web? How do you discover anything? All the
>>> mechanisms under discussion have their As and Bs (resources and
>>> descriptions thereof). The current effort is all about trying to find
>>> some uniformity of approach.
>>>
>>>       
>> Yes, my assumption is that you don't know anything about a Resource in
>> the first place.  Thus, given a resource's URI, if I am a specialized
>> agent, say an RDF agent, I would request something that I can understand,
>> such as RDF/XML, n3 etc.
>>     
>
> Right. A semantic web agent would request a representation of a description
> of the resource of interest, corresponding to an RDF graph, and furthermore
> could use content negotiation to indicate which graph serialization
> encodings are acceptable and thus possibly affect which variant
> representation of the description is provided.
>
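The negotiation step described above can be sketched as a small server-side selection routine. This is a simplified illustration, not a full implementation of HTTP content negotiation: it handles q-values and the "*/*" wildcard only, and the media types and function names are assumptions chosen for the example.

```python
def parse_accept(header):
    """Return (media_type, q) pairs from an Accept header, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0  # default quality when no q parameter is given
        for param in parts[1:]:
            name, _, val = param.partition("=")
            if name.strip() == "q":
                try:
                    q = float(val)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((parts[0].lower(), q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda p: p[1], reverse=True)

def choose_variant(accept_header, available):
    """Pick the first available serialization the agent finds acceptable."""
    for media_type, q in parse_accept(accept_header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None

variants = ["text/html", "application/rdf+xml", "text/n3"]
print(choose_variant("text/n3;q=0.8, application/rdf+xml", variants))
```

An agent that lists only serializations the resource does not offer gets None here, which a server would typically turn into a 406 response.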
>   
>> On the other hand, if I am a general agent,
>> such as a human, I would (1) conduct implicit Conneg by requesting
>> something that I prefer, such as HTML, or other things like images,
>> audio, etc., or (2) conduct transparent Conneg to ask what kinds of
>> services/content types the resource offers so I can choose.  If MIME
>> types are URIzed, then a general agent such as a human can follow each
>> MIME-URI to understand which is most appropriate for its needs and
>> choose accordingly.
>>     
>
> Ideally, a human wouldn't have to be concerned about MIME types or content
> negotiation, but rather the software agent (e.g. a web browser) would hide
> such details behind a human-optimized interface.
>
>   
>> This is not, as you put it, "how do you discover anything?".  It is
>> exactly the opposite: it allows you to discover everything.
>>     
>
> The question is how a semantic web agent requests a consumable description
> of a resource in the most optimal manner.
>
> How humans access and consume representations of resources has been pretty
> much sorted out for quite some time.
>
>   
>>>> But, to propose it as a general framework for the Web, it won't work.
>>>> At the most fundamental level, we only know three things about the Web
>>>> -- URI, Representation, Resource.  The concept of metadata is
>>>> ill-conceived at this level because, as data about data, to say metadata
>>>> implies that we already know something about the resource we are trying to
>>>> access, a piece of knowledge that we don't have.
>>>>
>>>>         
>>> But even a UA doesn't live in a vacuum. It responds to input, usually
>>> human, sometimes automated. Either way, it is performing a task and will
>>> have a variety of parameters. Metadata should make its task easier.
>>>
>>>
>>>       
>>>> There are a lot of implicit assumptions under the so-called "uniform
>>>> access to metadata/descriptor" approach.  It either requires the
>>>> definition of IR or a one-on-one relationship between Resource and
>>>> Representation.
>>>>
>>>>         
>>> That depends what the metadata says. If it says "this page is generated
>>> dynamically to suit a wide variety of devices" that says quite the
>>> opposite to your conjecture - namely that there are many different
>>> representations available at the described URI.
>>>
>>>       
>> If you can describe your scenario without invoking the word "metadata"
>> or any similar term, then you will present a valid case.  This is
>> the very question that I asked in the first place. Tell me, given a
>> resource or data A, what is its meta-Resource or its metadata B?  Again,
>> as I have suggested for the definition of IR, let's use Quine's
>> "ontological commitment" as a criterion to guard ourselves from
>> hypostasizing or reifying things for a particular theory.
>>
>> Define Data and Metadata in an ontology so that data and metadata are
>> disjoint, because only then can everyone (both providers and
>> consumers) follow it in practice.
>>     
>
> One agent's data is another agent's metadata.
>
> RDF graphs are data to the semantic web layer, but can constitute metadata
> at the web layer.
>
> RDF graphs, however, can be serialized and accessible as representations at
> the web layer, and such representations are data to the web layer.
>
> Whether it is data or metadata depends on the layer at which it is
> interpreted/consumed and the purpose of the agent.
>
> Patrick
>
>
>
>   
>> Xiaoshu
>>     
>>> Others, more qualified than me, have answered your remaining issues.
>>>
>>> Phil.
>>>
>>>    As the former implies that non-IR cannot have a
>>>
>>>       
>>>> representation, it makes the "descriptor/metadata" necessary.  The knock
>>>> on this assumption is that the definition of IR is impossible to work with.
>>>>
>>>> The 1-on-1 relationship gives rise to the so-called "legacy resource".
>>>> But the term "legacy resource" is wrongly named too.  In the Web, there
>>>> might be such a thing as a "legacy representation" but there should NOT be
>>>> such a thing as a "legacy resource" because the latter implies that the
>>>> Resource is closed and no more semantics will be added.
>>>> But the so-called "metadata/descriptor" problems can be solved by using
>>>> HTTP Content Negotiation, making any other proposal a redundant one. The
>>>> actual issue, as I have discussed in [1], is about the incomplete syntax
>>>> of the URI specs, which currently do not have a syntactic notation for
>>>> the other two foundational objects in the Web, i.e., URI and
>>>> Representation.  Once we supplement the URI spec with that syntactic sugar,
>>>> such as the one I proposed in [2], then we can have a uniform approach
>>>> to (1) describe URI along with standard resources and (2) to
>>>> systematically discover the possible representation types, i.e.,
>>>> Content-Type/MIME types, associated with a Resource (either URI or
>>>> standard Resource). As a particular content-type is equivalent to a
>>>> particular *service*, the approach in effect establishes a
>>>> uniform approach to service discovery.
>>>> What is required is to define Content-Type in URI.  Once we have these,
>>>> not only are Data/Resource linked but also DataType/Service.  Best of
>>>> all, it works within the conceptualizations defined in AWWW, and does
>>>> not require any other ambiguous conceptualization, such as, IR,
>>>> metadata, and description, etc.
>>>>
>>>> 1. http://dfdf.inesc-id.pt/misc/man/http.html
>>>> 2. http://dfdf.inesc-id.pt/tr/uri-issues
>>>>
>>>> Xiaoshu
>>>>
>>>> Eran Hammer-Lahav wrote:
>>>>
>>>>         
>>>>> Both of which are included in my analysis [1] for the discovery proposal.
>>>>>
>>>>> EHL
>>>>>
>>>>> [1] http://tools.ietf.org/html/draft-hammer-discovery-02#appendix-B.2
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> -----Original Message-----
>>>>>> From: Julian Reschke [mailto:julian.reschke@gmx.de]
>>>>>> Sent: Tuesday, February 24, 2009 1:45 AM
>>>>>> To: Patrick.Stickler@nokia.com
>>>>>> Cc: Eran Hammer-Lahav; jar@creativecommons.org; connolly@w3.org; www-
>>>>>> tag@w3.org
>>>>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>>>>
>>>>>> Patrick.Stickler@nokia.com wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> ...
>>>>>>> Agents which want to deal with authoritative metadata use
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> MGET/MPUT/etc.
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> Same with PROPFIND and PROPPATCH, btw.
>>>>>>
>>>>>> BR, Julian
>>>>>>
>>>>>>
>>>>>>             
>>>>>           
>>>       
>
>
>
Received on Wednesday, 25 February 2009 16:31:20 UTC