Re: Uniform access to metadata: XRD use case.

From: <Patrick.Stickler@nokia.com>
Date: Wed, 25 Feb 2009 17:42:57 +0100
To: <wangxiao@musc.edu>
CC: <phil@philarcher.org>, <eran@hueniverse.com>, <julian.reschke@gmx.de>, <jar@creativecommons.org>, <connolly@w3.org>, <www-tag@w3.org>
Message-ID: <C5CB41B1.DF8B%patrick.stickler@nokia.com>



On 2009-02-25 18:30, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:

>
>
>
> Patrick.Stickler@nokia.com wrote:
>>
>> On 2009-02-25 13:44, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:
>>
>>
>>>
>>> Phil Archer wrote:
>>>
>>
>> (apologies to Phil for jumping in with a few comments)
>>
>>
>>>> Xiaoshu Wang wrote:
>>>>
>>>>
>>>>> The critical flaw of all the proposed approaches is that the definition of
>>>>> "metadata/descriptor" is ambiguous and hence useless in practice.  Take
>>>>> the "describedBy" relations for example.  Here I quote from Eran's link.
>>>>>
>>>>>      The relationship A "describedby" B asserts that resource B
>>>>>      provides a description of resource A. There are no constraints on
>>>>>      the format or representation of either A or B, neither are there
>>>>>      any further constraints on either resource.
>>>>>
>>>>> As a URI owner, I don't know what kind of stuff I should put in A
>>>>> or B.
>>>>>
>>>>>
>>>> Yes you do. You know that B has something to say about A. You don't,
>>>> however, know what format either is in or anything else. Those details
>>>> are handled by other mechanisms, notably the content type. In this link:
>>>>
>>>> Link: <foo.bar>; rel="describedby"; type="application/thing"
>>>>
>>>> You would probably only fetch foo.bar if you had a UA that could process
>>>> application/thing. (This is a hint - it may be superseded by the more
>>>> authoritative headers that come back if you dereference foo.bar.)
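
A minimal sketch of how a client might act on the type hint in such a Link header. The function names and the simplified parsing are assumptions made for illustration, not part of any proposal in this thread; in particular, the parser does not handle commas or semicolons inside quoted parameter values.

```python
import re

# One link-value: <target> followed by zero or more ";name=value" parameters.
_LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[^;,]+)*)')
_PARAM_RE = re.compile(r';\s*([\w-]+)\s*=\s*(?:"([^"]*)"|([^\s;,]+))')


def parse_link_header(value):
    """Return a list of (target, params) pairs from a Link header value."""
    links = []
    for target, raw_params in _LINK_RE.findall(value):
        params = {}
        for name, quoted, bare in _PARAM_RE.findall(raw_params):
            params[name.lower()] = quoted if quoted else bare
        links.append((target, params))
    return links


def described_by(value, accepted_types):
    """Targets with rel="describedby" whose type hint the client can process.

    A link without a type hint is kept, since the hint is optional.
    """
    targets = []
    for target, params in parse_link_header(value):
        rels = params.get("rel", "").lower().split()
        hint = params.get("type")
        if "describedby" in rels and (hint is None or hint in accepted_types):
            targets.append(target)
    return targets
```

A UA that only understands RDF/XML would skip foo.bar here, because described_by() returns an empty list when application/thing is not among its accepted types.
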
>>>>
>>>>> As a URI client, how should I know when I should get A and when
>>>>> B?
>>>>>
>>>>>
>>>> Because either:
>>>>
>>>>   - you're interested in A for any of the reasons you may be interested
>>>> in any resource (you're following a link, it's in a search result or
>>>> whatever). Optionally, you can find out more about A by following the
>>>> link to B.
>>>>
>>>>   - you're collecting URIs of resources that have particular features.
>>>> Therefore, you'll look for Bs and then use them to find As.
>>>>
>>>>
>>> Honestly, do you think that answers any of the questions I raised?  If B
>>> describes A, and if I am interested in A, I am of course interested in
>>> B.  What particular features am I looking for that would let me choose
>>> either A or B, but not both A and B?
>>>
>>
>> Presuming that we are talking about semantic web agents, and not humans...
>>
>> If an agent wants a description of A it just uses MGET to request it. If
>> that description mentions B, and the agent then wants a full description of
>> B, it again just uses MGET to request a description of B.
>>
>> If, based on what the agent has learned about A or B, it wants to access any
>> more general representations of either, it can use GET.
>>
>> And the descriptions of A and B provided by MGET might very well tell the
>> agent exactly what representations are available, and, via additional MGET
>> requests, the descriptions of those representations can be considered,
>> without having to retrieve, parse, and find such descriptions or links to
>> descriptions from the representations themselves.
>>
>> I've made these points repeatedly in the past, but I'll repeat them again:
>>
>> Even if only submitting HEAD requests, the HTTP link method doubles the
>> number of requests needed to achieve the same results, and most critically,
>> does not offer a solution for resources which may not have any
>> representation, or may not have an accessible representation due to various
>> restrictions, and thus is neither as general nor as efficient a protocol
>> solution as URIQA.
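
The request-count comparison can be sketched with an in-memory stand-in for a server. Everything here (the example.org URIs, the CountingClient class, the two describe_* functions) is hypothetical and only models the message flow; it is not an implementation of URIQA or of the Link header proposal.

```python
# Hypothetical in-memory stand-in for a web server; no network access is involved.
DESCRIPTIONS = {"http://example.org/A": "http://example.org/B"}  # A is described by B
GRAPHS = {"http://example.org/B": "<rdf description of A>"}


class CountingClient:
    """Records every request so the two approaches can be compared."""

    def __init__(self):
        self.log = []

    def request(self, method, uri):
        self.log.append((method, uri))
        if method == "HEAD":
            # Link approach: the response headers point at the description.
            return {"Link": '<%s>; rel="describedby"' % DESCRIPTIONS[uri]}
        if method == "GET":
            return GRAPHS[uri]
        if method == "MGET":
            # URIQA approach: the server returns the description directly.
            return GRAPHS[DESCRIPTIONS[uri]]
        raise ValueError("unsupported method: " + method)


def describe_via_link(client, uri):
    """HEAD the resource, then GET the description it links to."""
    headers = client.request("HEAD", uri)
    target = headers["Link"].split(";")[0].strip("<> ")
    return client.request("GET", target)


def describe_via_mget(client, uri):
    """Ask for the description of the resource in a single request."""
    return client.request("MGET", uri)
```

Both functions return the same description of A, but the link-based client logs two requests (HEAD on A, GET on B) where the MGET client logs one.
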
>>
>> And the implementational burden for introducing a metadata management and
>> publication layer to an existing web publication environment in order to
>> define metadata, and/or associate links to descriptions with resource URIs,
>> and to insert description links into server response headers is equal to or
>> greater than implementing support for a solution based on URIQA, and is less
>> modular.
>>
>> The arguments that a linking approach imposes less implementational burden
>> or disruption to web sites or content publishers than approaches such as
>> URIQA do not bear scrutiny.
>>
>> Objections to a solution based on additional specialized HTTP methods appear
>> to be based primarily on rigid philosophical positions or vested interests
>> and not on the demonstrable technical and practical merits of such
>> solutions.
>>
> "rigid philosophical positions"? We are perhaps all guilty of such a
> thing.  Any affair in the world is, in fact, a matter of battling over our
> philosophical positions.  "Vested interest"?  I don't know about yours;
> mine is always about making the Web as pragmatic, lean, and simple
> as possible.

Sorry. That comment wasn't actually directed specifically at you.

>
> It is worth noting that I am not disputing whether MGET or HTTP Link can
> work.  Of course they can.  I am simply arguing that they are
> functionally redundant with Conneg.  My principle is DRY.  Thus, if you
> can forcefully demonstrate a use case where Conneg cannot do what MGET
> will do, I am all ears and my interest will be vested in yours.

It's not simply about which approach can be made to work, but which approach
is optimal, taking into account a broad range of issues such as
implementational effort (both for software implementation and content
creation and management), clarity (folks won't adopt solutions that are hard
to understand), modularity and layer separation (can you change your
solution for serving representations without having to retool your solution
for serving descriptions?), etc.

Probably should take this offline (though I encourage you to take some time
to digest and think about some of these issues independently). I doubt
further discussion will be beneficial to the broad distribution this thread
has had up to this point.

Patrick



>
> Xiaoshu
>>>>> Since I don't know what I might be missing from either A or B, it
>>>>> seems to suggest that I must always get both A and B.
>>>>>
>>>>>
>>>> No. As an analogy: if an HTML page links to a stylesheet you can choose
>>>> whether to fetch the stylesheet or not in order to render the page.
>>>>
>>>>
>>> No.  This is not a reasonable analogy.  When I receive an HTML page, (a
>>> representation btw), there exists a context that defines the semantics
>>> of stylesheet and it, in turn, helps a UA to decide accordingly.  At
>>> HTTP level, there is no such context because I know nothing except the
>>> URI denotes a Resource.  If you take this HTML page as an analogy, it
>>> means that I can move the HTML's stylesheet link into the HTTP layer as
>>> the HTTP Link?  Is this a good design?
>>>
>>
>> Relying on a link at either layer is a suboptimal design, because both
>> depend on the existence of a representation of the resource, other than its
>> formal description, in order to access that description using web protocols.
>>
>> Not all resources of interest to the semantic layer will have traditional
>> web representations with which to associate a link to a formal description
>> (unless of course, in the case of the link approach, the solution requires it).
>>
>>
>>
>>>>> Thus, I cannot help but wonder why they are not put together at A in
>>>>> the first place.
>>>>>
>>>>>
>>>> Because they are often managed by different people, subject to different
>>>> production and editorial control etc. Take a content production
>>>> workflow. Often there is a relatively large number of people
>>>> (journalists, graphic artists etc.) who create the content which is then
>>>> subject to review by an editor(ial team). There are many situations
>>>> where the latter creates the metadata concerning resources produced by
>>>> the former.
>>>>
>>>>
>>> Again, you assumed a working context.  This is no different from the
>>> WebDAV case.  It is invalid as a general mechanism.  Conneg can solve
>>> the problem too.  You define a format/service, preferably with a URI,
>>> say "b", for the content of B, and deploy it under A.  If a user wants to
>>> get B-content, that implicitly suggests they already know "b".
>>> They then request the "b"-content from A.
>>>
>>>> As a little example:
>>>>
>>>> A is the homepage of a bank. It was last updated 2 hours ago.
>>>> B tells you that A is the homepage of a bank. B was last updated 2
>>>> months ago.
>>>>
>>>> Current financial crisis notwithstanding, both are accurate, both have
>>>> been updated in a time frame that suggests they are actively managed.
>>>>
>>>>
>>> I don't get it.  Shouldn't A's representation tell me that A is the
>>> homepage of a bank?
>>>
>>
>> Not necessarily. And even if you, as a human being, might be able to deduce
>> from the representation of A that A is the homepage of a bank, a semantic
>> web agent probably won't (and without explicit RDF statements to that
>> effect, you couldn't be sure that your deductions about A based on
>> examination of the representation are correct).
>>
>>
>>> Why do I need B to tell me the same thing?  And why
>>> would I be interested in the update of B?
>>>
>>
>> Because A may not tell anything about the resource in a manner that is
>> meaningful to a semantic web agent, and B is (presumably) a formal
>> description of A which corresponds to an RDF graph which is meaningful to a
>> semantic web agent.
>>
>> If B is not such a formal description, then probably neither A nor B is
>> useful at the semantic web layer.
>>
>>
>>
>>>>> The same goes for MGET: how does a user know when to GET and when to MGET?
>>>>> PROPFIND is different because when people use it, they already
>>>>> know that the resource is defined by WebDAV.   Hence, these kinds of
>>>>> ideas only work when the client already has some knowledge about A.
>>>>>
>>>>>
>>>> I think you're getting into a bit of a tunnel here. How do you know
>>>> about anything on the Web? How do you discover anything? All the
>>>> mechanisms under discussion have their As and Bs (resources and
>>>> descriptions thereof). The current effort is all about trying to find
>>>> some uniformity of approach.
>>>>
>>>>
>>> Yes, my assumption is that you don't know anything about a Resource in
>>> the first place.  Thus, given a resource's URI, if I am a specialized
>>> agent, say RDF agent, I would request something that I can understand,
>>> such as RDF/XML, n3 etc.
>>>
>>
>> Right. A semantic web agent would request a representation of a description
>> of the resource of interest, corresponding to an RDF graph, and furthermore
>> could use content negotiation to indicate which graph serialization
>> encodings are acceptable and thus possibly affect which variant
>> representation of the description is provided.
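
As a rough illustration of that negotiation step, the sketch below picks a serialization from an Accept header with q-values. The function names and the media types used in the usage note are assumptions for the example; wildcards such as */* and the finer points of HTTP content negotiation are deliberately left out.

```python
def parse_accept(header):
    """Split an Accept header into (media_type, q) pairs; q defaults to 1.0."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat an unparsable q-value as "not acceptable"
        prefs.append((media_type, q))
    return prefs


def choose_variant(header, available):
    """Return the available media type with the highest q, or None."""
    best, best_q = None, 0.0
    for media_type, q in parse_accept(header):
        if media_type in available and q > best_q:
            best, best_q = media_type, q
    return best
```

For example, an agent sending "text/n3, application/rdf+xml;q=0.5" to a server that can serialize the description only as application/rdf+xml would be given the RDF/XML variant, and an agent that accepts neither available serialization would get None.
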
>>
>>
>>> On the other hand, if I am a general agent,
>>> such as a human, I would (1) conduct implicit Conneg, by requesting
>>> something that I prefer, such as HTML, or other things like image,
>>> audio, etc.,  or (2) conduct transparent Conneg to ask what kinds of
>>> services/content-types the resource offers so I can choose.  If MIME
>>> types are URIzed, then a general agent such as a human can follow each of
>>> the MIME-URIs to understand which is the most appropriate for its needs
>>> and make its choice accordingly.
>>>
>>
>> Ideally, a human wouldn't have to be concerned about MIME types or content
>> negotiation, but rather the software agent (e.g. a web browser) would hide
>> such details behind a human-optimized interface.
>>
>>
>>> This is not, as you said, "how can you discover anything?".  It is
>>> exactly the opposite: it allows you to discover everything.
>>>
>>
>> The question is how a semantic web agent requests a consumable description
>> of a resource in the most optimal manner.
>>
>> How humans access and consume representations of resources has been pretty
>> much sorted out for quite some time.
>>
>>
>>>>> But, to propose it as a general framework for the Web, it won't work.
>>>>> At the most fundamental level, we only know three things about the Web
>>>>> -- URI, Representation, Resource.  The concept of metadata is
>>>>> ill-conceived at this level because as data about data, to say metadata
>>>>> implies that we already know something about the resource we try to
>>>>> access, a piece of knowledge that we don't have.
>>>>>
>>>>>
>>>> But even a UA doesn't live in a vacuum. It responds to input, usually
>>>> human, sometimes automated. Either way, it is performing a task and will
>>>> have a variety of parameters. Metadata should make its task easier.
>>>>
>>>>
>>>>
>>>>> There are a lot of implicit assumptions under the so-called "uniform
>>>>> access to metadata/descriptor" approach.  It either requires the
>>>>> definition of IR or a one-on-one relationship between Resource and
>>>>> Representation.
>>>>>
>>>>>
>>>> That depends what the metadata says. If it says "this page is generated
>>>> dynamically to suit a wide variety of devices" that says quite the
>>>> opposite to your conjecture - namely that there are many different
>>>> representations available at the described URI.
>>>>
>>>>
>>> If you can describe your scenario without invoking the word "metadata"
>>> or any other similar sort, then you will present a valid case.  This is
>>> the very question that I asked at the very first place. Tell me, given a
>>> resource or data A, what is its meta-Resource or its metadata B?  Again
>>> as I have suggested for the definition of IR, let's use Quine's
>>> "ontological commitment" as a criterion to guard ourselves from
>>> hypostasizing or reifying things for a particular theory.
>>>
>>> Define Data and Metadata in an ontology so that data and metadata are
>>> disjoint, because only then can everyone (both providers and
>>> consumers) follow it in practice.
>>>
>>
>> One agent's data is another agent's metadata.
>>
>> RDF graphs are data to the semantic web layer, but can constitute metadata
>> at the web layer.
>>
>> RDF graphs, however, can be serialized and accessible as representations at
>> the web layer, and such representations are data to the web layer.
>>
>> Whether it is data or metadata depends on the layer at which it is
>> interpreted/consumed and the purpose of the agent.
>>
>> Patrick
>>
>>
>>
>>
>>> Xiaoshu
>>>
>>>> Others, more qualified than me, have answered your remaining issues.
>>>>
>>>> Phil.
>>>>
>>>>> As the former implies that non-IR cannot have a
>>>>> representation, it makes the "descriptor/metadata" necessary.  The knock
>>>>> on this assumption is that the definition of IR is impossible to work
>>>>> with.
>>>>>
>>>>> The 1-on-1 relationship gives rise to the so-called "legacy resource".
>>>>> But the term "legacy resource" is a misnomer too.  On the Web, there
>>>>> might be such a thing as a "legacy representation", but there should NOT
>>>>> be such a thing as a "legacy resource", because the latter implies that
>>>>> the Resource is closed and no more semantics will be added.
>>>>> But the so-called "metadata/descriptor" problems can be solved by using
>>>>> HTTP Content Negotiation, making any other proposal a redundant one. The
>>>>> actual issue, as I have discussed in [1], is the incomplete syntax
>>>>> of the URI specs, which currently have no syntactic notation for
>>>>> the other two foundation objects in the Web, i.e., URI and
>>>>> Representation.  Once we supplement the URI spec with such syntactic
>>>>> sugar, such as the one I proposed in [2], we can have a uniform approach
>>>>> to (1) describing URIs along with standard resources and (2)
>>>>> systematically discovering the possible representation types, i.e.,
>>>>> Content-Type/MIME types, associated with a Resource (either URI or
>>>>> standard Resource). As a particular content-type is equivalent to a
>>>>> particular *service*, the approach in effect establishes a
>>>>> uniform approach to service discovery.
>>>>> What is required is to define Content-Type in URI.  Once we have these,
>>>>> not only are Data/Resource linked but also DataType/Service.  Best of
>>>>> all, it works within the conceptualizations defined in AWWW, and does
>>>>> not require any other ambiguous conceptualization, such as IR,
>>>>> metadata, description, etc.
>>>>>
>>>>> 1. http://dfdf.inesc-id.pt/misc/man/http.html
>>>>> 2. http://dfdf.inesc-id.pt/tr/uri-issues
>>>>>
>>>>> Xiaoshu
>>>>>
>>>>> Eran Hammer-Lahav wrote:
>>>>>
>>>>>
>>>>>> Both of which are included in my analysis [1] for the discovery proposal.
>>>>>>
>>>>>> EHL
>>>>>>
>>>>>> [1] http://tools.ietf.org/html/draft-hammer-discovery-02#appendix-B.2
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Julian Reschke [mailto:julian.reschke@gmx.de]
>>>>>>> Sent: Tuesday, February 24, 2009 1:45 AM
>>>>>>> To: Patrick.Stickler@nokia.com
>>>>>>> Cc: Eran Hammer-Lahav; jar@creativecommons.org; connolly@w3.org;
>>>>>>> www-tag@w3.org
>>>>>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>>>>>
>>>>>>> Patrick.Stickler@nokia.com wrote:
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> ...
>>>>>>>> Agents which want to deal with authoritative metadata use
>>>>>>>> MGET/MPUT/etc.
>>>>>>>> ...
>>>>>>>
>>>>>>> Same with PROPFIND and PROPPATCH, btw.
>>>>>>>
>>>>>>> BR, Julian
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>>>
>>
>>
>>
Received on Wednesday, 25 February 2009 16:43:54 GMT
