Re: Uniform access to metadata: XRD use case. from Patrick.Stickler@nokia.com on 2009-02-25 (www-tag@w3.org from February 2009)

From: <Patrick.Stickler@nokia.com>
Date: Wed, 25 Feb 2009 17:01:19 +0100
To: <wangxiao@musc.edu>, <phil@philarcher.org>
CC: <eran@hueniverse.com>, <julian.reschke@gmx.de>, <jar@creativecommons.org>, <connolly@w3.org>, <www-tag@w3.org>
Message-ID: <C5CB37EF.DF84%patrick.stickler@nokia.com>
On 2009-02-25 13:44, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:

>
>
>
> Phil Archer wrote:

(apologies to Phil for jumping in with a few comments)

>> Xiaoshu Wang wrote:
>>
>>> The critical flaw of all the proposed approaches is that the definition of
>>> "metadata/descriptor" is ambiguous and hence useless in practice.  Take
>>> the "describedBy" relations for example.  Here I quote from Eran's link.
>>>
>>>      The relationship A "describedby" B asserts that resource B
>>>      provides a description of resource A. There are no constraints on
>>>      the format or representation of either A or B, neither are there
>>>      any further constraints on either resource.
>>>
>>> As a URI owner, I don't know what kind of stuff that I should put in A
>>> or B.
>>>
>>
>> Yes you do. You know that B has something to say about A. You don't,
>> however, know what format either is in or anything else. Those details
>> are handled by other mechanisms, notably the content type. In this link:
>>
>> Link: <foo.bar>; rel="describedby"; type="application/thing"
>>
>> You would probably only fetch foo.bar if you had a UA that could process
>> application/thing. (This is a hint; it may be superseded by the more
>> authoritative headers that come back if you dereference foo.bar.)
>>
>>> As a URI client, how should I know when I should get A and when
>>> B?
>>>
>>
>> Because either:
>>
>>   - you're interested in A for any of the reasons you may be interested
>> in any resource (you're following a link, it's in a search result or
>> whatever). Optionally, you can find out more about A by following the
>> link to B.
>>
>>   - you're collecting URIs of resources that have particular features.
>> Therefore, you'll look for Bs and then use them to find As.
>>
> Honestly, do you think that answers any question that I raise?  If B
> describes A, and if I am interested in A, I am of course interested in
> B.  What particular features am I looking for that would let me choose
> either A or B, but not both A and B?

Presuming that we are talking about semantic web agents, and not humans...

If an agent wants a description of A it just uses MGET to request it. If
that description mentions B, and the agent then wants a full description of
B, it again just uses MGET to request a description of B.

If, based on what the agent has learned about A or B, it wants to access any
more general representations of either, it can use GET.

And the descriptions of A and B provided by MGET might very well tell the
agent exactly what representations are available, and, via additional MGET
requests, the descriptions of those representations can be considered,
without having to retrieve, parse, and find such descriptions or links to
descriptions from the representations themselves.
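
To make the exchange above concrete, here is a minimal sketch of the two
requests an agent might issue for the same URI. It only builds the request
text and sends nothing. The host "example.org", the path "/A", and the helper
name build_request are placeholders invented for illustration; the only
detail taken from this thread is that MGET asks for a description and GET
asks for a representation.

```python
def build_request(method, path, host, accept=None):
    """Return the raw text of an HTTP/1.1 request (nothing is sent)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if accept:
        lines.append(f"Accept: {accept}")
    # A blank line terminates the header section of the request.
    return "\r\n".join(lines) + "\r\n\r\n"


# Hypothetical resource: example.org and /A are placeholders.
# Same URI in both cases; only the method differs.
describe_a = build_request("MGET", "/A", "example.org",
                           accept="application/rdf+xml")
fetch_a = build_request("GET", "/A", "example.org")

print(describe_a)
print(fetch_a)
```

The point of the sketch is that the agent never needs a second URI: the
description and the representation are both reached through the URI of A.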

I've made these points repeatedly in the past, but I'll repeat them again:

Even if only submitting HEAD requests, the HTTP link method doubles the
number of requests needed to achieve the same results, and most critically,
does not offer a solution for resources which may not have any
representation, or may not have an accessible representation due to various
restrictions, and thus is neither as general nor as efficient a protocol
solution as URIQA.
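
The request-count argument can be illustrated with a toy in-memory model.
This is a sketch under stated assumptions, not a measurement: the ToyServer
class, the URIs /A, /B and /C, and the stored strings are all hypothetical,
and the Link header is reduced to a bare target URI. In the model, /C has a
description but no representation.

```python
# Toy data: everything here is invented for illustration.
DESCRIPTIONS = {"/A": "A is a bank homepage", "/C": "C is a person"}
REPRESENTATIONS = {
    "/A": "<html>bank homepage</html>",
    "/B": DESCRIPTIONS["/A"],      # B is a document that describes A
}
DESCRIBED_BY = {"/A": "/B"}        # stands in for Link: </B>; rel="describedby"


class ToyServer:
    """Counts requests and answers HEAD, GET and MGET from the tables above."""

    def __init__(self):
        self.requests = 0

    def handle(self, method, uri):
        self.requests += 1
        if method == "MGET":
            body = DESCRIPTIONS.get(uri)
            return (200, {}, body) if body is not None else (404, {}, None)
        if method in ("HEAD", "GET"):
            if uri not in REPRESENTATIONS:
                return 404, {}, None
            headers = {}
            if uri in DESCRIBED_BY:
                headers["Link"] = DESCRIBED_BY[uri]
            body = REPRESENTATIONS[uri] if method == "GET" else None
            return 200, headers, body
        return 405, {}, None


def describe_via_link(server, uri):
    """HEAD the resource, then GET whatever its Link header points at."""
    status, headers, _ = server.handle("HEAD", uri)
    if status != 200 or "Link" not in headers:
        return None
    status, _, body = server.handle("GET", headers["Link"])
    return body if status == 200 else None


def describe_via_mget(server, uri):
    """Ask for the description directly with a single MGET."""
    status, _, body = server.handle("MGET", uri)
    return body if status == 200 else None
```

In this model the link approach needs two requests for /A and returns
nothing for /C, while the MGET approach needs one request in both cases.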

And the implementational burden for introducing a metadata management and
publication layer to an existing web publication environment in order to
define metadata and/or associate links to descriptions with resource URIs,
and to insert description links into server response headers is equal to or
greater than implementing support for a solution based on URIQA, and is less
modular.

The arguments that a linking approach imposes less implementational burden
or disruption to web sites or content publishers than approaches such as
URIQA do not bear scrutiny.

Objections to a solution based on additional specialized HTTP methods appear
to be based primarily on rigid philosophical positions or vested interests
and not on the demonstrable technical and practical merits of such
solutions.

>>> Since I don't know what I might be missing from either A or B, it
>>> seems to suggest that I must always get both A and B.
>>>
>>
>> No. As an analogy: if an HTML page links to a stylesheet you can choose
>> whether to fetch the stylesheet or not in order to render the page.
>>
> No.  This is not a reasonable analogy.  When I received a HTML page, (a
> representation btw), there exists a context that defines the semantics
> of stylesheet and it, in turn, helps a UA to decide accordingly.  At
> HTTP level, there is no such context because I know nothing except the
> URI denotes a Resource.  If you take this HTML page as an analogy, it
> means that I can move the HTML's stylesheet link into the HTTP layer as
> the HTTP Link?  Is this a good design?

Relying on a link at either layer is a suboptimal design, because both
depend on the existence of a representation of the resource, other than its
formal description, in order to access that formal description using web
protocols.

Not all resources of interest to the semantic layer will have traditional
web representations to which a link to their formal description can be
attached (unless of course, in the case of the link approach, the solution
requires it).


>>> Thus, I cannot
>>> help but wonder why they are not put together at A in the first place.
>>>
>>
>> Because they are often managed by different people, subject to different
>> production and editorial control etc. Take a content production
>> workflow. Often there is a relatively large number of people
>> (journalists, graphic artists etc.) who create the content which is then
>> subject to review by an editor(ial team). There are many situations
>> where the latter creates the metadata concerning resources produced by
>> the former.
>>
> Again, you assumed a working context.  This is no different from the
> WebDAV case.  It is invalid as a general mechanism.  Conneg can solve
> the problem too.  You define a format/service, preferably with a URI,
> say "b", for the content of B and deploy it under A.  If a user wants to
> get B-content, that implicitly suggests that they already know "b".
> Then, they request the "b"-content from A.
>> As a little example:
>>
>> A is the homepage of a bank. It was last updated 2 hours ago.
>> B tells you that A is the homepage of a bank. B was last updated 2
>> months ago.
>>
>> Current financial crisis notwithstanding, both are accurate, both have
>> been updated in a time frame that suggests they are actively managed.
>>
> I don't get it.  Shouldn't A's representation tell me that A is the
> homepage of a bank?

Not necessarily. And even if you, as a human being, might be able to deduce
from the representation of A that A is the homepage of a bank, a semantic
web agent probably won't (and without explicit RDF statements to that
effect, you couldn't be sure that your deductions about A based on
examination of the representation are correct).

> Why do I need B to tell me the same thing?  And why
> would I be interested in the update of B?

Because A may not tell anything about the resource in a manner that is
meaningful to a semantic web agent, and B is (presumably) a formal
description of A which corresponds to an RDF graph which is meaningful to a
semantic web agent.

If B is not such a formal description, then probably neither A nor B is
useful at the semantic web layer.


>>> The same goes for MGET: how does a user know when to GET and when to MGET?
>>> PROPFIND is different because when people use it, they already
>>> know that the resource is defined by WebDAV.   Hence, these kinds of
>>> ideas only work when the client already has some knowledge about A.
>>>
>>
>> I think you're getting into a bit of a tunnel here. How do you know
>> about anything on the Web? How do you discover anything? All the
>> mechanisms under discussion have their As and Bs (resources and
>> descriptions thereof). The current effort is all about trying to find
>> some uniformity of approach.
>>
> Yes, my assumption is that you don't know anything about a Resource at
> the first place.  Thus, given a resource's URI, if I am a specialized
> agent, say RDF agent, I would request something that I can understand,
> such as RDF/XML, n3 etc.

Right. A semantic web agent would request a representation of a description
of the resource of interest, corresponding to an RDF graph, and furthermore
could use content negotiation to indicate which graph serialization
encodings are acceptable and thus possibly affect which variant
representation of the description is provided.
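
As a rough sketch of that negotiation step, the function below picks a
serialization for a description from an Accept header. The function names
and the media types in the usage lines ("application/rdf+xml", "text/n3") are
illustrative assumptions, and the parsing is deliberately simplified: it
handles q-values but ignores wildcards such as */* and all other parameters.

```python
def parse_accept(header):
    """Parse an Accept header into (media_type, q) pairs, highest q first."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        media_type = parts[0].lower()
        if not media_type:
            continue
        q = 1.0  # a missing q parameter means full preference
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((media_type, q))
    # sorted() is stable, so equal q-values keep their header order.
    return sorted(prefs, key=lambda pref: pref[1], reverse=True)


def choose_variant(accept_header, available):
    """Return the most preferred acceptable media type, or None."""
    for media_type, q in parse_accept(accept_header):
        if q > 0 and media_type in available:
            return media_type
    return None


available = ["application/rdf+xml", "text/n3"]
print(choose_variant("text/n3;q=0.8, application/rdf+xml", available))
print(choose_variant("text/html", available))
```

An agent that only understands one serialization simply lists that one
type; if the server has no matching variant, the sketch returns None, which
a real server would typically turn into a 406 response.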

> On the other hand, if I am a general agent,
> such as a human, I would (1) conduct implicit Conneg, by request
> something that I prefer, such as HTML, or other things like image,
> audio, etc.,  or (2) conduct transparent Conneg to ask what kind of
> services/content-types that the resource offer so I can choose.  If MIME
> type is URIzed, then a general agent such as a human can follow each of
> the MIME-URI to understand what is the most appropriate for my need so
> that I can make my choice accordingly.

Ideally, a human wouldn't have to be concerned about MIME types or content
negotiation, but rather the software agent (e.g. a web browser) would hide
such details behind a human-optimized interface.

>
> This is not as what you said "how can you discover anything?".  It is
> exactly the opposite, it allows you to discover everything.

The question is how a semantic web agent requests a consumable description
of a resource in the most optimal manner.

How humans access and consume representations of resources has been pretty
much sorted out for quite some time.

>>> But, to propose it as a general framework for the Web, it won't work.
>>> At the most fundamental level, we only know three things about the Web
>>> -- URI, Representation, Resource.  The concept of metadata is
>>> ill-conceived at this level because as data about data, to say metadata
>>> implies that we already know something about the resource we try to
>>> access, a piece of knowledge that we don't have.
>>>
>>
>> But even a UA doesn't live in a vacuum. It responds to input, usually
>> human, sometimes automated. Either way, it is performing a task and will
>> have a variety of parameters. Metadata should make its task easier.
>>
>>
>>> There are a lot of implicit assumptions under the so-called "uniform
>>> access to metadata/descriptor" approach.  It either requires the
>>> definition of IR or a one-on-one relationship between Resource and
>>> Representation.
>>>
>>
>> That depends what the metadata says. If it says "this page is generated
>> dynamically to suit a wide variety of devices" that says quite the
>> opposite to your conjecture - namely that there are many different
>> representations available at the described URI.
>>
> If you can describe your scenario without invoking the word "metadata"
> or any other similar sort, then you will present a valid case.  This is
> the very question that I asked at the very first place. Tell me, given a
> resource or data A, what is its meta-Resource or its metadata B?  Again
> as I have suggested for the definition of IR, let's use Quine's
> "ontological commitment" as a criterion to guard ourselves from
> hypostasizing or reifying things for a particular theory.
>
> Define Data and Metadata in an ontology so that data and metadata are
> disjoint, because only then can everyone (both providers and
> consumers) follow it in practice.

One agent's data is another agent's metadata.

RDF graphs are data to the semantic web layer, but can constitute metadata
at the web layer.

RDF graphs, however, can be serialized and accessible as representations at
the web layer, and such representations are data to the web layer.

Whether it is data or metadata depends on the layer at which it is
interpreted/consumed and the purpose of the agent.

Patrick



>
> Xiaoshu
>> Others, more qualified than me, have answered your remaining issues.
>>
>> Phil.
>>
>>> As the former implies that non-IR cannot have a
>>> representation, it makes the "descriptor/metadata" necessary.  The knock
>>> on this assumption is that the definition of IR is impossible to work with.
>>>
>>> The 1-on-1 relationship gives rise to the so-called "legacy resource".
>>> But the word "legacy resource" is wrongly named too.  In the Web, there
>>> might be something as "legacy representation" but there should NOT be
>>> such thing as "legacy resource" because the latter implies that the
>>> Resource is closed and no more semantics will be added.
>>> But the so-called "metadata/descriptor" problems can be solved by using
>>> HTTP Content Negotiation, making any other proposal a redundant one. The
>>> actual issue, as I have discussed in [1], is about the incomplete syntax
>>> of the URI specs, which currently do not have a syntactic notation
>>> for the other two foundation objects in the Web, i.e., URI and
>>> Representation.  Once we supplement the URI spec with that syntactic sugar,
>>> such as the one I proposed in [2], then, we can have a uniform approach
>>> to (1) describe URI along with standard resources and (2) to
>>> systematically discover the possible representation types, i.e.,
>>> Content-Type/MIME types, associated with a Resource (either URI or
>>> standard Resource). As a particular content-type is equivalent of a
>>> particular *service*, hence, the approach in effect establishes a
>>> uniformed approach to service discovery.
>>> What is required is to define Content-Type in URI.  Once we have these,
>>> not only Data/Resource are linked but DataType/Service.  The best of
>>> all, it works within the conceptualizations defined in AWWW, and does
>>> not require any other ambiguous conceptualization, such as, IR,
>>> metadata, and description, etc.
>>>
>>> 1. http://dfdf.inesc-id.pt/misc/man/http.html
>>> 2. http://dfdf.inesc-id.pt/tr/uri-issues
>>>
>>> Xiaoshu
>>>
>>> Eran Hammer-Lahav wrote:
>>>
>>>> Both of which are included in my analysis [1] for the discovery proposal.
>>>>
>>>> EHL
>>>>
>>>> [1] http://tools.ietf.org/html/draft-hammer-discovery-02#appendix-B.2
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Julian Reschke [mailto:julian.reschke@gmx.de]
>>>>> Sent: Tuesday, February 24, 2009 1:45 AM
>>>>> To: Patrick.Stickler@nokia.com
>>>>> Cc: Eran Hammer-Lahav; jar@creativecommons.org; connolly@w3.org; www-
>>>>> tag@w3.org
>>>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>>>
>>>>> Patrick.Stickler@nokia.com wrote:
>>>>>
>>>>>
>>>>>> ...
>>>>>> Agents which want to deal with authoritative metadata use
>>>>>>
>>>>>>
>>>>> MGET/MPUT/etc.
>>>>>
>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>> Same with PROPFIND and PROPPATCH, btw.
>>>>>
>>>>> BR, Julian
>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
Received on Wednesday, 25 February 2009 16:00:05 UTC