
Re: Uniform access to metadata: XRD use case.

From: <Patrick.Stickler@nokia.com>
Date: Mon, 2 Mar 2009 11:15:59 +0100
To: <wangxiao@musc.edu>, <eran@hueniverse.com>
CC: <julian.reschke@gmx.de>, <jar@creativecommons.org>, <connolly@w3.org>, <www-tag@w3.org>
Message-ID: <C5D17E7F.E20C%patrick.stickler@nokia.com>


I don't agree that content negotiation is a proper solution to this problem,
but Xiaoshu asks some valid questions below, and also makes some valid
points (most of which I myself have made in the past)...

On 2009-03-02 01:52, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:

> Eran Hammer-Lahav wrote:
>> The reason why your position on links is pointless is because you are trying
>> to use a framework - a tool - as the end and not the mean. Your entire
>> argument is equal to someone walking over to the guy who invented the first
>> axe and told him it has a critical flaw because by itself, it wasn't very
>> useful to figure out what should be built with it.
>>
> No.  What I am asking is very simple: if you have invented a hammer
> (description), tell me how the hammer (the description) differs from the
> axe (the awww:representation) so I will know when to use hammer and when
> axe.  From what I see, your viewpoint would be this: if you use a thing
> to drive a nail (i.e., HTTP Link), call the tool the hammer and if you
> use it to halve something (HTTP GET), call it an axe.
>
> Would the above analogy be fine?


That is a fair request, and one that at least I have tried to answer insofar
as URIQA is concerned, though I think that my earlier answer to this
question can be fairly generalized to apply to *any* proposed solution for
knowledge discovery on the web.

In order to answer the question, one must have a clear definition of what is
meant by "description" and what the primary purpose of such descriptions is.
It may also be useful if we used more qualified terms such as "semantic web
description" and "web representation" to indicate that we are talking about
very specific, narrowly constrained meanings.

Taking "resource" and "representation" per AWWW...

I propose that the "semantic web description" of a resource be defined as a
particular subtype of "representation" from which one or more RDF graphs may
be derived such that, if those graphs are merged, the merged graph will
contain one or more triples in which the URI of the resource in question
occurs as the subject, and zero or more triples in which the URI of the
resource in question does not occur as the subject.
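To make the definition concrete, here is a rough sketch in Python of the check it implies, assuming (purely for illustration) a representation serialized as N-Triples. The function names are made up, and the "parser" is deliberately naive; a real agent would use a proper RDF parser.

```python
# Illustrative sketch only: checks whether a representation, serialized as
# N-Triples, qualifies as a "semantic web description" of a resource per the
# definition above (at least one triple has the resource URI as subject).
# Function names are hypothetical, not part of any specification.

def parse_ntriples(data):
    """Very naive N-Triples reader: one triple per line, whitespace-separated.
    Sufficient for this illustration only."""
    triples = []
    for line in data.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        subj, pred, obj = line.rstrip(" .").split(None, 2)
        triples.append((subj, pred, obj))
    return triples

def is_semantic_web_description(data, resource_uri):
    """True if the (merged) graph contains >= 1 triple whose subject is the resource."""
    subject = f"<{resource_uri}>"
    return any(s == subject for s, _, _ in parse_ntriples(data))

representation = """
<http://example.org/doc> <http://purl.org/dc/terms/creator> "Alice" .
<http://example.org/other> <http://purl.org/dc/terms/title> "Else" .
"""

print(is_semantic_web_description(representation, "http://example.org/doc"))   # True
print(is_semantic_web_description(representation, "http://example.org/none"))  # False
```

The point of the sketch is the functional test itself: whether a representation is a semantic web description is decided by what an agent can derive from it, not by what it "means" to a human reader.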

Note that a particular representation may offer a description of the
resource, such as one expressed in English prose, but if no RDF graph can be
derived from the representation, then the statements of fact made about the
resource are inaccessible to a semantic web agent, and the representation is
not a semantic web description (even if it may be a description of the
resource in the broader sense). This is a practical, functional distinction.

And just as there may be multiple representations of a given resource, so
too may there be multiple semantic web descriptions of a given resource.

Those alternative semantic web descriptions may differ in their

(a) level of detail (how much is said about the resource in question)

(b) degree of focus (how much is said about other related resources, either
in a particular graph or in all graphs serialized in the representation)

(c) encoding (how the RDF graph(s) are serialized in the representation)

(d) noise level (the ratio of bytes which correspond to graph serialization
versus other markup and/or content of some kind)

The above four facets come into play across the entire spectrum of metadata
creation, management, publication, and discovery.

Folks building semantic web agents, and servers which cater to them, are
seeking an optimal, standardized way for semantic web agents to have clear
and efficient access to those particular special semantic web description
representations they need, such that both the means of access and the above
facets (a) through (d) are optimal for their needs, with minimal disruption
to existing web based solutions and minimal burden on either implementors or
content producers.

To take the hammer analogy. A representation is a nail. A semantic web
description is a specialized kind of nail which ideally needs a particular
kind of hammer. Not any old hammer will work well for that nail, some of the
hammers may work better than others, and many kinds of hammers won't work at
all. If one needs to use a particular kind of nail, one looks for the most
optimal kind of hammer available for that kind of nail, and if none of the
hammers one has in one's toolbox are sufficiently good for the job (even if
a few might be made to work with a certain level of success) one adds a new
hammer to one's toolbox, a hammer that works optimally with that particular
kind of nail. And if one works almost exclusively with a particular kind of
nail, one will want a hammer that is as optimal as possible for that kind of
nail.

HTTP GET is a very good and long proven hammer for working with
representation nails. You might consider it your quintessential hammer, and
representations the quintessential nail. The kind that everyone has around
the house or shop, and is used far more than any other kind of hammer and
nail.

Semantic web descriptions are a very special kind of nail.

HTTP GET plus some form of linking is a hammer that *can* be used for
semantic web description nails, but not optimally (and I've explained why
elsewhere).

HTTP GET plus content negotiation is another hammer that *can* be used for
semantic web description nails, but not optimally (and I've explained why
elsewhere).

URIQA is a hammer that is specifically designed to be maximally optimal for
working with semantic web description nails. Semantic web agents who deal
almost exclusively with semantic web description nails deserve a hammer that
is optimally suited for their work, not just any old hammer that kind of
gets the job done, but not terribly well.
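For concreteness, the three hammers above can be sketched as the raw HTTP requests each approach would send. This is an illustrative sketch only: the host and path are invented, and MGET is the method proposed by URIQA, not a standardized HTTP method.

```python
# Illustrative sketch: the raw HTTP requests behind the three "hammers"
# discussed above. Host and path are invented; MGET is the method proposed
# by URIQA, not a standard HTTP method.

def build_request(method, path, host, headers=None):
    """Format a minimal HTTP/1.1 request as wire text (no body)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"

# Hammer 1: GET plus linking -- fetch the representation first, then follow
# a rel="describedby" link (from a header or the content) in a second request.
get_then_link = build_request("GET", "/resource", "example.org")

# Hammer 2: GET plus content negotiation -- ask for an RDF media type and
# rely on the server treating that as a request for the description.
conneg = build_request("GET", "/resource", "example.org",
                       {"Accept": "application/rdf+xml"})

# Hammer 3: URIQA -- a distinct method whose semantics are precisely "return
# the authoritative description of the resource denoted by this URI".
mget = build_request("MGET", "/resource", "example.org")

print(mget.splitlines()[0])  # MGET /resource HTTP/1.1
```

Note how only the third request carries its intent in the method itself; the first needs a follow-up round trip, and the second overloads the meaning of the Accept header.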

>
> As you haven't told me the difference between the hammer and an axe (if
> so, please do it again, because what I now get is only the symbolic
> difference but not the semantic one), then I would call them both "tool"
> (my HTTP GET).  Hence, if I want to either drive a nail or halve
> something, I will simply use the "tool".  Hence, who is suggesting that
> the axe or tool has a critical flaw?  It is definitely not me, as my
> vocabulary doesn't have the word "axe".  Had there been one, it must be
> a synonym of "tool".
>
> If you do, please tell me the semantic difference first.  If it is so
> clear to you, I bet you can construct something concrete.

Hopefully, the above will have answered your question to some useful extent.

>> The link framework offers something very simple. If you have two resources,
>> where you have an interest in one, and would like to obtain more information
>> (given a very specific context), you can find this extra information
>> elsewhere. It has nothing to do with conneg. We are talking about two
>> discrete resources. But the key here is that links by themselves don't do
>> much. Applications must specify how certain links are used in certain
>> situations. You are completely ignoring the application layer.
>>
> Sure.  RDF is simple too.  a:Resource a:Property(or a:Predicate or a
> link:type) another:Resource.  Again, call me numb but I don't know how
> Link is any different from RDF.  I did not imply anything else, except
> that I cannot see how different the semantics put in Link would be any
> different from an RDF file.

Apples and oranges.

Linking is one proposed methodology for getting to the RDF graph describing a
particular resource.

>
> You put the discussion of Conneg way too early now.  What I have asked
> are two related questions.
>
> The first question is what the above is centered on, i.e., the *semantic*
> difference between Description and Representation.

I think this is where you keep getting hung up. The distinction between a
representation and a semantic web description (which is a kind of
representation) is functional, not semantic.

If a semantic web agent can reliably and accurately derive one or more RDF
graphs from a representation, then that representation serves as a semantic
web description. It's as simple as that.

>   At this time, Conneg
> is not involved. As I cannot tell them apart, I am guessing that, if
> that is the case, the necessity for Link/MGET, must be because there
> exists some reason that a resource cannot serve its
> Description/Representation.

No, it's because the default representation served for a given MIME type is
usually optimized for consumption by a browser agent, not a semantic web
agent, and while there are approaches to constructing representations which
would serve both types of agents (e.g. RDFa, etc.) it is not always
beneficial or feasible to produce such dual-purpose representations, and
semantic web agents will still need a way to communicate to the server their
need for a semantic web description rather than some other representation.


> That is what I said: it must come down to
> one of the arguments of either IR or legacy *representation*.  It is
> under the argument of legacy *resource* that content negotiation comes
> into play.  I dispute the notion of legacy *resource* because a resource
> can always have new *representation*. It is under this context that I
> said that Link is functionally redundant to Conneg.

Well, one could provide a tool/solution allowing content publishers to
define links associating representations with descriptions in a manner
entirely separate from creation and management of the representations
themselves, having the link communicated to agents via the HTTP header, so
the linking approach can be made to work with legacy content. It's just that
having so many different ways to link forces agents to have to hunt in
multiple places for that information, even having to retrieve and parse the
representation itself, rather than one single, simple, consistent request to
the server, à la URIQA.
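To illustrate part of what that hunting entails, here is a minimal sketch of pulling a describedby link out of an HTTP Link header. Real Link header syntax allows more than this toy parser handles, and the URIs are invented for illustration.

```python
import re

# Hedged sketch: a minimal parser for an HTTP Link header of the kind the
# linking approach relies on, e.g.
#   Link: <http://example.org/meta.rdf>; rel="describedby"
# Real-world Link headers permit parameters and quoting this toy regex
# does not handle; it exists only to show the extra parsing step agents incur.

def parse_link_header(value):
    """Map rel type -> target URI for each comma-separated link-value."""
    links = {}
    for part in value.split(","):
        m = re.match(r'\s*<([^>]+)>\s*;\s*rel="?([^";]+)"?', part)
        if m:
            links[m.group(2)] = m.group(1)
    return links

header = ('<http://example.org/meta.rdf>; rel="describedby", '
          '<http://example.org/style.css>; rel="stylesheet"')
links = parse_link_header(header)
print(links["describedby"])  # http://example.org/meta.rdf
```

And this only covers the header case; an agent must still be prepared to retrieve and parse the representation itself when the link is embedded in the content instead.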

And offering URIQA support for semantic web agents in no way precludes using
any of those linking approaches on the authoring/management side to
associate descriptions with resources, but such internal processes are not
relevant to external agents and URIQA enables whatever metadata management
and publication techniques a site might employ (possibly many, and probably
changing over time) to remain properly hidden under the hood.

Linking tied to representations (either in the HTTP header or embedded)
makes sense when those links are presumed to be interpreted in the context
of consuming the particular representation (e.g. links to stylesheets,
etc.). But such linking places too much needless processing burden on
semantic web agents who really are not interested in just any
representation, but only semantic web descriptions.

All of these alternative proposals to URIQA for efficient *uniform* access
to metadata are actually better suited as alternative methods of exposing
metadata which can be harvested and syndicated into a solution which
actually provides truly uniform access to that metadata for semantic web
agents.

Expecting every semantic web agent to support all of the various methods,
and to have to deal with representations containing embedded metadata or
nothing more than embedded links to metadata, and to have to sleuth around
to figure out which of a number of possible discovery strategies is being
used on a particular site is *ludicrous*.

Linking is great and useful. Microformats are great and useful. <meta> tags
are great and useful. There are many, many great and useful methods for
exposing descriptive metadata about resources, and different environments
and processes (and user skills) will prefer some options to others.

But when it comes time for semantic web agents to ask particular servers for
authoritative descriptions of resources denoted by a URI grounded in that
server root, we really should expect a single, simple, efficient, optimal,
and *uniform* method of access.

To that end, I see no other valid proposal on the table aside from URIQA.

>
> Of course, I could be wrong.  But don't you think that the following two
> items would be more productive and straight-forward?
> (1) A definition that tells Description from Representation.

See above.

> (2) A use case that illustrates the necessity of Link w/o either resting
> on the concept of IR, (if you insist, again, give a concrete definition
> of IR that tells it from non-IRs) or due to the limit of a specific
> format.

Taking this as meaning a use case showing why it must be possible for a
semantic web agent to both ask specifically for a semantic web description
and specifically for a particular MIME type, I've at least provided that
already.

I'll let Eran respond with a linking specific use case.

>
> Would this be fair?
>> Now, if you want to use an axe to insert related information into the
>> resource itself, go ahead. But I strongly believe you are using the wrong
>> tool here (to put it mildly).
> Sigh.  Who is the guilty party?  (See your opening paragraph)
>> The endless discussion over links vs. conneg is pointless. I learned not to
>> debate religion when I was 12, and that lesson applies here.
>>
>> I am not going to use conneg for my use cases because:
>>
>> 1. It overloads content-type with relation type or worse, an application
>> specific activity.
>>
> It is that LINK overloads RDF and HTTP.  All Headers of the HTTP
> requests are, in fact, about parsing of the HTTP entity.  Link, in fact,
> breaks this boundary.

Agreed, the link header overloads the semantics of the HTTP response (and
this is something I pointed out several years ago).

But also, using content negotiation to specifically request a semantic web
description rather than some other representation also overloads the
semantics of content negotiation.

URIQA does not overload nor conflict with any of the existing semantics of
any existing protocol.

>> 2. It requires minting content types that are limited to representing
>> metadata. A quick look at a typical Windows registry for file types or URI
>> scheme types shows just how broken this approach is.
>>
> I am not exactly sure what you are imputing here.

I've made the very same point to you as Eran has above, and you just don't
seem to be understanding it. You may want to mull it over a bit.

> If you are suggesting that there be only one (or a few) formats for
> every task, I disagree.  There are just too many real-world needs for
> different formats under different situations.  For instance, I don't
> think an XML-based format is the ideal
> have voted with their feet, such as with the development of YAML, JSON
> etc.  If you are talking about the flaws of Windows, I bet they would
> eventually accommodate to popular demands because their goal is to sell
> machines.
>> 3. There is no way to find meta-metadata. Given three resources, C describes
>> B and B describes A, how would conneg accomplish that? Mint a content type
>> for a description of a description?
>>
> Find any RDF file and tell me which resource is data and which is
> metadata.  I want to remind you, for any rdf:Property, there exists an
> implicit inverse property.  If you can divide an RDF graph in two parts
> -- one data and the other metadata, you would really have answered my
> first question raised above.  I cannot.  It is really beyond my
> intelligence.
>> 4. It partially fails the Equal Access Principle in that it is not a simple
>> feature for many small and large providers to support. I can tell you that
> Yahoo! will not support conneg for metadata on any of its high value
>> properties for a wide range of reasons. Also many web clients don't give full
>> access to the Accept header or other conneg features. The community I serve
>> with this work depends heavily on extreme pragmatism.
>>
> I don't know what the Equal Access Principle has got to do with it.  Should I
> expect my cell phone browser to give me the same look and feel as the one
> on my laptop?  And should I expect my RDF agent to do the same thing as
> my ordinary Web browser?  Besides, it only says that Conneg has not been
> understood and perhaps underused.  It says nothing about the necessity
> or superiority of Link.

I have to agree with you there Xiaoshu. I've read through the Equal Access
Principle a couple times and can't shake off the conclusion that it's
essentially "no agent left behind" for the web, and will simply result in
the dumbing down of the machinery.

I'm all for open, standards based access, keeping things simple, lean and
mean, and ensuring that common solutions work as well as possible on as many
platforms as possible, and I think that is compatible with the general
sentiment of Equal Access Principle, but it is taken too far.

>
> Of course, your implication might be: let's totally remove Conneg.  If
> this is true, that is a totally different issue.  Under this condition,
> i.e., without Conneg, Link/MGET is necessary.
>> 5. It doesn't allow for an easy one resource-many descriptors link type (you
>> can return a 300 but that isn't really widely used or understood).
>>
> You mean that in RDF, I cannot say that?
>> And all of this completely ignores the basic principle that data and metadata
>> are not always just different representations of the same resource.
>>
> Basic Principle? On which semantic or architectural foundation?
>> So I'll use links and you use conneg and meet again in 5 years and see who is
>> getting more traction. Any further debate on this is a waste of time.
>>
> Sure.
>
> Xiaoshu
>> EHL

Regards,

Patrick


>>> -----Original Message-----
>>> From: Xiaoshu Wang [mailto:wangxiao@musc.edu]
>>> Sent: Tuesday, February 24, 2009 4:00 PM
>>> To: Eran Hammer-Lahav
>>> Cc: Julian Reschke; Patrick.Stickler@nokia.com;
>>> jar@creativecommons.org; connolly@w3.org; www-tag@w3.org
>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>
>>> The critical flaw of all the proposed approaches is that the
>>> definition of "metadata/descriptor" is ambiguous and hence useless in
>>> practice.  Take the "describedBy" relation for example.  Here I quote
>>> from Eran's link.
>>>
>>>       The relationship A "describedby" B asserts that resource B
>>>       provides a description of resource A. There are no constraints on
>>>       the format or representation of either A or B, neither are there
>>>       any further constraints on either resource.
>>>
>>> As a URI owner, I don't know what kind of stuff I should put in A
>>> or B.  As a URI client, how should I know when I should get A and when
>>> B?  Since I don't know what I might be missing from either A or B, it
>>> seems to suggest that I must always get both A and B.  Thus, I cannot
>>> help wondering why they are not put together at A in the first
>>> place.
>>>
>>> The same goes for MGET: how does a user know when to GET and when to
>>> MGET?  PROPFIND is different, because when people use it, they already
>>> know that the resource is defined by WebDAV.  Hence, these kinds of
>>> ideas only work when the client already has some knowledge about A.
>>> But, to propose it as a general framework for the Web, it won't work.
>>> At the most fundamental level, we only know three things about the Web
>>> -- URI, Representation, Resource.  The concept of metadata is
>>> ill-conceived at this level because, as data about data, to say metadata
>>> implies that we already know something about the resource we try to
>>> access, a piece of knowledge that we don't have.
>>>
>>> There are a lot of implicit assumptions under the so-called "uniform
>>> access to metadata/descriptor" approach.  It either requires the
>>> definition of IR or a one-on-one relationship between Resource and
>>> Representation.  As the former implies that a non-IR cannot have a
>>> representation, it makes the "descriptor/metadata" necessary.  The
>>> knock on this assumption is that the definition of IR is impossible
>>> to work with.
>>>
>>> The 1-on-1 relationship gives rise to the so-called "legacy resource".
>>> But the term "legacy resource" is wrongly chosen too.  On the Web,
>>> there might be such a thing as a "legacy representation", but there
>>> should NOT be such a thing as a "legacy resource", because the latter
>>> implies that the Resource is closed and no more semantics will be
>>> added.
>>>
>>> But the so-called "metadata/descriptor" problems can be solved by using
>>> HTTP Content Negotiation, making any other proposal redundant.  The
>>> actual issue, as I have discussed in [1], is about the incomplete
>>> syntax of the URI specs, which currently does not have a syntactic
>>> notation for the other two foundation objects in the Web, i.e., URI
>>> and Representation.  Once we supplement the URI spec with such
>>> syntactic sugar, as I proposed in [2], then we can have a uniform
>>> approach to (1) describe URI along with standard resources and (2)
>>> systematically discover the possible representation types, i.e.,
>>> Content-Type/MIME types, associated with a Resource (either URI or
>>> standard Resource).  As a particular content-type is equivalent to a
>>> particular *service*, the approach in effect establishes a uniform
>>> approach to service discovery.
>>>
>>> What is required is to define Content-Type in URI.  Once we have
>>> these, not only are Data/Resource linked but also DataType/Service.
>>> Best of all, it works within the conceptualizations defined in AWWW,
>>> and does not require any other ambiguous conceptualization, such as
>>> IR, metadata, and description, etc.
>>>
>>> 1. http://dfdf.inesc-id.pt/misc/man/http.html
>>> 2. http://dfdf.inesc-id.pt/tr/uri-issues
>>>
>>> Xiaoshu
>>>
>>> Eran Hammer-Lahav wrote:
>>>
>>>> Both of which are included in my analysis [1] for the discovery
>>>> proposal.
>>>> EHL
>>>>
>>>> [1] http://tools.ietf.org/html/draft-hammer-discovery-02#appendix-B.2
>>>>
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Julian Reschke [mailto:julian.reschke@gmx.de]
>>>>> Sent: Tuesday, February 24, 2009 1:45 AM
>>>>> To: Patrick.Stickler@nokia.com
>>>>> Cc: Eran Hammer-Lahav; jar@creativecommons.org; connolly@w3.org;
>>>>> www-tag@w3.org
>>>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>>>
>>>>> Patrick.Stickler@nokia.com wrote:
>>>>>
>>>>>> ...
>>>>>> Agents which want to deal with authoritative metadata use
>>>>>> MGET/MPUT/etc.
>>>>>> ...
>>>>>
>>>>> Same with PROPFIND and PROPPATCH, btw.
>>>>>
>>>>> BR, Julian
Received on Monday, 2 March 2009 10:14:18 GMT
