
Re: Uniform access to metadata: XRD use case.

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Mon, 02 Mar 2009 11:42:01 +0000
Message-ID: <49ABC609.2070209@musc.edu>
To: "Patrick.Stickler@nokia.com" <Patrick.Stickler@nokia.com>
CC: "eran@hueniverse.com" <eran@hueniverse.com>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "jar@creativecommons.org" <jar@creativecommons.org>, "connolly@w3.org" <connolly@w3.org>, "www-tag@w3.org" <www-tag@w3.org>


Patrick.Stickler@nokia.com wrote:
> I don't agree that content negotiation is a proper solution to this problem,
> but Xiaoshu asks some valid questions below, and also makes some valid
> points (most of which I myself have made in the past)...
>   
Thanks.  At least I know some of my points have gotten across. I might 
be stubborn, but I am not a closed-minded person. I am, in fact, just the 
opposite.  What I am doing is trying to reach a shared understanding by 
seeking clarification of terminology, because I believe that only in 
this way can we make the Web pragmatic. 

Eran's previous email might imply that I am initiating a religious 
debate.  I am not sure whether I am guilty, because the difference 
between religion and philosophy is murky as well.  Bertrand Russell 
described philosophy as something between religion and science, and 
he distinguished the latter from the former by its appeal to human 
reason as opposed to authority.

This -- human reason -- is the foundation that I am trying to lay 
for the debate, whether about the distinction between Description and 
Representation or the distinction between IR and non-IR. The principle 
that I think we should follow is thus W. V. O. Quine's notion of 
"ontological commitment".  I think Tim has also mentioned a few times, at 
various places, that the Web architecture is also somewhat about 
engineering philosophy (I forget the exact wording, but it is something 
to that effect). The ultimate purpose is to make the Web (our choice of 
words and approach) pragmatic.

I would hope that we can do this step by step.  Let's reach a consensus 
on a concrete definition of Description (and its distinction from 
Representation) first.  Whether Conneg/Link/MGET is the right or wrong 
approach is a secondary issue, as it depends on our conceptualization of 
the former.  To keep the thread from getting out of control, I will 
refrain from debating Conneg for now.  By the way, I have already 
outlined my logic in my previous email, so all of you should already 
know my argument.

Xiaoshu
> On 2009-03-02 01:52, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:
>
>   
>>
>> Eran Hammer-Lahav wrote:
>>     
>>> The reason why your position on links is pointless is because you are trying
>>> to use a framework - a tool - as the end and not the means. Your entire
>>> argument is like walking over to the guy who invented the first
>>> axe and telling him it has a critical flaw because, by itself, it isn't very
>>> useful for figuring out what should be built with it.
>>>
>>>       
>> No.  What I am asking is very simple: if you have invented a hammer
>> (description), tell me how the hammer (the description) differs from the
>> axe (the awww:representation), so I will know when to use the hammer and
>> when the axe.  From what I see, your viewpoint would be this: if you use a
>> thing to drive a nail (i.e., HTTP Link), call the tool a hammer, and if you
>> use it to halve something (HTTP GET), call it an axe.
>>
>> Is the above analogy fair?
>>     
>
>
> That is a fair request, and one that at least I have tried to answer insofar
> as URIQA is concerned, though I think that my earlier answer to this
> question can be fairly generalized to apply to *any* proposed solution for
> knowledge discovery on the web.
>
> In order to answer the question, one must have a clear definition of what is
> meant by "description" and what the primary purpose of such descriptions is.
> It may also be useful if we used more qualified terms such as "semantic web
> description" and "web representation" to indicate that we are talking about
> very specific, narrowly constrained meanings.
>
> Taking "resource" and "representation" per AWWW...
>
> I propose that the "semantic web description" of a resource be defined as a
> particular subtype of "representation" from which may be derived one or more
> RDF graphs which, if merged, the merged graph will contain one or more
> triples in which the URI of the resource in question occurs as the subject,
> and where there may be zero or more triples in which the URI of the resource
> in question does not occur as the subject.
>
> Note that a particular representation may offer a description of the
> resource, such as one expressed in English prose, but if no RDF graph can be
> derived from the representation (so that the statements of fact made about
> the resource are inaccessible to a semantic web agent), it is not a semantic
> web description, even if it is a description of the resource in the
> broader sense. This is a practical, functional distinction.
>
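To make the definition concrete, here is a minimal sketch; the mini N-Triples parser and all URIs below are illustrative assumptions, not part of the thread or any spec:

```python
# A toy check of the definition above: a representation is a "semantic web
# description" of a resource if the merged RDF graph derived from it contains
# at least one triple whose subject is the resource's URI.

def parse_ntriples(text):
    """Parse very simple N-Triples lines of the form '<s> <p> <o> .'"""
    triples = []
    for line in text.strip().splitlines():
        parts = line.strip().rstrip(" .").split(None, 2)
        if len(parts) == 3:
            triples.append(tuple(term.strip("<>") for term in parts))
    return triples

def is_semantic_web_description(representations, resource_uri):
    """Merge the graphs from all representations; require at least one
    triple with resource_uri as subject."""
    merged = set()
    for rep in representations:
        merged.update(parse_ntriples(rep))
    return any(s == resource_uri for (s, p, o) in merged)

doc = '<http://example.org/a> <http://purl.org/dc/terms/creator> <http://example.org/alice> .'
print(is_semantic_web_description([doc], "http://example.org/a"))  # True
print(is_semantic_web_description([doc], "http://example.org/b"))  # False
```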
> And just as there may be multiple representations of a given resource, so
> too may there be multiple semantic web descriptions of a given resource.
>
> Those alternative semantic web descriptions may differ in their
>
> (a) level of detail (how much is said about the resource in question)
>
> (b) degree of focus (how much is said about other related resources, either
> in a particular graph or in all graphs serialized in the representation)
>
> (c) encoding (how the RDF graph(s) are serialized in the representation)
>
> (d) noise level (the ratio of bytes which correspond to graph serialization
> versus other markup and/or content of some kind)
>
> The above four facets come into play across the entire spectrum of metadata
> creation, management, publication, and discovery.
>
> Folks building semantic web agents, and servers which cater to them, are
> seeking an optimal, standardized way for semantic web agents to have clear
> and efficient access to those particular semantic web description
> representations they need, such that both the means of access and the
> above facets (a) through (d) are optimal for their needs, and to do so
> with minimal disruption to existing web-based solutions and minimal
> burden to either implementors or content producers.
>
> To return to the hammer analogy: a representation is a nail. A semantic web
> description is a specialized kind of nail which ideally needs a particular
> kind of hammer. Not any old hammer will work well for that nail, some of the
> hammers may work better than others, and many kinds of hammers won't work at
> all. If one needs to use a particular kind of nail, one looks for the most
> optimal kind of hammer available for that kind of nail, and if none of the
> hammers one has in one's toolbox are sufficiently good for the job (even if
> a few might be made to work with a certain level of success) one adds a new
> hammer to one's toolbox, a hammer that works optimally with that particular
> kind of nail. And if one works almost exclusively with a particular kind of
> nail, one will want a hammer that is as optimal as possible for that kind of
> nail.
>
> HTTP GET is a very good and long proven hammer for working with
> representation nails. You might consider it your quintessential hammer, and
> representations the quintessential nail. The kind that everyone has around
> the house or shop, and is used far more than any other kind of hammer and
> nail.
>
> Semantic web descriptions are a very special kind of nail.
>
> HTTP GET plus some form of linking is a hammer that *can* be used for
> semantic web description nails, but not optimally (and I've explained why
> elsewhere).
>
> HTTP GET plus content negotiation is another hammer that *can* be used for
> semantic web description nails, but not optimally (and I've explained why
> elsewhere).
>
> URIQA is a hammer that is specifically designed to be maximally optimal for
> working with semantic web description nails. Semantic web agents who deal
> almost exclusively with semantic web description nails deserve a hammer that
> is optimally suited for their work, not just any old hammer that kind of
> gets the job done, but not terribly well.
>
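For concreteness, a sketch of how the two hammers differ on the wire; the host and path are hypothetical, and MGET is the URIQA-specific method, not a standard HTTP verb:

```python
# Illustrative only: what an ordinary GET and a URIQA-style MGET request
# might look like as raw HTTP/1.1 text.

def build_request(method, path, host, accept=None):
    """Assemble a minimal HTTP/1.1 request as raw text."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    if accept:
        lines.append(f"Accept: {accept}")
    return "\r\n".join(lines) + "\r\n\r\n"

# Retrieve a representation of the resource:
print(build_request("GET", "/resource/42", "example.org"))

# Ask the same server for the authoritative description of the same resource:
print(build_request("MGET", "/resource/42", "example.org",
                    accept="application/rdf+xml"))
```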
>   
>> As you haven't told me the difference between the hammer and the axe (if
>> so, please do it again, because what I now get is only the symbolic
>> difference, not the semantic one), I would call them both "tool" (my
>> HTTP GET).  Hence, if I want to either drive a nail or halve something, I
>> will simply use the "tool".  So who is suggesting that the axe or the tool
>> has a critical flaw?  It is definitely not me, as my vocabulary doesn't
>> have the word "axe".  Had there been one, it would be a synonym for "tool".
>>
>> If you do, please tell me the semantic difference first.  If it is so
>> clear to you, I bet you can construct something concrete.
>>     
>
> Hopefully, the above will have answered your question to some useful extent.
>
>   
>>> The link framework offers something very simple. If you have two resources,
>>> where you have an interest in one, and would like to obtain more information
>>> (given a very specific context), you can find this extra information
>>> elsewhere. It has nothing to do with conneg. We are talking about two
>>> discrete resources. But the key here is that links by themselves don't do
>>> much. Applications must specify how certain links are used in certain
>>> situations. You are completely ignoring the application layer.
>>>
>>>       
>> Sure.  RDF is simple too: a:Resource a:Property (or a:Predicate or a
>> link:type) another:Resource.  Again, call me dumb, but I don't know how
>> Link is any different from RDF.  I did not imply anything else, except
>> that I cannot see how the semantics put in Link would be any
>> different from those in an RDF file.
>>     
>
> Apples and oranges.
>
> Linking is one proposed methodology for getting to the RDF graph describing a
> particular resource.
>
>   
>> You put the discussion of Conneg way too early now.  What I have asked
>> is two related questions.
>>
>> The first question is what the above centers on, i.e., the *semantic*
>> difference between Description and Representation.
>>     
>
> I think this is where you keep getting hung up. The distinction between a
> representation and a semantic web description (which is a kind of
> representation) is functional, not semantic.
>
> If a semantic web agent can reliably and accurately derive one or more RDF
> graphs from a representation, then that representation serves as a semantic
> web description. It's as simple as that.
>
>   
>>   At this time, Conneg
>> is not involved.  As I cannot tell them apart, I am guessing that, if
>> that is the case, the necessity for Link/MGET must be because there
>> exists some reason that a resource cannot serve its
>> Description/Representation.
>>     
>
> No, it's because the default representation served for a given MIME type is
> usually optimized for consumption by a browser agent, not a semantic web
> agent, and while there are approaches to constructing representations which
> would serve both types of agents (e.g. RDFa, etc.) it is not always
> beneficial or feasible to produce such dual-purpose representations, and
> semantic web agents will still need a way to communicate to the server their
> need for a semantic web description rather than some other representation.
>
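A naive sketch of what such server-side negotiation might look like; the representations and MIME handling here are deliberately simplified assumptions:

```python
# The Accept header decides whether the agent gets the browser-oriented
# page or the RDF description.

REPRESENTATIONS = {
    "text/html": "<html><body>human-readable page</body></html>",
    "application/rdf+xml": "<rdf:RDF><!-- semantic web description --></rdf:RDF>",
}

def negotiate(accept_header, default="text/html"):
    """Return (mime_type, body) for the first acceptable type, else the default."""
    for item in accept_header.split(","):
        mime = item.split(";")[0].strip()  # drop q-values etc.
        if mime in REPRESENTATIONS:
            return mime, REPRESENTATIONS[mime]
    return default, REPRESENTATIONS[default]

print(negotiate("application/rdf+xml")[0])  # application/rdf+xml
print(negotiate("image/png")[0])            # text/html (fallback)
```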
>
>   
>> That is what I said: it must come down to
>> one of the arguments of either IR or legacy *representation*.  It is
>> under the argument of legacy *resource* that content negotiation comes
>> into play.  I dispute the notion of a legacy *resource*, because a resource
>> can always have a new *representation*.  It is in this context that I
>> said that Link is functionally redundant to Conneg.
>>     
>
> Well, one could provide a tool/solution allowing content publishers to
> define links associating representations with descriptions in a manner
> entirely separate from creation and management of the representations
> themselves, having the link communicated to agents via the HTTP header, so
> the linking approach can be made to work with legacy content. It's just that
> having so many different ways to link forces agents to have to hunt in
> multiple places for that information, even having to retrieve and parse the
> representation itself, rather than one single, simple, consistent request to
> the server à la URIQA.
>
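A naive sketch of the client-side hunting this implies; the header value is made up, and real Link headers can be more complex than this regex handles:

```python
import re

# Pull a rel="describedby" target out of an HTTP Link header value.

def parse_link_header(value):
    """Return a {rel: target} dict for a simple Link header value."""
    links = {}
    for part in value.split(","):
        m = re.match(r'\s*<([^>]+)>\s*;\s*rel="?([^";]+)"?', part)
        if m:
            links[m.group(2)] = m.group(1)
    return links

header = '<http://example.org/doc.rdf>; rel="describedby", <style.css>; rel="stylesheet"'
print(parse_link_header(header)["describedby"])  # http://example.org/doc.rdf
```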
> And offering URIQA support for semantic web agents in no way precludes using
> any of those linking approaches on the authoring/management side to
> associate descriptions with resources, but such internal processes are not
> relevant to external agents and URIQA enables whatever metadata management
> and publication techniques a site might employ (possibly many, and probably
> changing over time) to remain properly hidden under the hood.
>
> Linking tied to representations (either in the HTTP header or embedded)
> makes sense when those links are presumed to be interpreted in the context
> of consuming the particular representation (e.g. links to stylesheets,
> etc.). But such linking places too much needless processing burden on
> semantic web agents who really are not interested in just any
> representation, but only semantic web descriptions.
>
> All of these alternative proposals to URIQA for efficient *uniform* access
> to metadata are actually better suited as alternative methods of exposing
> metadata which can be harvested and syndicated into a solution which
> actually provides truly uniform access to that metadata for semantic web
> agents.
>
> Expecting every semantic web agent to support all of the various methods,
> and to have to deal with representations containing embedded metadata or
> nothing more than embedded links to metadata, and to have to sleuth around
> to figure out which of a number of possible discovery strategies is being
> used on a particular site is *ludicrous*.
>
> Linking is great and useful. Microformats are great and useful. <meta> tags
> are great and useful. There are many, many great and useful methods for
> exposing descriptive metadata about resources, and different environments
> and processes (and user skills) will prefer some options to others.
>
> But when it comes time for semantic web agents to ask particular servers for
> authoritative descriptions of resources denoted by a URI grounded in that
> server root, we really should expect a single, simple, efficient, optimal,
> and *uniform* method of access.
>
> To that end, I see no other valid proposal on the table aside from URIQA.
>
>   
>> Of course, I could be wrong.  But don't you think that the following two
>> items would be more productive and straight-forward?
>> (1) A definition that tells Description apart from Representation.
>>     
>
> See above.
>
>   
>> (2) A use case that illustrates the necessity of Link without either resting
>> on the concept of IR (if you insist, again, give a concrete definition
>> of IR that tells it apart from non-IRs) or relying on the limits of a
>> specific format.
>>     
>
> Taking this as meaning, a use case showing why it must be possible for a
> semantic web agent to both ask specifically for a semantic web description
> and specifically for a particular MIME type, I've at least provided that
> already.
>
> I'll let Eran respond with a linking specific use case.
>
>   
>> Would this be fair?
>>     
>>> Now, if you want to use an axe to insert related information into the
>>> resource itself, go ahead. But I strongly believe you are using the wrong
>>> tool here (to put it mildly).
>>>       
>> Sigh.  Who is the guilty party?  (See your opening paragraph)
>>     
>>> The endless discussion over links vs. conneg is pointless. I learned not to
>>> debate religion when I was 12, and that lesson applies here.
>>>
>>> I am not going to use conneg for my use cases because:
>>>
>>> 1. It overloads content-type with relation type or worse, an application
>>> specific activity.
>>>
>>>       
>> It is rather that Link overloads RDF and HTTP.  All headers of an HTTP
>> request are, in fact, about the parsing of the HTTP entity.  Link
>> breaks this boundary.
>>     
>
> Agreed, the link header overloads the semantics of the HTTP response
> (something I pointed out several years ago).
>
> But also, using content negotiation to specifically request a semantic web
> description rather than some other representation also overloads the
> semantics of content negotiation.
>
> URIQA neither overloads nor conflicts with any of the existing semantics of
> any existing protocol.
>
>   
>>> 2. It requires minting content types that are limited to representing
>>> metadata. A quick look at a typical Windows registry for file types or URI
>>> scheme types shows just how broken this approach is.
>>>
>>>       
>> I am not exactly sure what you are implying here.
>>     
>
> I've made the very same point to you as Eran has above, and you just don't
> seem to be understanding it. You may want to mull it over a bit.
>
>   
>> If you are suggesting
>> that only one format (or a few formats) should serve every task, I
>> disagree.  There are just too many real-world needs for different formats
>> under different situations.  For instance, I don't think an XML-based
>> format is the ideal choice for encoding large-size scientific data.  And
>> many programmers have voted with their feet, as with the development of
>> YAML, JSON, etc.  If you are talking about the flaws of Windows, I bet
>> they would eventually accommodate popular demand because their goal is to
>> sell machines.
>>     
>>> 3. There is no way to find meta-metadata. Given three resources, C describes
>>> B and B describes A, how would conneg accomplish that? Mint a content type
>>> for a description of a description?
>>>
>>>       
>> Find any RDF file and tell me which resource is data and which is
>> metadata.  I want to remind you, for any rdf:Property, there exists an
>> implicit inverse property.  If you can divide an RDF graph into two parts
>> -- one data and the other metadata -- you would really have answered my
>> first question raised above.  I cannot.  It is really beyond my
>> intelligence.
>>     
>>> 4. It partially fails the Equal Access Principle in that it is not a simple
>>> feature for many small and large providers to support. I can tell you that
>>> Yahoo! will not support conneg for metadata on any of its high value
>>> properties for a wide range of reasons. Also many web clients don't give full
>>> access to the Accept header or other conneg features. The community I serve
>>> with this work depends heavily on extreme pragmatism.
>>>
>>>       
>> I don't know what the Equal Access Principle has got to do with it.  Should
>> I expect my cell phone browser to give me the same look and feel as the one
>> on my laptop?  And should I expect my RDF agent to do the same thing as
>> my ordinary Web browser?  Besides, it only says that Conneg has not been
>> understood and is perhaps underused.  It says nothing about the necessity
>> or superiority of Link.
>>     
>
> I have to agree with you there, Xiaoshu. I've read through the Equal Access
> Principle a couple of times and can't shake off the conclusion that it's
> essentially "no agent left behind" for the web, and will simply result in
> the dumbing down of the machinery.
>
> I'm all for open, standards based access, keeping things simple, lean and
> mean, and ensuring that common solutions work as well as possible on as many
> platforms as possible, and I think that is compatible with the general
> sentiment of Equal Access Principle, but it is taken too far.
>
>   
>> Of course, your implication might be: let's totally remove Conneg.  If
>> this is true, that is a totally different issue.  Under this condition,
>> i.e., without Conneg, Link/MGET is necessary.
>>     
>>> 5. It doesn't allow for an easy one resource-many descriptors link type (you
>>> can return a 300 but that isn't really widely used or understood).
>>>
>>>       
>> You mean that in RDF, I cannot say that?
>>     
>>> And all of this completely ignores the basic principle that data and metadata
>>> are not always just different representations of the same resource.
>>>
>>>       
>> Basic Principle? On which semantic or architectural foundation?
>>     
>>> So I'll use links and you use conneg and meet again in 5 years and see who is
>>> getting more traction. Any further debate on this is a waste of time.
>>>
>>>       
>> Sure.
>>
>> Xiaoshu
>>     
>>> EHL
>>>       
>
> Regards,
>
> Patrick
>
>
>
>   
>>>
>>>
>>>
>>>       
>>>> -----Original Message-----
>>>> From: Xiaoshu Wang [mailto:wangxiao@musc.edu]
>>>> Sent: Tuesday, February 24, 2009 4:00 PM
>>>> To: Eran Hammer-Lahav
>>>> Cc: Julian Reschke; Patrick.Stickler@nokia.com;
>>>> jar@creativecommons.org; connolly@w3.org; www-tag@w3.org
>>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>>
>>>> The critical flaw of all the proposed approaches is that the definition
>>>> of "metadata/descriptor" is ambiguous and hence useless in practice.  Take
>>>> the "describedBy" relation, for example.  Here I quote from Eran's
>>>> link.
>>>>
>>>>       The relationship A "describedby" B asserts that resource B
>>>>       provides a description of resource A. There are no constraints on
>>>>       the format or representation of either A or B, neither are there
>>>>       any further constraints on either resource.
>>>>
>>>> As a URI owner, I don't know what kind of stuff I should put in A
>>>> or B.  As a URI client, how should I know when to get A and when to get
>>>> B?  Since I don't know what I might be missing from either A or B, it
>>>> seems to suggest that I must always get both A and B.  Thus, I cannot
>>>> help but wonder why they are not put together in A in the first
>>>> place.
>>>>
>>>> The same goes for MGET: how does a user know when to GET and when to MGET?
>>>> PROPFIND is different, because when people use it, they already
>>>> know that the resource is defined by WebDAV.   Hence, these kinds of
>>>> ideas only work when the client already has some knowledge about A.
>>>> But as a general framework for the Web, it won't work.
>>>> At the most fundamental level, we only know three things about the Web
>>>> -- URI, Representation, Resource.  The concept of metadata is
>>>> ill-conceived at this level because, as data about data, to say metadata
>>>> implies that we already know something about the resource we try to
>>>> access, a piece of knowledge that we don't have.
>>>>
>>>> There are a lot of implicit assumptions under the so-called "uniform
>>>> access to metadata/descriptor" approach.  It either requires the
>>>> definition of IR or a one-on-one relationship between Resource and
>>>> Representation.  As the former implies that non-IR cannot have a
>>>> representation, it makes the "descriptor/metadata" necessary.  The
>>>> knock
>>>> on this assumption is that the definition of IR is impossible to work
>>>> with.
>>>>
>>>> The 1-on-1 relationship gives rise to the so-called "legacy resource".
>>>> But "legacy resource" is a misnomer too.  On the Web, there
>>>> might be such a thing as a "legacy representation", but there should NOT
>>>> be such a thing as a "legacy resource", because the latter implies that
>>>> the Resource is closed and no more semantics will be added.
>>>>
>>>> But the so-called "metadata/descriptor" problems can be solved by using
>>>> HTTP Content Negotiation, making any other proposal redundant.  The
>>>> actual issue, as I have discussed in [1], is the incomplete syntax
>>>> of the URI specs, which currently do not have a syntactic notation for
>>>> the other two foundational objects in the Web, i.e., URI and
>>>> Representation.  Once we supplement the URI spec with such syntactic
>>>> sugar, such as the one I proposed in [2], then we can have a uniform
>>>> approach (1) to describe URIs along with standard resources and (2) to
>>>> systematically discover the possible representation types, i.e.,
>>>> Content-Type/MIME types, associated with a Resource (either a URI or a
>>>> standard Resource).  As a particular content-type is equivalent to a
>>>> particular *service*, the approach in effect establishes a
>>>> uniform approach to service discovery.
>>>>
>>>> What is required is to define Content-Type in the URI.  Once we have
>>>> this, not only are Data/Resource linked, but so are DataType/Service.
>>>> Best of all, it works within the conceptualizations defined in AWWW, and
>>>> does not require any other ambiguous conceptualization, such as IR,
>>>> metadata, or description.
>>>>
>>>> 1. http://dfdf.inesc-id.pt/misc/man/http.html
>>>> 2. http://dfdf.inesc-id.pt/tr/uri-issues
>>>>
>>>> Xiaoshu
>>>>
>>>> Eran Hammer-Lahav wrote:
>>>>
>>>>         
>>>>> Both of which are included in my analysis [1] for the discovery
>>>>>
>>>>>           
>>>> proposal.
>>>>
>>>>         
>>>>> EHL
>>>>>
>>>>> [1] http://tools.ietf.org/html/draft-hammer-discovery-02#appendix-B.2
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> -----Original Message-----
>>>>>> From: Julian Reschke [mailto:julian.reschke@gmx.de]
>>>>>> Sent: Tuesday, February 24, 2009 1:45 AM
>>>>>> To: Patrick.Stickler@nokia.com
>>>>>> Cc: Eran Hammer-Lahav; jar@creativecommons.org; connolly@w3.org;
>>>>>>
>>>>>>             
>>>> www-
>>>>
>>>>         
>>>>>> tag@w3.org
>>>>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>>>>
>>>>>> Patrick.Stickler@nokia.com wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> ...
>>>>>>> Agents which want to deal with authoritative metadata use
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> MGET/MPUT/etc.
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> ...
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>> Same with PROPFIND and PROPPATCH, btw.
>>>>>>
>>>>>> BR, Julian
>>>>>>
>>>>>>
>>>>>>             
>>>       
>
>   
Received on Monday, 2 March 2009 11:42:52 GMT
