Re: Uniform access to metadata: XRD use case. from Xiaoshu Wang on 2009-02-25 (www-tag@w3.org from February 2009)

From: Xiaoshu Wang <wangxiao@musc.edu>
Date: Wed, 25 Feb 2009 17:05:03 +0000
To: "Patrick.Stickler@nokia.com" <Patrick.Stickler@nokia.com>
CC: "eran@hueniverse.com" <eran@hueniverse.com>, "julian.reschke@gmx.de" <julian.reschke@gmx.de>, "jar@creativecommons.org" <jar@creativecommons.org>, "connolly@w3.org" <connolly@w3.org>, "www-tag@w3.org" <www-tag@w3.org>
Message-ID: <49A57A3F.7050003@musc.edu>
Patrick.Stickler@nokia.com wrote:
>
> On 2009-02-25 17:26, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:
>
>   
>>
>> Patrick.Stickler@nokia.com wrote:
>>     
>>> On 2009-02-25 11:40, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:
>>>
>>>
>>>       
>>>> Patrick.Stickler@nokia.com wrote:
>>>>
>>>>         
>>>>> On 2009-02-25 02:00, "ext Xiaoshu Wang" <wangxiao@musc.edu> wrote:
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> The critical flaw of all the proposed approaches is that the definition of
>>>>>> "metadata/descriptor" is ambiguous and hence useless in practice.  Take
>>>>>> the "describedBy" relations for example.  Here I quote from Eran's link.
>>>>>>
>>>>>>       The relationship A "describedby" B asserts that resource B
>>>>>>       provides a description of resource A. There are no constraints on
>>>>>>       the format or representation of either A or B, neither are there
>>>>>>       any further constraints on either resource.
>>>>>>
>>>>>> As a URI owner, I don't know what kind of content I should put in A
>>>>>> or B.  As a URI client, how should I know when to get A and when to get
>>>>>> B?  Since I don't know what I might be missing from either A or B, it
>>>>>> seems to suggest that I must always get both A and B. Thus, I cannot
>>>>>> help but wonder why they are not put together at A in the first place.
>>>>>>
>>>>>> The same goes for MGET: how does a user know when to GET and when to MGET?
>>>>>>
>>>>>>
>>>>>>             
>>>>> If one wants a representation of the resource, use GET.
>>>>> If one wants a description of the resource, use MGET.
>>>>>
>>>>>
>>>>>           
>>>> This doesn't answer the question at all.  For me, a representation must
>>>> be describing something.
>>>>
>>>>         
>>> Your definition of representation seems overly narrow.
>>>
>>> If a given URI denotes a tree, and a 200 response to an HTTP GET request for
>>> that URI returns an image of the tree (i.e. a representation of the tree),
>>> does that image "describe" the tree? One may be able to observe
>>> characteristics of the tree by viewing the image, but whether or not the
>>> image is a "description" of the tree is, I think, a matter of debate, and in
>>> any case, outside the scope of the protocols in question.
>>>
>>>       
>> My answer is yes.  I don't know what your point is here.
>>     
>
> That your definition of "representation" reflects expectations not specified
> by the HTTP standard and which therefore should not be assumed to be
> reflected by any arbitrary web server response.
>   
A representation is a bytestream.  In the case of HTTP, it is an entity
coupled with some metadata (HTTP headers) that helps software agents
parse the entity.  I don't see how what I expect of it differs from
what you expect.
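
To make that reading concrete, here is a minimal sketch of "entity plus
headers" as seen by a software agent.  It is an illustration only: the
helper names and the sample message are hypothetical, the status line is
omitted, and the Content-Type parsing assumes a single charset parameter.

```python
def split_representation(raw: bytes):
    """Split a serialized header block + entity into (headers, entity bytes)."""
    head, _, entity = raw.partition(b"\r\n\r\n")
    headers = {}
    for line in head.decode("ascii").split("\r\n"):
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, entity


def decode_entity(headers, entity: bytes) -> str:
    """Use the Content-Type charset parameter to turn the bytes into text."""
    # Simplification: assumes at most one parameter, and that it is charset.
    _, _, params = headers.get("content-type", "").partition(";")
    charset = params.strip().partition("=")[2] or "ascii"
    return entity.decode(charset)


# Hypothetical message: the headers are the metadata, the rest is the entity.
raw = (
    b"Content-Type: text/plain; charset=utf-8\r\n"
    b"Content-Language: en\r\n"
    b"\r\n"
    b"A tall oak tree."
)
headers, entity = split_representation(raw)
text = decode_entity(headers, entity)
```

Nothing in the headers says whether the entity "describes" the resource;
they only tell the agent how to parse the bytes.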
>>>> Hence, I cannot say if something is a
>>>> Representation but not Description.
>>>>
>>>>         
>>> It's the specification of the protocol that says what is returned (or should
>>> be).
>>>
>>> A successful response to a GET request can be presumed to be a
>>> representation.
>>>
>>> A successful response to an MGET request can be presumed to be a
>>> description.
>>>
>>>       
>> Does this help either a producer or a consumer to decide their action?
>>     
>
> To the degree that such expectations can be reliably made by software agents
> due to a high degree of standards conformance, yes.
>
>   
>>>>> There is some potential conceptual overlap between representations and
>>>>> descriptions for certain kinds of resources, but the distinction should be
>>>>> reasonably intuitive.
>>>>>
>>>>>
>>>>>           
>>>>  I don't think any protocol based on intuition is practical.
>>>>
>>>>         
>>> Neither HTTP nor URIQA is based on intuition. Some concepts are, however,
>>> for most folks, fairly intuitive. But the specs will say how software should
>>> behave and expect when using those protocols.
>>>
>>>       
>> This is really an empty answer.  You still have not given a definition
>> that helps software behave accordingly. When should it GET and when
>> should it MGET? Please don't define it circularly, such as "when you
>> need a representation, GET; when you need a description, MGET".  I want
>> to know the semantic difference.
>>     
>
> Defined below.
>
>   
>>>>  The
>>>> concept of IR seems intuitive, but it doesn't work (at least not for me).
>>>>
>>>>         
>>>>>> PROPFIND is different because when people use it, they already
>>>>>> know that the resource is defined by WebDAV.  Hence, these kinds of
>>>>>> ideas only work when the client already has some knowledge about A.
>>>>>> But as a general framework for the Web, it won't work.
>>>>>> At the most fundamental level, we only know three things about the Web
>>>>>> -- URI, Representation, Resource.  The concept of metadata is
>>>>>> ill-conceived at this level because, as data about data, to say metadata
>>>>>> implies that we already know something about the resource we are trying
>>>>>> to access, a piece of knowledge that we don't have.
>>>>>>
>>>>>>
>>>>>>             
>>>>> For URIQA, all that is needed is the URI. After all, you have to be able to
>>>>> name something to communicate effectively about it.
>>>>>
>>>>> URIQA does not presume that any representation exists. It neither posits
>>>>> nor
>>>>> requires an "Information Resource".
>>>>>
>>>>> It is perfectly complementary to the web.
>>>>>
>>>>> GET/PUT/etc. deal with representations.
>>>>> MGET/MPUT/etc. deal with descriptions.
>>>>>
>>>>> If you have a URI, you can use it to either get representations or
>>>>> descriptions, and if you don't know anything about what resource the URI
>>>>> denotes, you might first want to get the description.
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> There are a lot of implicit assumptions under the so-called "uniform
>>>>>> access to metadata/descriptor" approach.  It either requires the
>>>>>> definition of IR or a one-on-one relationship between Resource and
>>>>>> Representation.  As the former implies that non-IR cannot have a
>>>>>> representation, it makes the "descriptor/metadata" necessary.  The knock
>>>>>> on this assumption is that the definition of IR is impossible to work
>>>>>> with.
>>>>>>
>>>>>>
>>>>>>             
>>>>> URIQA makes none of those assumptions.
>>>>>
>>>>>
>>>>>           
>>>> Really? Try to define the distinction between your terms "description"
>>>> and "representation", and see what you come up with.
>>>>
>>>>         
>>> A representation is what you (should) get from a 200 response to an HTTP GET
>>> request. It can be expected to reflect, in some manner, the state of the
>>> resource denoted by the request URI. Whether the representation returned is
>>> useful or meaningful to the recipient (either human or machine), or whether
>>> it "describes" the resource in any discernable way, is outside the scope of
>>> the HTTP spec and lies entirely in the domain of information publication and
>>> consumption -- i.e the social relationship between the publisher of the
>>> representation and consumers of the representation.
>>>
>>> A description is what you (should) get from a 200 response to a URIQA MGET
>>> request. It can be expected to correspond to a graph of RDF statements,
>>> serialized in some manner (RDF/XML by default) where the particular
>>> statements of interest are those in which the request URI occurs as the
>>> subject (though there can be other statements in the graph in which the
>>> subject does not correspond to the request URI). It is intended to be
>>> interpreted by the recipient (usually a machine) in terms of the RDF model
>>> theory.
>>>
>>> Pretty distinct to me.
>>>
>>>       
>> Hence, your definition of Description is RDF.  It is fine.  Then, Conneg
>> does the same thing, doesn't it?  Wouldn't that make MGET a redundant
>> effort?
>>     
>
> Not at all.
>
> Yes, content negotiation can be (mis)used to enable a software agent to
> specifically request a formal description of a resource corresponding to an
> RDF graph, but that is not a proper use of content negotiation, per its
> intended and established purpose, and excludes the (optimal and unambiguous)
> use of content negotiation to request alternative serializations of such
> descriptions.
>   
If you agree that it can be used, then that is fine.  But you need to
define precisely what counts as "misuse" and what does not.  Making
blanket statements is not helpful.
> One can manage to successfully drive screws into wood using a hammer, but
> that doesn't mean it's the most optimal way to go about it.
>   
Again, show a use case that demonstrates how MGET is better than Conneg.
Repeating a claim doesn't make it true.
 
>>> HTTP GET may return a serialization of an RDF graph.
>>> URIQA MGET always returns a serialization of an RDF graph.
>>>
>>> Note that a description, returned by URIQA MGET, is a specific subtype of
>>> representation, returned by HTTP GET, and it is certainly possible for
>>> representations of that description to be accessible via HTTP GET. So yes, a
>>> representation can certainly describe a resource. But not all
>>> representations accessible via HTTP GET will be as explicitly descriptive as
>>> an RDF graph. (sorry if that is confusing, reading it several times may be
>>> necessary ;-)
>>>
>>>
>>>
>>>
>>>       
>>>>>> The 1-on-1 relationship gives rise to the so-called "legacy resource".
>>>>>> But the word "legacy resource" is wrongly named too.  In the Web, there
>>>>>> might be something as "legacy representation" but there should NOT be
>>>>>> such thing as "legacy resource" because the latter implies that the
>>>>>> Resource is closed and no more semantics will be added.
>>>>>>
>>>>>> But the so-called "metadata/descriptor" problems can be solved by using
>>>>>> HTTP Content Negotiation, making any other proposal a redundant one.
>>>>>>
>>>>>>
>>>>>>             
>>>>> Actually, it can't. As noted on http://sw.nokia.com/uriqa/URIQA.html:
>>>>>
>>>>>
>>>>>           
>>>> The link returns a 404, so I don't know if it is supposed to return
>>>> something meaningful or it is a metaphor.
>>>>
>>>>         
>>> Perhaps you are including the colon at the end, which is not part of the URI
>>> (sorry). I.e. try
>>>
>>> http://sw.nokia.com/uriqa/URIQA.html
>>>
>>>
>>>       
>>>>> --
>>>>> Why not use a MIME type and content negotiation to request a description?
>>>>>
>>>>> Content negotiation is designed to allow agents to select from among a set
>>>>> of alternate encodings. The distinction between a resource description and
>>>>> (other kind of) resource representations is not based on any distinction in
>>>>> encoding.
>>>>>
>>>>>           
>>>> Nope.  That is perhaps the intention with which conneg was designed.  But I
>>>> don't think that is the way it should be understood.  Content-type might
>>>> signal a special encoding, but language, for instance, is also part
>>>> of Conneg.
>>>>
>>>>         
>>> That is true, and the wording is perhaps imperfect, but the point made is
>>> valid.
>>>
>>> Content negotiation is intended to provide access to alternative
>>> representations where the presumption is that those representations convey,
>>> as much as is possible given the limitations of their form of expression,
>>> the same essential body of information.
>>>
>>> You may wish to use content negotiation for something else, but its
>>> original intended use, and actual use, is pretty well established.
>>> Exploiting it to do something else, is certainly possible, but not
>>> necessarily optimal as a generalized solution.
>>>
>>>
>>>       
>>>>> In fact, a given description (which is itself a resource) may have
>>>>> several available encodings (RDF/XML, XTM, N3, etc.). Thus, if you use
>>>>> content negotiation to indicate that you want a description, you can't use
>>>>> it to indicate the preferred encoding of the description (if/when other
>>>>> encodings than RDF/XML are available).
>>>>> --
>>>>>
>>>>>
>>>>>           
>>>> What is the implication of your statement?  That RDF (or its sort) is a
>>>> description but others are not?
>>>>
>>>>         
>>> No. I didn't mean that at all.
>>>
>>>       
>> Hmm, now I don't know your definition again.  See above how you defined it.
>>     
>
> RDF provides a level of formality for descriptions that is not inherent in
> other forms of expression.
>
> The English statement "Bob loves Mary." describes something about 'Bob' but
> a semantic web agent would have a tough time using a description expressed
> in such a manner. It wouldn't know who, or what, 'Bob' is, or if two
> occurrences of the characters 'Bob' denote the same thing, etc. In a given
> context, an intelligent human may be able to find value in such a
> description, but it's essentially worthless to a semantic web agent.
>
> In comparison, the RDF triple
>
>    { x:Bob x:loves x:Mary }
>
> (given some namespace 'x') provides much more utility to a semantic web
> agent. It can assume that x:Bob always denotes the same thing, and it can
> merge that statement with other statements, possibly obtained by using those
> URIs to retrieve additional metadata, or combining it with a locally
> maintained knowledge base, and infer things about x:Bob, x:Mary, and the
> asserted x:loves relationship between them.
>   
What does this have to do with MGET?  You don't have to tell me
about RDF.  I have told you that it is O.K. if you define description as
RDF-based.  What I am saying is that, in that case, Conneg for an RDF
content type does the same thing as MGET.  I don't need MGET to get RDF.
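
The two requests being compared can be sketched side by side.  This is a
minimal illustration, not a client: build_request is a hypothetical helper
that only formats the request text, nothing is sent, and the host
example.org and path /bob are placeholders.  MGET is the method named in
the URIQA document, and application/rdf+xml is the RDF/XML media type.

```python
def build_request(method, host, path, headers=None):
    """Format the head of an HTTP/1.1 request as text (nothing is sent)."""
    lines = [f"{method} {path} HTTP/1.1", f"Host: {host}"]
    for name, value in (headers or {}).items():
        lines.append(f"{name}: {value}")
    return "\r\n".join(lines) + "\r\n\r\n"


# Content negotiation: an ordinary GET that asks for an RDF serialization.
conneg = build_request("GET", "example.org", "/bob",
                       {"Accept": "application/rdf+xml"})

# URIQA: a separate method; the URI alone is enough.
uriqa = build_request("MGET", "example.org", "/bob")

# URIQA combined with conneg, where Accept only selects the serialization
# of the description.
uriqa_with_accept = build_request("MGET", "example.org", "/bob",
                                  {"Accept": "application/rdf+xml"})

print(conneg)
print(uriqa)
```

Whether the first and second requests should be treated as equivalent is
exactly the point in dispute; the sketch only shows where each approach
puts the signal (the Accept header versus the method name).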
>   
>>>> An HTML or XML doc definitely describes
>>>> something.
>>>>
>>>>         
>>> As noted above, representations may correspond to descriptions, but may not
>>> be as explicitly or formally descriptive as a serialization of an RDF graph.
>>>
>>>
>>>       
>>>> If you URIDL  them to an RDF, it doesn't change the nature
>>>> of its content.
>>>>
>>>>         
>>> One can represent a specific RDF graph in a number of different ways, and
>>> content negotiation can be effectively used as intended to request
>>> particular variant representations of that graph.
>>>
>>> If content negotiation is (mis)used to request an explicit description of a
>>> resource, then it is not available to request variant representations of
>>> that description (at least not without potentially doubling (or more) the
>>> number of MIME types).
>>>
>>>       
>> Why not?
>>     
>
> For every MIME type used to represent a particular encoding which may be
> used to serialize an RDF graph, or which may have a serialization of an RDF
> graph embedded in it in some manner, one must posit an additional
> specialized MIME type which represents the same encoding as the first MIME
> type, but additionally carries the semantics that the representation
> returned in response to the GET request should correspond to a formal
> description of a resource, suitable for semantic web agent consumption,
> rather than a "traditional" representation, intended for web agent (and
> presumably human) consumption.
>   
This is in the same spirit as linked data.  If MIME types are given URIs,
then you can follow your nose not only to resources but also to MIME
types.  It makes the Web architecture compact and simple because the
same principle applies everywhere.
> It obfuscates the architecture and overloads the semantics/purpose of the
> content negotiation functionality.
>   
Can we (again) not make blanket statements like this? As you say later,
"argument by assertion will not convince anyone". I am not sure what I
have asserted besides the point that whatever MGET does can be done with
CN.  So please make a concrete argument to justify your claim and
show either (1) something that MGET can do but CN cannot, or (2) if both
can do it, why MGET is better or CN is worse in practice.

Xiaoshu
>   
>>>>> Content negotiation can be used as intended in conjunction with URIQA to
>>>>> request particular variant encodings of a description.
>>>>>
>>>>>
>>>>>           
>>>> Again, the definition of "description"?
>>>>
>>>>         
>>> See above.
>>>
>>>
>>>       
>>>>>> The
>>>>>> actual issue, as I have discussed in [1], is the incomplete syntax
>>>>>> of the URI specs, which currently have no syntactic notation for
>>>>>> the other two foundation objects in the Web, i.e., URI and
>>>>>> Representation.  Once we supplement the URI spec with such syntactic sugar,
>>>>>> such as the one I proposed in [2], then, we can have a uniform approach
>>>>>> to (1) describe URI along with standard resources and (2) to
>>>>>> systematically discover the possible representation types, i.e.,
>>>>>> Content-Type/MIME types, associated with a Resource (either URI or
>>>>>> standard Resource). As a particular content-type is equivalent to a
>>>>>> particular *service*, the approach in effect establishes a
>>>>>> uniform approach to service discovery.
>>>>>>
>>>>>> What is required is to define Content-Type in URI.  Once we have these,
>>>>>> not only Data/Resource are linked but DataType/Service.  The best of
>>>>>> all, it works within the conceptualizations defined in AWWW, and does
>>>>>> not require any other ambiguous conceptualization, such as, IR,
>>>>>> metadata, and description, etc.
>>>>>>
>>>>>>
>>>>>>             
>>>>> I consider one of the strengths of the semantic web layer to be that it is
>>>>> agnostic about the syntactic structure of URIs. I also think that
>>>>> syntactically binding the URI of a resource and the URI(s) of its
>>>>> representation(s) or description(s) is unnecessary, and would be overly
>>>>> cumbersome in practice.
>>>>>
>>>>>
>>>>>           
>>>> Of course.  But anyone who works with the Web should know that the Web
>>>> consists of these three kinds of things.
>>>>
>>>>         
>>> Anyone who is familiar with the standards which serve as the foundation for
>>> the web, and semantic web, knows what things are defined as relevant to
>>> software applications and the scope of those definitions.
>>>
>>> (granted, no spec or standard is perfect, but things are defined a lot more
>>> clearly and precisely than the definitions you seem to be assuming for these
>>> particular terms)
>>>
>>>
>>>
>>>       
>>>> Hence, giving these three
>>>> concepts some syntactic sugar doesn't violate the URI's opacity
>>>> principle.
>>>>
>>>>         
>>> I'm sorry, but that statement is self-contradicting. If the URI is opaque
>>> for a given application, then syntax is irrelevant, hence there cannot be
>>> any syntactic sugar which is meaningful to that application.
>>>
>>>       
>> No.  It is not
>>     
>
> If you can't see the contradiction, then I'm not sure we can have a
> meaningful discussion about this topic.
>
>
>   
>> but let's not sidetrack this into other issues.  What I
>> really want is a definition of "Description".  The only one that you
>> gave, but later seem to have rescinded,
>>     
>
> ???
>
>   
>> can and should be CN.
>>     
>
> Argument by assertion will not convince anyone.
>
> Patrick
>
>
>   
>> Xiaoshu
>>     
>>> Syntax which may be relevant to the web layer is irrelevant to the semantic
>>> web layer.
>>>
>>> The interface between the web and semantic web layers is a shared set of
>>> URIs with consistent denotation, and a means for semantic web agents to
>>> interact with representations of descriptions accessible via those URIs
>>> using web protocols.
>>>
>>> The web layer is concerned with representations of resources.
>>> The semantic web layer is concerned with descriptions of resources.
>>>
>>> A description of a resource is a kind of representation of that resource,
>>> but with a formal significance to the semantic web layer, and therefore it
>>> is optimal if semantic web agents can easily access those particular
>>> representations which correspond to descriptions, or from which descriptions
>>> can be extracted, where such descriptions can be interpreted as RDF graphs
>>> according to the RDF model theory.
>>>
>>> The less bandwidth or processing needed to obtain such descriptions the
>>> better.
>>>
>>> URIQA is designed to provide the most optimal access to explicit
>>> descriptions meaningful to semantic web agents with the lowest bandwidth and
>>> processing overhead possible and the least amount of specialized knowledge
>>> (nothing more than the URI and which method to use).
>>>
>>>
>>>       
>>>> When I say syntactic sugar, I mean that it is not absolutely
>>>> necessary.  But the benefit of defining it is for convenience in practice.
>>>>
>>>>
>>>>         
>>> The sheer number of software applications which would need to be modified to
>>> consistently support such a special URI notation is staggering. URI opacity
>>> is one of the most important principles of the semantic web, for the very
>>> reason that it allows most software and content in the web layer to remain
>>> unchanged and agnostic, while enabling us to make explicit statements about
>>> any resources denoted by any form of URI.
>>>
>>> Regards,
>>>
>>> Patrick
>>>
>>>
>>>
>>>       
>>>> Xiaoshu
>>>>
>>>>         
>>>>> Patrick
>>>>>
>>>>>
>>>>>
>>>>>           
>>>>>> 1. http://dfdf.inesc-id.pt/misc/man/http.html
>>>>>> 2. http://dfdf.inesc-id.pt/tr/uri-issues
>>>>>>
>>>>>> Xiaoshu
>>>>>>
>>>>>> Eran Hammer-Lahav wrote:
>>>>>>
>>>>>>
>>>>>>             
>>>>>>> Both of which are included in my analysis [1] for the discovery proposal.
>>>>>>>
>>>>>>> EHL
>>>>>>>
>>>>>>> [1] http://tools.ietf.org/html/draft-hammer-discovery-02#appendix-B.2
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>               
>>>>>>>> -----Original Message-----
>>>>>>>> From: Julian Reschke [mailto:julian.reschke@gmx.de]
>>>>>>>> Sent: Tuesday, February 24, 2009 1:45 AM
>>>>>>>> To: Patrick.Stickler@nokia.com
>>>>>>>> Cc: Eran Hammer-Lahav; jar@creativecommons.org; connolly@w3.org; www-
>>>>>>>> tag@w3.org
>>>>>>>> Subject: Re: Uniform access to metadata: XRD use case.
>>>>>>>>
>>>>>>>> Patrick.Stickler@nokia.com wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> ...
>>>>>>>>> Agents which want to deal with authoritative metadata use
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> MGET/MPUT/etc.
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>>>>> ...
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>                   
>>>>>>>> Same with PROPFIND and PROPPATCH, btw.
>>>>>>>>
>>>>>>>> BR, Julian
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>                 
>>>>>           
>>>       
>
>
>
Received on Wednesday, 25 February 2009 17:05:57 UTC