
RE: RE: [BioRDF] URI Resolution

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 12 Feb 2007 02:10:52 -0500
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C2020B71C5@tayexc19.americas.cpqcorp.net>
To: "Jonathan Rees" <jonathan.rees@gmail.com>
Cc: <samwald@gmx.at>, <public-semweb-lifesci@w3.org>

> From: Jonathan Rees [mailto:jonathan.rees@gmail.com] 
> 
> Doing HTTP operations on an information resource, while abstractly
> similar to answering SPARQL queries relating to it (in either case you
> are learning something), seems to have a different feel given present
> technology. The protocol used is HTTP and the stuff you get has types
> (e.g. image, PDF) that are not best represented in RDF. Even if you do
> a GET and receive RDF, the information you receive may or may not be
> trustworthy and may or may not answer the questions you have - in fact
> a common case (e.g. foaf:) is that the resource's RDF content is about
> *other* resources, not the resource itself.
> 
> Another way to say it: HTTP GET can only answer one question about the
> resource. SPARQL (or other query language) can answer an open-ended
> set of questions. 

Of course a GET can be parameterized with a query string too, but a
SPARQL query may be a better solution depending on the use case.
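To make the contrast concrete, here is a minimal sketch (the endpoint
URL and the queried resource are hypothetical) of parameterizing a GET
the way the SPARQL Protocol's query-via-GET binding does:

```python
import urllib.parse

# Hypothetical SPARQL endpoint -- substitute a real one.
ENDPOINT = "http://example.org/sparql"

def sparql_get_url(endpoint, query):
    """Encode a SPARQL query into an HTTP GET query string.

    A plain GET on a resource URI answers one fixed question about
    the resource; a parameterized GET like this one can carry any
    member of an open-ended set of questions."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})

query = "SELECT ?p ?o WHERE { <http://example.org/resource/42> ?p ?o } LIMIT 10"
print(sparql_get_url(ENDPOINT, query))
```

The point being that the GET/SPARQL distinction is less about the
transport than about how many questions the request can express.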

> The GET problem seems pressing and almost tractable,
> and we have a lot of experience with it. Finding SPARQL endpoints is
> novel, everyone's using ad hoc solutions, and the need for shared
> solutions is not so pressing.

I am confused about the use cases that you are attempting to address.
In reading the URI Resolution problem statement[1] the problem that is
described sounds to me like what you are calling the GET problem -- not
one that would require SPARQL endpoints.  It sounds like you are trying
to address a much broader set of use cases than is clear from the
document.

> 
> Be careful about the word "authoritative" - I know Tim B-L likes the
> word, but authority is earned, not assumed; just because a host says
> something about one of its resources doesn't mean what's said is true.
> E.g., a server could easily be mistaken about authorship or licensing
> terms for a document, and semantic-web phishing scenarios are pretty
> easy to concoct, especially as more and more companies go bankrupt and
> lose their domain registrations.

That isn't quite what I meant by "authoritative".  I meant it in the
sense that (I think) the WebArch[2] uses the term.  You are quite right
that "authoritative" information served by a URI owner is not
necessarily true, but it *is* authoritative in that the owner of the URI
has the *authority* (i.e., is authorized by the WebArch and social
convention -- see [2]) to serve information that defines what resource
the URI identifies.
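In mechanical terms, that authoritative declaration is what you ask
for when you dereference the URI itself and request an RDF
representation from its owner. A sketch (the URI is hypothetical, and
the request is only constructed, not sent):

```python
import urllib.request

def authoritative_lookup_request(uri):
    """Build a GET request asking the URI owner for an RDF
    description of the resource the URI identifies.

    The response is 'authoritative' only in the WebArch sense:
    it is the URI owner speaking, which does not make it true."""
    return urllib.request.Request(
        uri, headers={"Accept": "application/rdf+xml"})

req = authoritative_lookup_request("http://example.org/resource/42")
print(req.full_url, req.get_header("Accept"))
```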

> 
> The issue is not primary vs. third party, but whether you should
> believe what someone tells you. That's an orthogonal question.

Well, yes and no.  Trust is an important issue, and it may be important
to be able to load some data from a source while excluding other data
from that same source.  SPARQL may be helpful for that.  Is this what
you were suggesting?  This seems like a very different use case from
what I understood by reading the URI Resolution Problem Statement[1].
Is this what you had in mind in [1]?

OTOH, the issue of being able to efficiently retrieve primary
information about a URI from the owner of its associated resource is
very different from the problem of locating third-party information,
even though the question of trust can apply to both cases.

If the issue of trust is the primary problem that you are attempting to
address, then I can understand lumping these cases together.  But it
seemed to me that the primary problem was much more around the mechanics
and conventions for efficiently retrieving primary information, as
suggested in the example given in the URI Resolution problem
statement[1]:
[[
    An RDF file is composed using URL's that all resolve nicely.  When
    years later someone tries to use the file, some of these same
    URL's are broken due to acquisitions, web site reorganizations,
    and changes of administration.  All the linked resources are
    available, just under different URL's.  How to make the user's
    application work without having to rewrite the RDF?
]]
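That scenario suggests a purely mechanical remedy: a resolution layer
that maps stale URLs to their current locations before dereferencing,
so the RDF itself never needs rewriting. A minimal sketch (the mapping
table and all URLs in it are invented for illustration):

```python
# Hypothetical mapping from stale URLs to current locations,
# e.g. maintained by a community registry. Entries are invented.
URL_MAP = {
    "http://old.example.org/ontology/term":
        "http://new.example.org/vocab/term",
}

def resolve(url):
    """Return the current location for a possibly-stale URL.

    The RDF file that mentions the old URL is left untouched;
    only the dereferencing step consults the mapping."""
    return URL_MAP.get(url, url)

print(resolve("http://old.example.org/ontology/term"))
print(resolve("http://stable.example.org/x"))  # unmapped: passes through
```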

When I initially read the URI Resolution problem statement[1] I
*thought* I had a reasonably clear understanding of the problem it is
trying to address, but as the discussion has unfolded I am starting to
think that my reading is significantly different from what the authors
may have intended, and I'm not yet clear on what the intent is.  Would
it be possible to provide use cases that better clarify the intent?  I,
for one, would find that very helpful, and I wonder if some of the
disagreement between Alan and Xiaoshu may also be rooted in different
people making different assumptions about the problem you are trying to
address.

Thanks,
David Booth

1. URI Resolution problem statement:
http://esw.w3.org/topic/HCLSIG_BioRDF_Subgroup/Documents?action=AttachFile&do=get&target=getting-information.txt
or http://tinyurl.com/3a472e

2. WebArch on resource ownership:
http://www.w3.org/TR/webarch/#uri-assignment 
Received on Monday, 12 February 2007 07:11:05 GMT