Re: RE: [BioRDF] URI Resolution from Jonathan Rees on 2007-02-09 (public-semweb-lifesci@w3.org from February 2007)

From: Jonathan Rees <jonathan.rees@gmail.com>
Date: Fri, 9 Feb 2007 12:25:13 -0500
To: "Booth, David (HP Software - Boston)" <dbooth@hp.com>
Cc: samwald@gmx.at, public-semweb-lifesci@w3.org
Message-ID: <3cff5e070702090925l6b41325dp1986bc9c9675123a@mail.gmail.com>

Doing HTTP operations on an information resource, while abstractly
similar to answering SPARQL queries relating to it (in either case you
are learning something), seems to have a different feel given present
technology. The protocol used is HTTP and the stuff you get has types
(e.g. image, PDF) that are not best represented in RDF. Even if you do
a GET and receive RDF, the information you receive may or may not be
trustworthy and may or may not answer the questions you have - in fact
a common case (e.g. foaf:) is that the resource's RDF content is about
*other* resources, not the resource itself.
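
To make the foaf: case concrete, here is a minimal sketch. The document
URI, the people, and the triples are all made up for illustration (only
the foaf: namespace URI is real); the point is just that none of the
statements you get back has the document itself as subject.

```python
# Hypothetical example data: the triples a GET on DOC might return,
# written as (subject, predicate, object) tuples.
DOC = "http://example.org/people/alice.rdf"   # assumed document URI
FOAF = "http://xmlns.com/foaf/0.1/"           # the real foaf: namespace

triples = [
    (DOC + "#me", FOAF + "name", "Alice"),
    (DOC + "#me", FOAF + "knows", "http://example.org/people/bob.rdf#me"),
    ("http://example.org/people/bob.rdf#me", FOAF + "name", "Bob"),
]


def statements_about(resource, triples):
    """Return only the triples whose subject is exactly `resource`."""
    return [t for t in triples if t[0] == resource]


# Statements about the information resource that was actually fetched.
about_document = statements_about(DOC, triples)

# Statements about the person the document describes.
about_person = statements_about(DOC + "#me", triples)
```

Here about_document comes back empty while about_person does not, so
the GET told you about a person and nothing about the document.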

Another way to say it: HTTP GET can only answer one question about the
resource. SPARQL (or other query language) can answer an open-ended
set of questions. The GET problem seems pressing and almost tractable,
and we have a lot of experience with it. Finding SPARQL endpoints is
novel, everyone's using ad hoc solutions, and the need for shared
solutions is not so pressing.
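
A rough sketch of the difference, assuming a made-up resource URI and a
made-up SPARQL endpoint (finding the real endpoint is exactly the
unsolved part). Nothing is sent over the network; the code only builds
the requests, using the "query" parameter from the SPARQL protocol.

```python
from urllib.parse import urlencode
from urllib.request import Request

RESOURCE = "http://example.org/gene/1234"   # hypothetical resource URI
ENDPOINT = "http://example.org/sparql"      # hypothetical SPARQL endpoint


def get_request(uri, media_type="application/rdf+xml"):
    """One GET: the only knob is which representation you would prefer."""
    return Request(uri, headers={"Accept": media_type})


def sparql_request(endpoint, query):
    """A SPARQL query sent as an HTTP GET with a 'query' parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


def outgoing(uri):
    """What does the store say about this resource?"""
    return f"SELECT ?p ?o WHERE {{ <{uri}> ?p ?o }}"


def incoming(uri):
    """What in the store refers to this resource?"""
    return f"SELECT ?s ?p WHERE {{ ?s ?p <{uri}> }}"


# One question for GET; as many as you care to write for SPARQL.
one_get = get_request(RESOURCE)
many_queries = [sparql_request(ENDPOINT, q(RESOURCE)) for q in (outgoing, incoming)]
```

The second query is the kind of 'trackback' question that a plain GET
on the resource has no way to express.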

Be careful about the word "authoritative" - I know Tim B-L likes the
word, but authority is earned, not assumed; just because a host says
something about one of its resources doesn't mean what's said is true.
E.g., a server could easily be mistaken about authorship or licensing
terms for a document, and semantic-web phishing scenarios are pretty
easy to concoct, especially as more and more companies go bankrupt and
lose their domain registrations.

The issue is not primary vs. third party, but whether you should
believe what someone tells you. That's an orthogonal question.

On 2/9/07, Booth, David (HP Software - Boston) <dbooth@hp.com> wrote:
>
> > From: samwald@gmx.at
> >
> > >> In your view, how would one find
> > >> information (or represent the information needed to find
> > >> information) about a non-information resource
> >
> > I think parallel querying of SPARQL endpoints could be an
> > interesting solution. . . . .
>
> I'm curious why you are treating this case so differently from the case
> of finding information about an information resource.  I assume it is
> because with information resources you are only interested in
> information from that information resource and not from third parties.
> Is that correct?
>
> >
> > > I'm not sure what you are asking. [...] you
> > > should be able to follow-your-nose by dereferencing the namespace.
> >
> > This would probably only lead to the information that was
> > available when the URI for the gene was minted. The more
> > interesting information will probably be found elsewhere,
> > e.g. in an interactome database that describes interactions
> > of the gene that was previously described in the genome
> > database. How should these be found with the follow-your-nose
> > approach when there are no 'trackback' relations in the
> > genome database?
>
> Ah!  Thanks for the clarification.  That is a very different use case
> than the case where resources have simply moved or you wish to use a
> different protocol than HTTP for retrieving resources.  I suggest we
> distinguish between two kinds of information about resources:
>
>  - Primary Authoritative Information: information that is provided by
> the resource owner.  In this case, the big issue is how to establish and
> maintain efficient ways to retrieve such information while supporting the
> WebArch follow-your-nose principle of having a clear, deterministic
> chain of authority from the URI to the information.  Incidentally, when
> we speak of a resource having "moved" we mean that primary authoritative
> information now resides at a different location from the one originally
> associated with the URI.
>
>  - Third Party Information: information that is provided by parties
> other than the resource owner or the information requestor.  In this
> case the issues seem to me to be trust (how do you decide which
> information to trust?) and the lack of a deterministic way to find such
> information (it could be anywhere).
>
> I think the problems involved in these two cases are very different and
> a solution that addresses the problem of retrieving Primary
> Authoritative Information is much simpler than the problem of finding
> and deciding to trust Third Party Information.  Of course, once Third
> Party Information is found, and there is a decision to use it, then the
> problem becomes the same as that for retrieving Primary Authoritative
> Information.
>
> I had previously thought that the proposed effort was only intended to
> address the problem of retrieving Primary Authoritative Information.
> Are you proposing that it should address both of these problems?  If so,
> then I think the part that addresses only the need to retrieve Primary
> Authoritative Information should be cleanly separable from the rest, so
> that its implementation does not require all of the machinery needed for
> addressing the (much larger) Third Party Information problem.  Are there
> reasons why you think these problems should not be separated?
>
> David Booth
>
>
Received on Friday, 9 February 2007 17:25:19 UTC