RE: BioRDF: URL Statements from Booth, David (HP Software - Boston) on 2006-10-02 (public-semweb-lifesci@w3.org from October 2006)

From: Booth, David (HP Software - Boston) <dbooth@hp.com>
Date: Mon, 2 Oct 2006 16:47:17 -0400
To: "Matthias Samwald" <samwald@gmx.at>, <public-semweb-lifesci@w3.org>
Message-ID: <EBBD956B8A9002479B0C9CE9FE14A6C20141707B@tayexc19.americas.cpqcorp.net>

> From:  Matthias Samwald
> . . .
> 3) What current proposals about the 'resolution' of URIs do 
> is trying to force four different things into a single URI: 

I don't know which proposals you mean, but forcing these into the same
URI is definitely not what the TAG recommends in the WebArch[1], even
when an http URI is used, and certainly not what I would advocate.

When an http URI is used to identify something other than an
"information resource", the same URI should NOT be used as both the
identifier and the locator.  There are two styles of http URIs available
for identifying things that are not "information resources": (a) hash
URIs; or (b) slash URIs with 303 redirects.  

If a hash URI is used, then

	http://example.org/foo#bar might identify a thing, and

	http://example.org/foo can be used to seek metadata about it.

These are different (though related) URIs.  
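
As a minimal sketch of the hash-URI case (the function name is my own,
purely for illustration), the URI used to seek metadata can be derived
by dropping the fragment from the identifying URI:

```python
from urllib.parse import urldefrag


def metadata_uri_for_hash_uri(thing_uri: str) -> str:
    """Return the URI with its fragment removed.

    Hypothetical helper: for a hash URI, the fragment-less URI is the
    one that can be dereferenced to seek metadata about the thing.
    """
    document_uri, _fragment = urldefrag(thing_uri)
    return document_uri


print(metadata_uri_for_hash_uri("http://example.org/foo#bar"))
# prints http://example.org/foo
```

The fragment is never sent to the server in an HTTP request, so the
client performs this step itself.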

If a slash URI is used, the situation is more flexible:

	http://lsid.example/myorg.example/foo might identify a thing, 
	and, if this is dereferenced, the request should forward 
	(using 303 "See Other" response code) to another URI, such as

	http://myorg.example/foo , which might be used to seek metadata 
	about that thing.

(In this example I made the right-hand parts of these two URIs the same
for convenience; however, there is no requirement that they be the
same.)
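
A rough sketch of how a client might interpret the response in the
slash-URI case follows.  It assumes the HTTP request has already been
made without following redirects, and that the status code and headers
are passed in; the function name and parameters are hypothetical, not
part of any library.

```python
from typing import Mapping, Optional
from urllib.parse import urljoin

SEE_OTHER = 303  # HTTP "See Other" status code


def metadata_uri_from_response(
    thing_uri: str, status: int, headers: Mapping[str, str]
) -> Optional[str]:
    """Return the metadata URI indicated by a 303 response, if any.

    Hypothetical helper: `status` and `headers` come from dereferencing
    `thing_uri` with automatic redirect-following switched off.
    """
    if status != SEE_OTHER:
        return None
    # Header names are case-insensitive, so compare in lower case.
    for name, value in headers.items():
        if name.lower() == "location":
            # Location may be relative; resolve it against the request URI.
            return urljoin(thing_uri, value)
    return None


uri = metadata_uri_from_response(
    "http://lsid.example/myorg.example/foo",
    303,
    {"Location": "http://myorg.example/foo"},
)
print(uri)  # prints http://myorg.example/foo
```

Any other status code yields None here, because only the 303 response
signals that the redirect target describes the thing rather than being
the thing itself.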

Note also that if the thing being identified is a large chunk of data,
that data may well be accessed by yet another (or several other) URI(s),
which might even be indicated in the metadata.

In summary:
	- one URI identifies the thing;
	- a related URI provides metadata; and
	- other URIs are used to retrieve the actual data.
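
To make the three roles concrete, here is a small illustrative record.
The class name, field names, and the data URI below are all invented
for this sketch; only the first two URIs come from the example above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class IdentifiedThing:
    """Hypothetical record keeping the three kinds of URI separate."""

    identifier: str                  # URI that identifies the thing
    metadata_uri: str                # related URI that provides metadata
    data_uris: Tuple[str, ...] = ()  # URIs for retrieving the actual data

    def __post_init__(self) -> None:
        # The identifier and the metadata locator must not be the same URI.
        if self.identifier == self.metadata_uri:
            raise ValueError("identifier and metadata URI must differ")


thing = IdentifiedThing(
    identifier="http://lsid.example/myorg.example/foo",
    metadata_uri="http://myorg.example/foo",
    # Invented data location, for illustration only.
    data_uris=("http://data.myorg.example/foo.dat",),
)
print(thing.identifier != thing.metadata_uri)  # prints True
```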

> a. the symbol for a thing, 
> b. the symbol for an information  resource (i.e. a certain 
> ordering of bytes, for example a JPEG picture or an HTML document) and
> c. a string (i.e. a URI) that can be used in conjunction with 
> some resolution mechanism in order to yield the information resource
> 
> 4) Trying to lump ontologically different things into one 
> symbol is bad practice, and leads to a lot of confusion. 

Yes.  However, it is *good* practice to make these pairs of URIs be
*related* to each other, such that a user who knows one *may* be able to
determine the other.  In particular, if you come across a previously
unknown URI that identifies a thing, you should have a well-known,
deterministic way to look for useful information about that thing.

> This 
> confusion can be avoided by clearly distinguishing  a, b and 
> c in our RDF graph.
> 
> 5) Finding additional RDF statements about a given resource 
> has not much to do with 'resolution', would more accurately 
> be described as making a query. These two things should not 
> be mixed up. SPARQL endpoints are probably the best solution by far.

David Booth

Received on Monday, 2 October 2006 20:47:31 UTC