
Re: Re: [BioRDF] All about the LSID URI/URN

From: <samwald@gmx.at>
Date: Mon, 31 Jul 2006 13:07:23 +0200
Message-ID: <20060731110723.265970@gmx.net>
To: public-semweb-lifesci@w3.org

The recent discussion about URIs has made me change my perspective on the whole issue. It seems that it is getting much more complicated than it needs to be. Here is my new suggestion:

1.) Every agent/user on the Semantic Web connects to one or more RDF query endpoints. The HTTP addresses of these endpoints would probably be few, on the order of a few dozen at most. Such a small collection of endpoint addresses would be easily manageable and probably quite stable.

2.) The agents conduct distributed queries over all endpoints, using SPARQL or the simpler FETCH or SPO query protocols [1].
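A minimal sketch of points 1.) and 2.), assuming each endpoint can answer a simple subject/predicate/object pattern. The class InMemoryEndpoint and the function distributed_match are hypothetical names used only for illustration; a real endpoint would be a remote service queried over HTTP, not an in-memory set.

```python
from typing import Iterable, Optional

Triple = tuple[str, str, str]


class InMemoryEndpoint:
    """Hypothetical stand-in for a remote RDF query endpoint."""

    def __init__(self, triples: Iterable[Triple]) -> None:
        self.triples = set(triples)

    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> set[Triple]:
        # None acts as a wildcard, as in a simple SPO-style pattern query.
        return {
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        }


def distributed_match(endpoints: Iterable[InMemoryEndpoint],
                      s: Optional[str] = None, p: Optional[str] = None,
                      o: Optional[str] = None) -> set[Triple]:
    """Send the same pattern to every known endpoint and merge the answers."""
    results: set[Triple] = set()
    for endpoint in endpoints:
        results |= endpoint.match(s, p, o)
    return results


# Two endpoints that each hold part of the description of one entity.
endpoint_a = InMemoryEndpoint([("Eiffel-Tower", "rdf:type", "Tower")])
endpoint_b = InMemoryEndpoint(
    [("Eiffel-Tower", "depicted-in", "Eiffel-Tower-Picture")]
)

about_tower = distributed_match([endpoint_a, endpoint_b], s="Eiffel-Tower")
```

The agent only needs the short list of endpoints; it never has to dereference the entity URIs themselves.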

3.) URIs that represent entities in RDF/OWL are completely separated from URLs that can be resolved via HTTP. While the URIs that represent entities have well-defined semantics, the resolvable URLs are just that: URLs that yield some binary data through an HTTP GET request. We should not ascribe any meaning or identity criteria to such a URL -- it is simply a string that we can type into our web browser to get something back.

4.) URIs of an entity can be connected to URLs through a single property, let's call it 'get-at'. A typical use of this property would be

<some-picture> <get-at> <some-URL>

The only connection of <some-URL> to the rest of the RDF graph is the <get-at> property; it is not part of any other statement. All descriptions pertaining to the digital resource (e.g. metadata about the size of the picture) are made with <some-picture>.

Ultimately, we would have three fundamentally different kinds of URIs and statements: a URI for the thing itself, a URI for a digital representation of the thing (e.g. a picture), and a URI/URL that simply gives us an HTTP address for the digital representation. Here is an example:

Statement about a thing:

<Eiffel-Tower> <rdf:type> <Tower>

Statements about a digital resource dealing with a thing:

<Eiffel-Tower> <depicted-in> <Eiffel-Tower-Picture>
<Eiffel-Tower-Picture> <dimensions> "800x600 pixels"

Statement about the URL of a digital resource dealing with a thing:

<Eiffel-Tower-Picture> <get-at> <http://www.sbac.edu/~tpl/clipart/Photos/Eiffel%20Tower.jpg>

Note that the actual URIs of <Eiffel-Tower> or <Eiffel-Tower-Picture> might also be URIs of the http naming scheme -- no one would ever try to resolve them, though. Only URIs that are the object of a <subject> <get-at> <object> triple would ever be resolved via HTTP.
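The rule in the last paragraph can be sketched as follows, with the example triples held as plain tuples. The function names resolvable_urls and description are hypothetical helpers, not part of any RDF library; the point is only that the object of a get-at triple is the sole candidate for an HTTP request, while all metadata stays attached to the picture's own URI.

```python
Triple = tuple[str, str, str]

GET_AT = "get-at"

graph: list[Triple] = [
    ("Eiffel-Tower", "rdf:type", "Tower"),
    ("Eiffel-Tower", "depicted-in", "Eiffel-Tower-Picture"),
    ("Eiffel-Tower-Picture", "dimensions", "800x600 pixels"),
    ("Eiffel-Tower-Picture", GET_AT,
     "http://www.sbac.edu/~tpl/clipart/Photos/Eiffel%20Tower.jpg"),
]


def resolvable_urls(triples: list[Triple], subject: str) -> list[str]:
    """Return only the URLs that may be fetched via HTTP for this subject."""
    return [o for s, p, o in triples if s == subject and p == GET_AT]


def description(triples: list[Triple], subject: str) -> list[tuple[str, str]]:
    """Return every statement about the subject except its location."""
    return [(p, o) for s, p, o in triples if s == subject and p != GET_AT]


picture_urls = resolvable_urls(graph, "Eiffel-Tower-Picture")
tower_urls = resolvable_urls(graph, "Eiffel-Tower")
picture_metadata = description(graph, "Eiffel-Tower-Picture")
```

Here tower_urls is empty: the tower itself has no get-at statement, so nothing is ever resolved for it.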

If the URLs break or change, we would need to update the RDF graph describing them. Any of the administrators of the RDF query endpoints can do this. In this scenario, the RDF query endpoints and the distributed SPARQL queries make any other resolution system unnecessary. No need for GETting every resource we encounter, no need for content negotiation, no need for downloading huge RDF files just to get a few statements out of them, no need to pay attention to the URIs you use when creating a simple ontology, no need to be puzzled about the semantics of simple web resources and URLs, no need to express versioning information in the URI...
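As a rough sketch of what such an update could look like, assuming the graph is held as plain tuples: only the get-at statement is rewritten, and every other statement about the picture is left alone. The function update_location and the replacement URL are hypothetical and chosen only for illustration.

```python
Triple = tuple[str, str, str]

GET_AT = "get-at"
OLD_URL = "http://www.sbac.edu/~tpl/clipart/Photos/Eiffel%20Tower.jpg"
NEW_URL = "http://example.org/pictures/eiffel-tower.jpg"  # hypothetical new location

graph: list[Triple] = [
    ("Eiffel-Tower", "depicted-in", "Eiffel-Tower-Picture"),
    ("Eiffel-Tower-Picture", "dimensions", "800x600 pixels"),
    ("Eiffel-Tower-Picture", GET_AT, OLD_URL),
]


def update_location(triples: list[Triple], subject: str,
                    old_url: str, new_url: str) -> list[Triple]:
    """Return a copy of the graph in which one get-at URL is replaced."""
    return [
        (s, p, new_url) if (s, p, o) == (subject, GET_AT, old_url) else (s, p, o)
        for s, p, o in triples
    ]


updated = update_location(graph, "Eiffel-Tower-Picture", OLD_URL, NEW_URL)
```

Because the URL appears in exactly one statement, a broken link is repaired by changing a single triple, and the URI of the picture stays the same.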

Everything stays inside RDF and already standardized RDF technologies. Of course, distributed queries would still need some optimization, but that is a problem we would have to deal with anyway.

Kind regards,
Matthias Samwald

[1] http://www.wiwiss.fu-berlin.de/suhl/bizer/rdfapi/tutorial/netapi.html


Received on Monday, 31 July 2006 11:07:37 GMT
