Re: URIs from Tony Hammond on 2006-06-16 (public-semweb-lifesci@w3.org from June 2006)

From: Tony Hammond <t.hammond@nature.com>
Date: Fri, 16 Jun 2006 10:00:04 +0100
To: Alan Ruttenberg <alanruttenberg@gmail.com>, <public-semweb-lifesci@w3.org>
Message-ID: <C0B831A4.D788%t.hammond@nature.com>
Hi Alan:

Just to clarify one point re INFO. You say:

>    a) The identifier is not intended to be dereferenceable. In that
> case the info: scheme was suggested for the form of the uri, as that
> is explicitly not dereferenceable.

This is not quite true: it reflects an earlier position we took (and
subsequently amended) with respect to dereferencing. RFC 4452
(http://www.ietf.org/rfc/rfc4452.txt) has the following wording:

  "The "info" URI scheme exists primarily for identification purposes.
   Implementations MUST NOT assume that an "info" URI can be
   dereferenced to a representation of the resource identified by the
   URI although Namespace Authorities MAY disclose in the registration
   record references to service mechanisms pertaining to identifiers
   from the registered namespace."

We have also updated the INFO FAQ
(http://info-uri.info/registry/docs/misc/faq.html) - bar a couple of places
I still need to update - to bring this in line with the RFC.

To see an example of INFO URIs being dereferenced you might care to see this
blog post on inkdroid last month:

http://www.inkdroid.org/journal/2006/05/16/info-uris-and-opening-up-library-data/

This shows LC numbers being linked to OCLC Linked Authority Files with
lookup for records such as info:lccn/no9910609 (Tim Berners-Lee),
info:lccn/n84156128 (Madonna), etc.

You might also consider that info:pmid/12376099 is readily dereferenceable
(through a regular Entrez HTTP querystring) to

  "Wijesuriya SD, Bristow J, Miller WL. Localization and analysis
   of the principal promoter for human tenascin-X. Genomics. 2002
   Oct;80(4):443-52."

etc., etc.
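
As a rough illustration of the "regular Entrez HTTP querystring" route,
here is a minimal Python sketch. The mapping it uses is an assumption made
for illustration, not something defined by RFC 4452 or by the INFO
registry: it simply rewrites an info:pmid URI into an NCBI E-utilities
efetch URL. The function names split_info_uri and pmid_to_efetch_url are
hypothetical, and the sketch only builds the URL without fetching it.

```python
from urllib.parse import urlencode

# NCBI E-utilities efetch endpoint; using it as the lookup service for
# info:pmid is an assumption of this sketch, not part of the "info" scheme.
EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def split_info_uri(uri):
    """Split 'info:<namespace>/<identifier>' into (namespace, identifier)."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        raise ValueError("not an info URI: %r" % uri)
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        raise ValueError("expected info:<namespace>/<identifier>: %r" % uri)
    # The namespace is lower-cased here only to simplify comparison.
    return namespace.lower(), identifier


def pmid_to_efetch_url(uri):
    """Build an efetch URL for an info:pmid URI (hypothetical mapping)."""
    namespace, identifier = split_info_uri(uri)
    if namespace != "pmid":
        raise ValueError("no lookup service known for namespace %r" % namespace)
    return EFETCH + "?" + urlencode({"db": "pubmed", "id": identifier})


if __name__ == "__main__":
    print(pmid_to_efetch_url("info:pmid/12376099"))
```

The point of the sketch is that the identifier stays a plain name, and any
dereferencing happens through a separately chosen service, which is in
keeping with the RFC wording quoted above.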

Hope that makes the position with regard to INFO a little clearer.

Cheers,

Tony




On 16/6/06 07:51, "Alan Ruttenberg" <alanruttenberg@gmail.com> wrote:

> 
> There was a discussion a few weeks ago about URIs that touched on various
> issues. This message is an attempt to untangle them, something I said
> I would write up as an action item in one of the HCLS conference
> calls. We'll be discussing URIs at the Monday BioRDF conference call.
> 
> As I read the discussion I partitioned it into three distinct issues:
> 
> 1) The relationship between the use of a URI in a representation and
> what it dereferences to, if anything. The possibilities seem to be:
> 
>    a) The identifier is not intended to be dereferenceable. In that
> case the info: scheme was suggested for the form of the uri, as that
> is explicitly not dereferenceable.
> 
>    b) The URI is used primarily as a name. Insofar as we want to use
> names, it is important there be some stable URIs. Of course it
> doesn't hurt if the URI becomes dereferenceable at some point, and it
> would even be nice, so let's leave open that possibility (but caveats
> in discussion below)
> 
>    c) Any URL we use needs to be able to be dereferenced to something.
> 
>    d) Any URL we use needs to be able to be dereferenced to the thing
> it is (and not dereferenced if you can't do that). Its only meaning
> is what it dereferences to.
> 
> 2) What a URI refers to. Some of this conversation was made in the
> form of a discussion about what reasonable arguments to owl:sameAs
> are - for example should one say that
> http://www.expasy.org/uniprot/P04637 is the sameAs
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537.
> 
> Another part of the conversation talked in terms of whether the URI
> http://www.expasy.org/uniprot/P04637 should, for our purposes, refer
> to a database record or to a thing in the world - Human P53 proteins.
> 
> Of course these are two sides of the same coin - you would only say
> the two URIs above were the same if they referred to things in the
> world. As database entries, they are obviously different. There are
> different fields, they are maintained by different people, etc.
> 
> 3) Something I will call the social aspect of URIs, for lack of a
> better term. By this I mean those aspects of the process we go through
> to come to a shared use of a URI. Under this category there is the
> ontology building, the strategies for connecting pieces of
> information generated by different groups. There was a bit in the
> conversations where people were arguing about whether using sameAs
> for mapping was pollution or a necessity, for instance. An important
> part of this in our context is how to define the use of URLs to
> things where there was not rigorous ontological engineering applied
> to create careful definitions, things like terminologies and entries
> in gene databases.
> 
> ---
> 
> I'll offer some of my own opinions on these issues now.
> 
> On the matter of what a URI dereferences to, I think it is more
> important to get the names in place quickly. I don't agree with the
> point of view that we should explicitly make them not
> dereferenceable, even though I'm not sure what should come back when
> we ask for what they point to yet. And I don't see support for there
> being a necessity that anything that looks like a URL have a server
> that returns something specific back. Here's a quote from RFC 3986,
> 
>> Although many URI schemes are named after protocols, this does not
>> imply that use of these URIs will result in access to the resource
>> via the named protocol.  URIs are often used simply for the sake of
>> identification.
> 
> It will be part of our social process to come to some understanding and
> agreement about what would be useful for us to have come back, if
> anything. Is it an RDF graph? A bunch of OWL definitions of things
> related to the gene? A representation of the asn record? A page of
> HTML? All of the above?
> 
> On the question of what kind of concept an entrez gene URI refers to,
> I think that concept needs to be "databaseRecord". There are too many
> different concepts that it could mean if we want it to refer to
> something in the world - does it refer to the sequence of the gene?
> The typical gene? All mutations of it that are found in populations?
> The possible gene products?
> 
> Rather, we can use the URI to the database entry to start to build
> concepts by defining properties and using them in OWL class
> definitions in a variety of ways. In foaf and SKOS, for instance,
> there is a property isPrimarySubjectOf. The kind of equivalence we
> can have between http://www.expasy.org/uniprot/P04637 and
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537
> is something like: The same something isPrimarySubjectOf
> http://www.expasy.org/uniprot/P04637 and
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537,
> where "something" is a blank node in RDF.  Or in OWL:
> 
> Class(P53Gene complete
>      restriction(isPrimarySubjectOf
>                    value(<http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>)))
> 
> Class(P53Transcript partial
>      intersectionOf(mRNA restriction(derivesFrom someValuesFrom(P53Gene))))
> 
> Which says that it is necessary and sufficient for x to be a
> P53Gene, for example, if someone has stated or it has been inferred
> that
> 
> Individual(x value(isPrimarySubjectOf <http://www.expasy.org/uniprot/P04637>))
> 
> and that a P53 transcript, among other things, is an mRNA that
> derivesFrom some P53Gene.
> 
> (there will be more complicated definitions too :)
> 
> [sameAs, equivalentClass, equivalentProperty will be a necessity, I
> think, BTW]
> 
> As for the social process, I look forward to the discussion on Monday :)
> 
> Regards,
> Alan
> 
> 
> http://www.w3.org/TR/uri-clarification/
> Uniform Resource Identifier (URI): Generic Syntax -
> http://tools.ietf.org/html/3986
> Relations in biomedical ontologies -
> http://genomebiology.com/2005/6/5/R46
> http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
> http://en.wikipedia.org/wiki/URL

Received on Friday, 16 June 2006 09:00:15 UTC