Re: URIs from Alan Ruttenberg on 2006-06-16 (public-semweb-lifesci@w3.org from June 2006)

From: Alan Ruttenberg <alanruttenberg@gmail.com>
Date: Fri, 16 Jun 2006 10:41:12 -0400
To: Tony Hammond <t.hammond@nature.com>
Cc: <public-semweb-lifesci@w3.org>
Message-Id: <d570d3f9e7851e0cd8d05645aa8d8749@gmail.com>
Hi Tony,

Thanks for the clarification. I took a closer look at the spec and have
some comments:

- Normalization. Upon review of rfc 2396, prompted by reading the
normalization rules, it strikes me that all of these are problematic
because they require scheme dependent logic for comparing URIs. info
has some basic rules for normalization, but then allows registered
public namespaces to add additional ones. Those normalization rules are
not encoded in a machine usable manner, meaning that applications that
want to compare info uris reliably need custom code for each
namespace. You might consider amending
the spec to remove this ability, or at least require that the
normalization rules be in a form that can automatically be translated
into an algorithm for comparison.

- Discovery. It wasn't clear to me how one obtains a machine readable  
form of the registry information, for example to extract the service  
mechanism in an automated way. I was able to find  
http://errol.oclc.org/info-uri.info/info:arxiv/.reg by clicking around,  
but I don't see that specified. You might consider formalizing how one
obtains the machine readable service record, documenting what its
contents are, and supplying an rdf or owl description of the resource for
SW usage.

- I'm not sure I'm convinced about all the rationale for the info
scheme, particularly given the quote from rfc 3986, and the addition of  
the potential for network services for info services. The point about  
persistence is interesting, though we probably need to think about the  
implications of that for our applications. Adverse effects of the  
standard normalization are definitely worth considering. I think, in  
the end, if, as part of the specification, one could reliably retrieve  
SW understandable information about the namespace, that would be an  
advance. I'm not knowledgeable enough to know if I am missing  
something, but as far as I can see, the only authority related to  
namespaces in URLs is the DNS, and while there is the SRV field which  
might be used to direct someone to information about the namespace, I  
don't know whether anyone does.
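
To make the first two points concrete, here is a minimal Python sketch of
what scheme dependent comparison and registry lookup might look like. It
rests on several assumptions: the NAMESPACE_RULES table and the
"example-ci" namespace are invented for illustration, the generic step
only lowercases the scheme and namespace (it is not the full rule set
from the spec), and registry_record_url assumes that the one .reg URL I
found generalizes to other namespaces, which I have not seen specified.

```python
from typing import Callable, Dict, Tuple

# Hypothetical per-namespace normalization rules. Today such rules exist only
# as prose in each registry entry, so every application must hand-code them.
NAMESPACE_RULES: Dict[str, Callable[[str], str]] = {
    "example-ci": str.lower,  # invented namespace whose identifiers ignore case
}


def split_info_uri(uri: str) -> Tuple[str, str]:
    """Split an info URI into (lowercased namespace, identifier)."""
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        raise ValueError(f"not an info URI: {uri!r}")
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace:
        raise ValueError(f"missing namespace or identifier: {uri!r}")
    return namespace.lower(), identifier


def normalize_info_uri(uri: str) -> str:
    """Apply the generic step, then any rule registered for the namespace."""
    namespace, identifier = split_info_uri(uri)
    rule = NAMESPACE_RULES.get(namespace, lambda ident: ident)
    return f"info:{namespace}/{rule(identifier)}"


def same_info_uri(a: str, b: str) -> bool:
    """Compare two info URIs after normalization."""
    return normalize_info_uri(a) == normalize_info_uri(b)


def registry_record_url(uri: str) -> str:
    """Guess the registry record URL (assumed pattern, not specified)."""
    namespace, _ = split_info_uri(uri)
    return f"http://errol.oclc.org/info-uri.info/info:{namespace}/.reg"
```

The point of the table is that a comparison is only as correct as the
rules someone remembered to code into it; a machine readable registry
entry would let that table be generated instead of maintained by hand.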

Regards,
Alan

On Jun 16, 2006, at 5:00 AM, Tony Hammond wrote:

> Hi Alan:
>
> Just to clarify one point re INFO. You say:
>
>>    a) The identifier is not intended to be dereferenceable. In that
>> case the info: scheme was suggested for the form of the uri, as that
>> is explicitly not dereferenceable.
>
> This is not actually quite true - but represents an earlier position we took
> (and subsequently amended) wrt dereference. RFC 4452
> (http://www.ietf.org/rfc/rfc4452.txt) actually has the following wording:
>
>   "The "info" URI scheme exists primarily for identification purposes.
>    Implementations MUST NOT assume that an "info" URI can be
>    dereferenced to a representation of the resource identified by the
>    URI although Namespace Authorities MAY disclose in the registration
>    record references to service mechanisms pertaining to identifiers
>    from the registered namespace."
>
> We have also updated the INFO FAQ
> (http://info-uri.info/registry/docs/misc/faq.html) - bar a couple of places
> I still need to update - to bring this in line with the RFC.
>
> To see an example of INFO URIs being dereferenced you might care to see this
> blog post on inkdroid last month:
>
> http://www.inkdroid.org/journal/2006/05/16/info-uris-and-opening-up-library-data/
>
> This shows LC numbers being linked to OCLC Linked Authority Files with
> lookup for records such as info:lccn/no9910609 (Tim Berners-Lee),
> info:lccn/n84156128 (Madonna), etc.
>
> You might also consider that info:pmid/12376099 is readily dereferenceable
> (through a regular Entrez HTTP querystring) to
>
>   "Wijesuriya SD, Bristow J, Miller WL. Localization and analysis
>    of the principal promoter for human tenascin-X. Genomics. 2002
>    Oct;80(4):443-52."
>
> etc., etc.
>
> Hope that makes the position with regard to INFO a little clearer.
>
> Cheers,
>
> Tony
>
>
>
>
> On 16/6/06 07:51, "Alan Ruttenberg" <alanruttenberg@gmail.com> wrote:
>
>>
>> There was a discussion a few weeks ago about URIs that touched on various
>> issues. This message is an attempt to untangle them, something I said
>> I would write up as an action item in one of the HCLS conference
>> calls. We'll be discussing URIs at the Monday BioRDF conference call.
>>
>> As I read the discussion I partitioned it into three distinct issues:
>>
>> 1) The relationship between the use of a URI in a representation and
>> what it dereferences to, if anything. The possibilities seem to be:
>>
>>    a) The identifier is not intended to be dereferenceable. In that
>> case the info: scheme was suggested for the form of the uri, as that
>> is explicitly not dereferenceable.
>>
>>    b) The URI is used primarily as a name. Insofar as we want to use
>> names, it is important that there be some stable URIs. Of course it
>> doesn't hurt if the URI becomes dereferenceable at some point, and it
>> would even be nice, so let's leave open that possibility (but see the
>> caveats in the discussion below).
>>
>>    c) Any URL we use needs to be able to be dereferenced to something.
>>
>>    d) Any URL we use needs to be able to be dereferenced to the thing
>> it is (and not dereferenced if you can't do that). Its only meaning
>> is what it dereferences to.
>>
>> 2) What a URI refers to. Some of this conversation was made in the
>> form of a discussion about what reasonable arguments to owl:sameAs
>> are - for example should one say that
>> http://www.expasy.org/uniprot/P04637 is the sameAs
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537.
>>
>> Another part of the conversation talked in terms of whether the URI
>> http://www.expasy.org/uniprot/P04637 should, for our purposes, refer
>> to a database record or to a thing in the world - Human P53 proteins.
>>
>> Of course these are two sides of the same coin - you would only say
>> that the two URIs above are sameAs if they referred to things in the
>> world. As database entries, they are obviously different. There are
>> different fields, they are maintained by different people, etc.
>>
>> 3) Something I will call the social aspect of URIs, for lack of a
>> better term. By this I mean those aspects of the process we go through to
>> come to a shared use of a URI. Under this category there is the
>> ontology building, the strategies for connecting pieces of
>> information generated by different groups. There was a bit in the
>> conversations where people were arguing about whether using sameAs
>> for mapping was pollution or a necessity, for instance. An important
>> part of this in our context is how to define the use of URLs to
>> things where there was not rigorous ontological engineering applied
>> to create careful definitions, things like terminologies and entries
>> in gene databases.
>>
>> ---
>>
>> I'll offer some of my own opinions on these issues now.
>>
>> On the matter of what a URI dereferences to, I think it is more
>> important to get the names in place quickly. I don't agree with the
>> point of view that we should explicitly make them not
>> dereferenceable, even though I'm not sure what should come back when
>> we ask for what they point to yet. And I don't see support for there
>> being a necessity that anything that looks like a URL have a server
>> that returns something specific back. Here's a quote from RFC 3986,
>>
>>> Although many URI schemes are named after protocols, this does not
>>> imply that use of these URIs will result in access to the resource
>>> via the named protocol.  URIs are often used simply for the sake of
>>> identification.
>>
>> It will be part of our social process to come to some understanding and
>> agreement about what would be useful for us to have come back, if
>> anything. Is it an RDF graph? A bunch of OWL definitions of things
>> related to the gene? A representation of the asn record? A page of
>> HTML? All of the above?
>>
>> On the question of what kind of concept an entrez gene URI refers to,
>> I think that concept needs to be "databaseRecord". There are too many
>> different concepts that it could mean if we want it to refer to
>> something in the world - does it refer to the sequence of the gene?
>> The typical gene? All mutations of it that are found in populations?
>> The possible gene products?
>>
>> Rather, we can use the URI to the database entry to start to build
>> concepts by defining properties and using them in OWL class
>> definitions in a variety of ways. In foaf and SKOS, for instance,
>> there is a property isPrimarySubjectOf. The kind of equivalence we
>> can have between http://www.expasy.org/uniprot/P04637 and
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537
>> is something like: The same something isPrimarySubjectOf
>> http://www.expasy.org/uniprot/P04637 and
>> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537,
>> where "something" is a blank node in RDF.  Or in OWL
>>
>> Class(P53Gene complete
>>      restriction(isPrimarySubjectOf
>>                    value(<http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=NP_000537>)))
>>
>> Class(P53Transcript partial intersectionOf(mRNA restriction
>> (derivesFrom someValuesFrom(P53Gene))))
>>
>> Which says that it is necessary and sufficient for x to be a
>> P53Gene, for example, if someone
>> has stated or it has been inferred that
>>
>> Individual(x value(isPrimarySubjectOf <http://www.expasy.org/uniprot/P04637>))
>>
>> and that a P53 transcript, among other things,  is a mRNA that
>> derivesFrom some P53Gene.
>>
>> (there will be more complicated definitions too :)
>>
>> [sameAs, equivalentClass, equivalentProperty will be a necessity, I
>> think, BTW]
>>
>> As for the social process, I look forward to the discussion on Monday :)
>>
>> Regards,
>> Alan
>>
>>
>> http://www.w3.org/TR/uri-clarification/
>> Uniform Resource Identifier (URI): Generic Syntax - http://tools.ietf.org/html/3986
>> Relations in biomedical ontologies - http://genomebiology.com/2005/6/5/R46
>> http://en.wikipedia.org/wiki/Uniform_Resource_Identifier
>> http://en.wikipedia.org/wiki/URL
>
Received on Friday, 16 June 2006 14:41:57 UTC