RE: RDF for molecules, using InChI from Michel_Dumontier on 2007-08-03 (public-semweb-lifesci@w3.org from August 2007)

From: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Date: Fri, 03 Aug 2007 10:14:54 -0400
To: Egon Willighagen <egon.willighagen@gmail.com>, public-semweb-lifesci@w3.org
Message-id: <AB349814F1ECB143A5D4CD29C7A645690192D751@CCSEXB10.CUNET.CARLETON.CA>
> info:inchi/InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H
> 
> The owl:sameAs can make the link of this URI to the one I suggested.
> 
> Egon

Egon,
Excellent! This is exactly what I'm looking for. In addition, the INFO
registry [1] contains several other namespaces, such as PMIDs and RefSeq
identifiers.

[1] http://info-uri.info/registry/OAIHandler?verb=ListRecords&metadataPrefix=oai_dc
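The sketch below is a rough illustration of the approach discussed in this thread. It mints an INFO URI from an InChI string, then links that URI to a resolvable HTTP URL with owl:sameAs, serialized as one N-Triples line. It rests on a few assumptions:

- The helper names are hypothetical.
- The HTTP URL is an example.org placeholder. The URI Egon originally suggested is not reproduced here.
- The InChI is used verbatim without percent-encoding, as in the example URI quoted above.

```python
# Minimal sketch (assumptions: hypothetical helper names, example.org placeholder
# URL, InChI characters used verbatim with no percent-encoding).

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def inchi_to_info_uri(inchi: str) -> str:
    """Prefix an InChI string with the info:inchi/ namespace."""
    if not inchi.startswith("InChI="):
        raise ValueError(f"not an InChI string: {inchi!r}")
    return "info:inchi/" + inchi


def same_as_triple(subject: str, obj: str) -> str:
    """Serialize a single owl:sameAs statement as an N-Triples line."""
    return f"<{subject}> <{OWL_SAME_AS}> <{obj}> ."


if __name__ == "__main__":
    example_inchi = "InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H"
    info_uri = inchi_to_info_uri(example_inchi)
    http_url = "http://example.org/molecule/1"  # hypothetical resolvable URL
    print(same_as_triple(info_uri, http_url))
```

The INFO URI names the molecule independently of any server. The owl:sameAs statement then carries the link to whichever resolvable URL a given service chooses to publish.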


Also, take note of their excellent FAQ, which addresses the rationale for
using the INFO URI scheme (http://info-uri.info/registry/docs/misc/faq.html).

<<<
# Why was it necessary to develop the INFO URI scheme?

The INFO URI scheme was developed from within the library and publishing
communities to expedite the referencing by URIs of information assets
that have identifiers in public namespaces but have no representation
within the URI allocation.

For various reasons (both cultural and technical) the creation and
registration of a new URI scheme or URN namespace to support a given
public namespace under the URI allocation may not have been attempted by
the authority for that namespace. It is precisely to facilitate the
representation of these public namespaces within the URI allocation that
the INFO URI scheme was developed.

# What was the motivation behind the INFO URI scheme?

The motivation behind developing the INFO URI scheme was to allow legacy
identification systems to become part of the World Wide Web global
information architecture so that the information assets they identify
can be referenced by Web-based description technologies such as XLink,
RDF or Topic Maps. Note that we are concerned with "information assets",
not "digital assets" per se - the information assets may be variously
digital, physical or conceptual.
>>>

<<<
# Why not just use HTTP URIs?

HTTP URIs (RFC 2616) are Internet protocol elements for referencing
hypertext documents which can be retrieved from a network authority
using the HTTP transfer protocol. There is a common expectation that
HTTP URIs can be dereferenced.

The following considerations hold in respect of HTTP URIs:

    * HTTP URIs are inappropriate for INFO namespaces because HTTP URIs
      provide:
          o network transport
          o network path (discovery obstacle)
          o strong dereference expectation
          o poor branding (network path overhead)
    * A transport mechanism adds meaningless semantic overhead to
      non-dereferenceable URIs.
    * Absolute HTTP URIs include a network path (comprising an authority
      component and a hierarchical path component). INFO namespaces may
      not have (or may not make) any network authority available. A
      central network authority would also be inappropriate, as this
      would introduce a dependency between a third-party namespace and a
      central network authority.

      Further, were INFO namespaces to make a network authority
      available, they would each have to publish the particular
      hierarchical path syntax employed by that network authority. A
      central network authority would mitigate this requirement by
      providing a single path syntax, although it would still need to
      publish that path syntax.
    * Use of HTTP URIs might only encourage the provisioning of resource
      representations (e.g. metadata descriptions) which could conflict
      with representations provided under any possible future URI
      registration on the part of the Namespace Authority. Further, if
      HTTP URIs were used to provide resource representations, it must
      be recognized that managing the namespace and infrastructure is a
      costly enterprise that may not be appropriate or cost-effective in
      a given business context.
    * The network path of HTTP URIs adds unnecessary string overhead and
      consequent loss of branding for legacy identifiers.

# Well then, why not just use URN URIs?

URN URIs (RFC 2141) are Internet protocol elements for referencing
resources using persistent and location-independent identifiers,
representations of which may be retrieved using various resolution
mechanisms. There is a common expectation that URN URIs can be
dereferenced, once suitable resolution mechanisms are defined (e.g. DDDS
or other proprietary mechanisms). Indeed, RFC 1737 goes so far as to
make a strong recommendation that "there be a mapping between the names
generated by each naming authority and URLs".

Use of URN URIs requires a URN namespace registration. An informal URN
namespace is of limited utility because its numerical nature obliterates
any branding or name recognition and effectively renders the namespace
anonymous. A formal URN namespace, on the other hand, would require a
more substantial review than a corresponding registration under the INFO
Registry. Based on experience with the initial INFO namespace target
group, it is unlikely that many Namespace Authorities will proceed with
independent applications as the burden of registering a URN namespace is
high, especially in the case of organizations that are not strongly
steeped in technology.

One particular impediment in applying for a URN namespace for INFO is
that this would compromise any possible future URN namespace
registration that a Namespace Authority might seek to make in respect of
considerations of persistence, location independence and/or dereference
to resource representations.

The following considerations hold in respect of URN URIs:

    * URN URIs are inappropriate for INFO namespaces because URN URIs
      provide:
          o claims of persistence of resource identifiers
          o dereference expectation
          o no delegated naming responsibility
          o restricted syntax (no hierarchy)
          o no support for fragment identifiers
          o poor branding and extra semantic layer (additional namespace
            tier)
    * INFO URIs make no claims on persistence. INFO URIs may be location
      independent and in consequence may enjoy some degree of
      persistence, but INFO does not make these assertions. Instead, INFO
      is neutral with respect to identifier persistence.
    * Use of URN URIs might only encourage the provisioning of resource
      representations (e.g. metadata descriptions) which could conflict
      with representations provided under any possible future URI
      registration on the part of the Namespace Authority. Further, if
      URN URIs were used to provide resource representations, it must be
      recognized that managing the namespace and infrastructure is a
      costly enterprise that may not be appropriate or cost-effective in
      a given business context.
    * For INFO to operate as a URN namespace would require that INFO be
      constituted as a delegated naming authority. It is not clear that a
      URN namespace would be an appropriate choice for such naming
      authority delegation.
    * Syntactically, URN URIs do not support hierarchy (in URI syntax,
      hierarchy proceeds through the "/" character) and are thus more
      difficult to use with legacy identifiers because of their
      restricted character set. Other characters reserved by URN URIs,
      but allowed by INFO URIs, are "&" and "~".

      For a demonstration of the difficulty of mapping legacy
      identifiers, the reader is referred to RFC 3151, which provides a
      set of complex transcriptions for mapping SGML formal public
      identifiers onto the URN URI syntax. Formal public identifiers
      would have been more readily presented under the more expressive
      INFO syntax.
    * Additionally, URN URIs do not support fragment identifiers, thus
      not allowing the identification of secondary resources with
      respect to a primary resource. This is a practical requirement that
      INFO supports.
    * With INFO as a URN namespace, the INFO namespaces would then
      become sub-sub-namespaces, with a consequent loss of branding. This
      would also introduce three tiers of semantic layers for an
      implementation to navigate.

>>>
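The FAQ draws a syntactic contrast with URNs. An INFO URI has a namespace tier, an identifier that may keep its own "/" hierarchy, and an optional fragment identifier. A small parser can illustrate this. It is a simplified sketch, assuming the general shape info:<namespace>/<identifier>[#fragment] (see RFC 4452 for the authoritative grammar). It does no percent-decoding or full validation, and the class and function names are hypothetical.

```python
from typing import NamedTuple, Optional


class InfoURI(NamedTuple):
    namespace: str           # e.g. "inchi" or "pmid"
    identifier: str          # legacy identifier, with its own "/" characters intact
    fragment: Optional[str]  # secondary resource, if a "#" was present


def parse_info_uri(uri: str) -> InfoURI:
    """Split an INFO URI into namespace, identifier and optional fragment.

    Simplified sketch: assumes info:<namespace>/<identifier>[#fragment]
    and performs no percent-decoding or full grammar validation.
    """
    scheme, colon, rest = uri.partition(":")
    if not colon or scheme.lower() != "info":
        raise ValueError(f"not an INFO URI: {uri!r}")
    rest, hash_sign, fragment = rest.partition("#")
    # Only the first "/" separates the namespace; later ones belong to the identifier.
    namespace, slash, identifier = rest.partition("/")
    if not slash or not namespace or not identifier:
        raise ValueError(f"missing namespace or identifier: {uri!r}")
    return InfoURI(namespace, identifier, fragment if hash_sign else None)


if __name__ == "__main__":
    print(parse_info_uri("info:inchi/InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H"))
    # Hypothetical PubMed identifier with a hypothetical fragment.
    print(parse_info_uri("info:pmid/12345#abstract"))
```

The InChI example keeps every "/" of its layered identifier after the namespace is split off. The second call shows a fragment addressing a secondary resource, which the FAQ notes URN syntax does not allow.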


I think this is really interesting, and its merits for knowledge
communities are well worth investigating further.


-=Michel=-
 
Michel Dumontier
Assistant Professor of Bioinformatics
 
Department of Biology, School of Computer Science, Institute of
Biochemistry 
Carleton University 

Member of the Ottawa Institute of Systems Biology 
Member of the Ottawa-Carleton Institute for Biomedical Engineering
 
Office: 4610 Carleton Technology and Training Center
Mailing: 209 Nesbitt, 1125 Colonel By Drive, Ottawa, ON K1S5B6
Tel:  +1 (613) 520-2600 x4194
Fax:  +1 (613) 520-3539
Web:  http://dumontierlab.com
Skype: micheldumontier

> -----Original Message-----
> From: Egon Willighagen [mailto:egon.willighagen@gmail.com]
> Sent: Friday, August 03, 2007 6:15 AM
> To: Michel_Dumontier
> Subject: Re: RDF for molecules, using InChI
> 
> Michel,
> 
> On 8/2/07, Michel_Dumontier <Michel_Dumontier@carleton.ca> wrote:
> > I support the use of InChI as URI. Of course, the use of such a URI
will
> > annoy those that want URL resolvable URIs... another reason to
relate the
> URI
> > and the resolvable URL with an owl:sameAs predicate.
> 
> FYI, I was just informed by Tony Hammond about this blog post:
> 
> http://www.crossref.org/CrossTech/2007/02/at_last_uris_for_inchi.html
> 
> in which this is suggested:
> 
> info:inchi/InChI=1/C10H8/c1-2-6-10-8-4-3-7-9(10)5-1/h1-8H
> 
> The owl:sameAs can make the link of this URI to the one I suggested.
> 
> Egon
Received on Friday, 3 August 2007 14:15:10 UTC