Re: Identifiers (was Notes from today's meeting) from Michel Dumontier on 2013-06-04 (public-semweb-lifesci@w3.org from June 2013)

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Tue, 4 Jun 2013 14:51:01 +0200
To: Alasdair J G Gray <Alasdair.Gray@manchester.ac.uk>
Cc: Jerven Bolleman <me@jerven.eu>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-ID: <CALcEXf4hHPQbhEoDzz6GYuvjPGhUVcV9i5oiWmGVk1Gu+pxcZA@mail.gmail.com>

Alasdair,
  We have large lists of preferred prefixes for datasets. Some of them are
defined by communities (uniprot, ncbi, psi-mi, etc.), and others are used
within specific datasets to refer to other datasets. As part of Bio2RDF,
and in collaboration with identifiers.org, we collect these prefixes and
assign each dataset a globally unique prefix, which is then the basis for
generating our URIs. Should a data provider not give a preferred prefix, we
will assign one on their behalf. Identifier and URI regexes can be used to
validate identifiers and URIs, so that we can check RDF documents and make
sure that people haven't created unlinkable URIs. If the dataset
description task force believes that this is a relatively undesirable
service, then keeping it at "may" is appropriate. If, however, we, as a
community interested in building a high-quality web of data, require
correct linking, we may want to recommend it more strongly.

m.

On Tue, Jun 4, 2013 at 11:27 AM, Alasdair J G Gray <
Alasdair.Gray@manchester.ac.uk> wrote:

>
> On 3 Jun 2013, at 17:51, Michel Dumontier <michel.dumontier@gmail.com>
> wrote:
>
>
>
>> Also continuing the discussion on e-mail that started on the call.
>> We should have a clear definition of data item if we are going to record
>> information about such things. e.g. baseURI, what happens if we have 2 data
>> item types in a single dataset?
>>
>>
> Ultimately, what I want is:
> i) to validate the syntax of an identifier in some dataset or cross
> reference (legacy, RDF)
>
>
> This is a good goal.
>
> ii) to compose a URI from a preferred or alternative prefix and an
> identifier   (legacy to RDF)
>
>
> This is problematic. Prefixes do not have any global scope. A prefix in an
> XML or turtle file MUST be defined within that file; it is a locally scoped
> variable; ultimately it is syntactic sugar. While I appreciate that
> something like PMID:22434840 has meaning to folk in the life sciences
> field, it is problematic if it is written out of context and ultimately it
> is not machine understandable. The only way such an identifier is useful is
> if it has a definition in the document with it for the value of the locally
> scoped PMID, and then in that case I could equally have used foo:22434840.
>
> iii) to decompose a URI to a preferred prefix and identifier pair (RDF to
> legacy)
>
>
> Again, there is a scoping problem. Prefixes are locally scoped and must be
> defined.
>
>  iv) to translate one URI pattern to another URI pattern (RDF)
>
>
> This is an important problem and is where services such as BridgeDB and
> Identifiers.org come into the mix. There are some nuances that must be
> captured even if we are only focusing on RDF URIs (and not the URIs of web
> pages associated with a resource, which opens up another can of worms). For
> example, some representations of ChEBI use identifiers of the form
> foo:CHEBI:73726 while others use bar:CHEBI_73726.
>
> What we need to remember here is that these are optional (MAY) properties.
>
> Alasdair
>
> Dr Alasdair J G Gray
> Research Associate
> Alasdair.Gray@manchester.ac.uk
> +44 161 275 0145
>
> http://www.cs.man.ac.uk/~graya/
>
>

-- 
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

Received on Tuesday, 4 June 2013 12:51:53 UTC