Re: Identifiers (was Notes from today's meeting) from Alasdair J G Gray on 2013-06-04 (public-semweb-lifesci@w3.org from June 2013)

From: Alasdair J G Gray <Alasdair.Gray@manchester.ac.uk>
Date: Tue, 4 Jun 2013 14:39:09 +0100
To: Jerven Bolleman <me@jerven.eu>
Cc: Michel Dumontier <michel.dumontier@gmail.com>, Joachim Baran <joachim.baran@gmail.com>, N Juty <juty@ebi.ac.uk>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-Id: <E5A00B76-9F02-4ACC-8080-8D5D9D97FD76@manchester.ac.uk>

On 4 Jun 2013, at 14:20, Jerven Bolleman <me@jerven.eu> wrote:

> 
> On Tue, Jun 4, 2013 at 3:08 PM, Michel Dumontier <michel.dumontier@gmail.com> wrote:
> The point here is simple. If you provide a URI uniprot:1.2.3.4, I would like to know that this is incorrect.
> 
> m.
> Yes, but the model needs to be good enough to tell you that. The model discussed yesterday, with a
> data item identifier regex pattern, is not strong enough to do so. The void:uriRegexPattern property might be good enough.
> 
> :x a void:Dataset ;
>    void:uriRegexPattern "ec:[1-6]\\.\\d\\.\\d\\.\\d" , "uniprot:P\\d{5}" .
> 
> But I am thinking that we can have stronger validation patterns if we think a bit more.
> E.g., can we think of something that can prevent the following?
> 
> uniprot:P12345 a up:Sequence .
> sequence:P12345 a up:Protein .
> 
Of course, the prefixes here are syntactic shortcuts for the full URIs, so you would be able to distinguish these if the pattern encodes the complete URI and not just the identifier part.
(Not meaning to sound like a broken record ;) )
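
To make the point concrete, here is a minimal sketch in Python, not a proposal for how Identifiers.org or Bio2RDF would implement it. The namespaces are placeholders under example.org, and the names PREFIXES, FULL_URI_PATTERNS, expand and datasets_for are invented for illustration. The identifier shapes follow the patterns quoted above, with the dots treated as literal characters.

```python
import re

# Placeholder namespaces: assumptions for illustration, not the real ones.
PREFIXES = {
    "uniprot": "http://example.org/uniprot/",
    "sequence": "http://example.org/sequence/",
    "ec": "http://example.org/enzyme/",
}

# One pattern per dataset, written against the complete URI
# rather than against the local identifier alone.
FULL_URI_PATTERNS = {
    "uniprot": re.compile(re.escape(PREFIXES["uniprot"]) + r"P\d{5}"),
    "sequence": re.compile(re.escape(PREFIXES["sequence"]) + r"P\d{5}"),
    "ec": re.compile(re.escape(PREFIXES["ec"]) + r"[1-6]\.\d\.\d\.\d"),
}


def expand(prefixed_name):
    """Expand a prefixed name such as 'uniprot:P12345' into a full URI."""
    prefix, local = prefixed_name.split(":", 1)
    return PREFIXES[prefix] + local


def datasets_for(uri):
    """Return the names of the datasets whose pattern matches the whole URI."""
    return sorted(
        name for name, pattern in FULL_URI_PATTERNS.items()
        if pattern.fullmatch(uri)
    )


if __name__ == "__main__":
    for name in ("uniprot:P12345", "sequence:P12345", "uniprot:1.2.3.4"):
        print(name, "->", datasets_for(expand(name)))
```

With full-URI patterns, uniprot:P12345 and sequence:P12345 fall into different datasets even though the local identifier is the same, and uniprot:1.2.3.4 matches no dataset at all. Checking that the rdf:type is the expected one for that dataset would still be a separate step.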

> And is a dataset description the right place for this validation data?
> 
So for the Bio2RDF/Identifiers.org use case, yes. However, it may be most appropriate for a service such as Identifiers.org to extend the dataset descriptions provided by publishers with this sort of information.

Alasdair

> Regards,
> Jerven
> 
> 
> On Tue, Jun 4, 2013 at 3:01 PM, Joachim Baran <joachim.baran@gmail.com> wrote:
> 
> On 4 June 2013 08:56, Jerven Bolleman <me@jerven.eu> wrote:
> uniprot:P12345 a up:Protein ;
>                        up:enzyme ec:1.2.3.4 .
> ec:1.2.3.4 a up:Enzyme .
> What if my data is  
> 
> uniprot:1.2.3.4 a up:Protein ;
>                        up:enzyme ec:P12345 .
> ec:P12345 a up:Enzyme .
>   I do not understand the new example. You just switched the identifiers?
>  
> What if I don't have a regular expression for one of the sets?
>   I suggest it implies the set of all URIs, i.e. the regexp: .*
>  
> Or two very similar ones?
> e.g. mgi and pubmed?
>   Take the union regexp.
> 
> Joachim
> 
> 
> 
> 
> -- 
> Michel Dumontier
> Associate Professor of Bioinformatics, Carleton University
> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
> http://dumontierlab.com
> 
> 
> 
> -- 
> Jerven Bolleman
> me@jerven.eu

Dr Alasdair J G Gray
Research Associate
Alasdair.Gray@manchester.ac.uk
+44 161 275 0145

http://www.cs.man.ac.uk/~graya/


Received on Tuesday, 4 June 2013 13:39:38 UTC