Re: Identifiers (was Notes from today's meeting)

On Tue, Jun 4, 2013 at 3:20 PM, Jerven Bolleman <me@jerven.eu> wrote:

>
> On Tue, Jun 4, 2013 at 3:08 PM, Michel Dumontier <
> michel.dumontier@gmail.com> wrote:
>
>> The point here is simple. if you provide a URI uniprot:1.2.3.4, i would
>> like to know that this is incorrect.
>>
>> m.
>>
> Yes, but the model needs to be good enough to tell you that. The model
> discussed yesterday with
> data item identifier regex pattern is not strong enough to do so. The void
> uriRegexPattern might be good enough.
>
> :x a void:Dataset ;
>    void:uriRegexPattern "ec:[1-6].\d.\d.\d" , "uniprot:P\d{5}" .
>
>
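A side note on the quoted pattern: its dots are unescaped, so each one matches any character rather than a literal ".". The sketch below, in Python, compares the pattern as quoted with an escaped variant. The names LOOSE and STRICT are mine, for illustration only, and the escaped form is an assumption about what was intended.

```python
import re

# Pattern as quoted above: "." is unescaped and matches any character.
LOOSE = re.compile(r"ec:[1-6].\d.\d.\d")

# Assumed intent: literal dots between the four single-digit fields.
STRICT = re.compile(r"ec:[1-6]\.\d\.\d\.\d")

candidate = "ec:1x2y3z4"
print(bool(LOOSE.fullmatch(candidate)))   # accepted by the quoted pattern
print(bool(STRICT.fullmatch(candidate)))  # rejected once dots are escaped
```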
in our registry, we have 4 prefixes for "ec":
ec, enzyme nomenclature, ec-code, enzyme classification

where "ec" is the (global) preferred prefix, and the others are collected
from various datasets

so, in a regex, (ec|enzyme nomenclature|ec\-code|enzyme classification)

and the identifier part matches to:
"\d+\.-\.-\.-|\d+\.\d+\.-\.-|\d+\.\d+\.\d+\.-|\d+\.\d+\.\d+\.(n)?\d+"

so, putting the prefix in use and provided identifier together, we would
ask whether it matches to
"(ec|enzyme nomenclature|ec\-code|enzyme classification)\:(\d+\.-\.-\.-|\d+\.\d+\.-\.-|\d+\.\d+\.\d+\.-|\d+\.\d+\.\d+\.(n)?\d+)"

we would also want to match fully qualified URIs in a similar manner.
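
A minimal sketch of that check in Python, assuming the four prefixes and the identifier pattern given above. The names EC_PREFIXES, EC_ID, EC_BASE and is_valid_ec are hypothetical, and the namespace http://example.org/ec/ is a placeholder rather than a real registry URI.

```python
import re

# Prefixes listed in the message above; "ec" is the preferred one.
EC_PREFIXES = ["ec", "enzyme nomenclature", "ec-code", "enzyme classification"]

# Identifier pattern from the message above, written as a raw string.
EC_ID = r"\d+\.-\.-\.-|\d+\.\d+\.-\.-|\d+\.\d+\.\d+\.-|\d+\.\d+\.\d+\.(n)?\d+"

# re.escape protects characters such as "-" and " " inside the prefixes.
PREFIX_ALT = "|".join(re.escape(p) for p in EC_PREFIXES)
EC_CURIE = re.compile(rf"(?:{PREFIX_ALT}):(?:{EC_ID})")

# Placeholder namespace (an assumption), only to illustrate full URIs.
EC_BASE = "http://example.org/ec/"
EC_URI = re.compile(rf"{re.escape(EC_BASE)}(?:{EC_ID})")


def is_valid_ec(identifier: str) -> bool:
    """Return True if the whole string is an EC CURIE or an EC URI."""
    return bool(EC_CURIE.fullmatch(identifier) or EC_URI.fullmatch(identifier))


for candidate in ["ec:1.2.3.4", "ec-code:3.4.21.-", "uniprot:1.2.3.4"]:
    print(candidate, is_valid_ec(candidate))
```

Using fullmatch rather than search means trailing characters (for example "ec:1.2.3.4x") are rejected, and a prefix from another dataset such as uniprot:1.2.3.4 fails because it is not among the registered EC prefixes.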



> But I am thinking that we can have stronger validation patterns if we
> think a bit more.
> e.g. can we think of something that can prevent.
>
> uniprot:P12345 a up:Sequence .
> sequence:P12345 a up:Protein .
>
> And is a dataset description the right place for this validation data?
>
yes.

m.


-- 
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com

Received on Tuesday, 4 June 2013 13:40:07 UTC