Re: Identifiers (was Notes from today's meeting) from Michel Dumontier on 2013-06-04 (public-semweb-lifesci@w3.org from June 2013)

From: Michel Dumontier <michel.dumontier@gmail.com>
Date: Tue, 4 Jun 2013 15:40:16 +0200
To: Alasdair J G Gray <Alasdair.Gray@manchester.ac.uk>
Cc: Jerven Bolleman <me@jerven.eu>, Joachim Baran <joachim.baran@gmail.com>, N Juty <juty@ebi.ac.uk>, "public-semweb-lifesci@w3.org" <public-semweb-lifesci@w3.org>
Message-ID: <CALcEXf6u7xSFsa=stCA7+7Wh3K8B=yEJh1Gz+ix7Gxo6P2hf+w@mail.gmail.com>

On Tue, Jun 4, 2013 at 3:39 PM, Alasdair J G Gray <
Alasdair.Gray@manchester.ac.uk> wrote:

>
> On 4 Jun 2013, at 14:20, Jerven Bolleman <me@jerven.eu> wrote:
>
>
>
>
> On Tue, Jun 4, 2013 at 3:08 PM, Michel Dumontier <
> michel.dumontier@gmail.com> wrote:
>
>> The point here is simple: if you provide a URI such as uniprot:1.2.3.4, I would
>> like to know that it is incorrect.
>>
>> m.
>>
> Yes, but the model needs to be good enough to tell you that. The model
> discussed yesterday, with a data item identifier regex pattern, is not strong
> enough to do so. The VoID uriRegexPattern might be good enough.
>
> :x a void:Dataset ;
>    void:uriRegexPattern "ec:[1-6]\\.\\d+\\.\\d+\\.\\d+" , "uniprot:P\\d{5}" .
>
> But I think we could have stronger validation patterns if we
> think about it a bit more,
> e.g. can we think of something that would prevent:
>
> uniprot:P12345 a up:Sequence .
> sequence:P12345 a up:Protein .
>
> Of course, the prefixes here are syntactic shortcuts for the full URI, so
> you would be able to distinguish these if you encode the complete URI and
> not just the identifier part.
> (Not meaning to sound like a broken record ;) )
>
> And is a dataset description the right place for this validation data?
>
> For the Bio2RDF/Identifiers.org use case, yes. However, it may be most
> appropriate for a service such as Identifiers.org to extend dataset
> descriptions provided by publishers with this sort of information.
>
>
If the original data publishers provided this information, we wouldn't have to
curate it. (Sorry, Nick!)

m.
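A minimal Python sketch of the pattern-based check discussed in this thread. It assumes identifiers arrive as prefixed names (e.g. uniprot:P12345), reuses the two patterns from Jerven's void:uriRegexPattern example (with the EC components widened to \d+ so that numbers like 1.14.13.39 pass), treats several patterns for one prefix as their union, and follows Joachim's suggestion that a prefix with no pattern accepts anything (.*). The PATTERNS table, the "mgi" entry, and the function names are hypothetical illustrations, not part of VoID, Bio2RDF, or Identifiers.org.

```python
from __future__ import annotations

import re

# Hypothetical pattern table keyed by prefix, adapted from the
# void:uriRegexPattern example in this thread. The UniProt entry is the
# simplified "P + 5 digits" form used there, not the full accession syntax.
PATTERNS: dict[str, list[str]] = {
    "ec": [r"[1-6]\.\d+\.\d+\.\d+"],
    "uniprot": [r"P\d{5}"],
    "mgi": [],  # hypothetical: no pattern available for this prefix
}


def compile_union(patterns: list[str]) -> re.Pattern:
    """Combine a prefix's patterns into one alternation; '.*' if none."""
    if not patterns:
        return re.compile(r".*")
    return re.compile("|".join(f"(?:{p})" for p in patterns))


VALIDATORS = {prefix: compile_union(ps) for prefix, ps in PATTERNS.items()}


def is_valid(curie: str) -> bool:
    """Return True if a prefixed identifier matches its prefix's pattern."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in VALIDATORS:
        return False  # no colon or unknown prefix: nothing to check against
    # fullmatch anchors the pattern, so e.g. "P123456" is rejected.
    return VALIDATORS[prefix].fullmatch(local) is not None


if __name__ == "__main__":
    for curie in ["uniprot:P12345", "ec:1.2.3.4",
                  "uniprot:1.2.3.4", "ec:P12345"]:
        print(curie, is_valid(curie))
```

With these patterns, uniprot:1.2.3.4 and ec:P12345 from Jerven's swapped example are both rejected. A purely syntactic check like this cannot, however, separate two namespaces whose patterns overlap, which is the limitation raised for MGI and PubMed.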


> Alasdair
>
> Regards,
> Jerven
>
>>
>>
>> On Tue, Jun 4, 2013 at 3:01 PM, Joachim Baran <joachim.baran@gmail.com> wrote:
>>
>>>
>>> On 4 June 2013 08:56, Jerven Bolleman <me@jerven.eu> wrote:
>>>
>>>> uniprot:P12345 a up:Protein ;
>>>>     up:enzyme ec:1.2.3.4 .
>>>> ec:1.2.3.4 a up:Enzyme .
>>>>
>>>> What if my data is:
>>>>
>>>> uniprot:1.2.3.4 a up:Protein ;
>>>>     up:enzyme ec:P12345 .
>>>> ec:P12345 a up:Enzyme .
>>>>
>>>   I do not understand the new example. You just switched the identifiers?
>>>
>>>
>>>> What if I don't have a regular expression for one of the sets?
>>>>
>>>   I suggest it implies the set of all URIs, i.e. the regexp: .*
>>>
>>>
>>>> Or two very similar ones, e.g. MGI and PubMed?
>>>>
>>>   Take the union regexp.
>>>
>>> Joachim
>>>
>>>
>>
>>
>> --
>> Michel Dumontier
>> Associate Professor of Bioinformatics, Carleton University
>> Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
>> http://dumontierlab.com
>>
>
>
>
> --
> Jerven Bolleman
> me@jerven.eu
>
>
> Dr Alasdair J G Gray
> Research Associate
> Alasdair.Gray@manchester.ac.uk
> +44 161 275 0145
>
> http://www.cs.man.ac.uk/~graya/
>
>
>


-- 
Michel Dumontier
Associate Professor of Bioinformatics, Carleton University
Chair, W3C Semantic Web for Health Care and the Life Sciences Interest Group
http://dumontierlab.com
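Jerven's second example (uniprot:P12345 a up:Sequence) is syntactically well-formed, so no identifier pattern catches it; what it needs is a constraint on the types of resources in each namespace. Following Alasdair's point that full URIs keep the namespaces distinct, a rough Python sketch might key an expected rdf:type on the namespace URI. The namespace-to-type table is a hypothetical constraint, SEQUENCE_NS is an invented placeholder, and the UniProt URIs are used illustratively.

```python
from __future__ import annotations

from collections.abc import Iterable, Iterator

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
UP = "http://purl.uniprot.org/core/"

# Hypothetical namespaces and constraints, for illustration only.
UNIPROT_NS = "http://purl.uniprot.org/uniprot/"
SEQUENCE_NS = "http://example.org/sequence/"  # invented placeholder

# Each namespace maps to the single rdf:type its resources are expected to have.
EXPECTED_TYPE = {
    UNIPROT_NS: UP + "Protein",
    SEQUENCE_NS: UP + "Sequence",
}


def type_violations(
    triples: Iterable[tuple[str, str, str]],
) -> Iterator[tuple[str, str, str]]:
    """Yield (subject, declared type, expected type) for each mismatch."""
    for s, p, o in triples:
        if p != RDF_TYPE:
            continue  # only rdf:type statements are checked
        for ns, expected in EXPECTED_TYPE.items():
            if s.startswith(ns) and o != expected:
                yield (s, o, expected)


if __name__ == "__main__":
    # The two swapped statements from Jerven's message, as full URIs.
    swapped = [
        (UNIPROT_NS + "P12345", RDF_TYPE, UP + "Sequence"),
        (SEQUENCE_NS + "P12345", RDF_TYPE, UP + "Protein"),
    ]
    for violation in type_violations(swapped):
        print(violation)
```

Both swapped statements are reported. The check only looks at rdf:type statements whose subject falls in a listed namespace; anything else passes unchecked.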

Received on Tuesday, 4 June 2013 13:41:08 UTC