RE: My task from last week: Semantic free identifiers from Michel_Dumontier on 2011-06-20 (public-semweb-lifesci@w3.org from June 2011)

From: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Date: Mon, 20 Jun 2011 18:44:49 -0400
To: "Sivaram Arabandi, MD" <sivaram.arabandi@gmail.com>, "M. Scott Marshall" <mscottmarshall@gmail.com>
CC: Chime Ogbuji <chimezie@gmail.com>, Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, "MMVagnoni@mdanderson.org" <MMVagnoni@mdanderson.org>, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>, Jonathan Rees <jar@creativecommons.org>
Message-ID: <E1784B0107E5634C8997868083EDE78061257059A0@CCSMBX10.CUNET.CARLETON.CA>

IMHO, if you're still coding the content of an information system by hand, then you're going to introduce errors. A database curator should never assign their own identifiers - identifier assignment is internal to the technology and the information system. If you're a programmer, you should query the resource (ontology) for the identifiers based on the labels. Be more sophisticated. Do it right. Build usable APIs/UIs for people.
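
A minimal sketch of the label-to-identifier lookup described above, assuming the ontology's terms are already available as (identifier, label) pairs. The names TERMS, build_label_index, resolve and triple are hypothetical, and the identifiers are the made-up ones used elsewhere in this thread. A real system would fetch the pairs from the ontology itself (for example by querying its rdfs:label annotations) rather than from a hard-coded list.

```python
from typing import Dict, Iterable, Tuple

# Illustrative (identifier, label) pairs; the IDs are the made-up
# examples from this thread, not real ontology terms.
TERMS = [
    ("CO_01234556", "Mitral valve"),
    ("CO_01234554", "Heart"),
    ("RO_0000001", "part of"),
]


def build_label_index(terms: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Map each normalised label to its identifier, rejecting ambiguous labels."""
    index: Dict[str, str] = {}
    for identifier, label in terms:
        key = label.strip().lower()
        if key in index and index[key] != identifier:
            raise ValueError(f"ambiguous label: {label!r}")
        index[key] = identifier
    return index


def resolve(index: Dict[str, str], label: str) -> str:
    """Return the identifier for a label, or fail loudly if it is unknown."""
    try:
        return index[label.strip().lower()]
    except KeyError:
        raise KeyError(f"no term labelled {label!r}") from None


def triple(index: Dict[str, str], subject: str, predicate: str, obj: str) -> Tuple[str, ...]:
    """Build a triple of identifiers from three human-readable labels."""
    return tuple(resolve(index, label) for label in (subject, predicate, obj))


INDEX = build_label_index(TERMS)
```

With this in place, triple(INDEX, "Mitral valve", "part of", "Heart") produces the identifier triple without anyone typing an identifier, and a label that is not in the ontology raises an error instead of silently yielding a wrong term.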

Best,

m.



> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Sivaram Arabandi, MD
> Sent: Monday, June 20, 2011 6:14 PM
> To: M. Scott Marshall
> Cc: Chime Ogbuji; Andrea Splendiani; MMVagnoni@mdanderson.org; James Malone; HCLS; Jonathan Rees
> Subject: Re: My task from last week: Semantic free identifiers
> 
> Consider the following:
> 
> 1. Readability - the former is far more readable than the latter:
> 	 RO:part_of
> 		vs.
> 	<http://purl.obolibrary.org/obo/RO_0000001>
> 
>     - this becomes even more apparent in a triple (CO = a 'Cardiology
> Ontology'):
> 	CO:Mitral_valve   RO:part_of   CO:Heart
> 		vs.
>  	CO_01234556   RO_0000001   CO_01234554
> 		- doesn't make much sense (without tool support, which is
> 'practically' non-existent).
> 
> 2.  Mistakes are extremely difficult to spot with opaque identifiers:
> 	CO_01234556   RO_0000001   CO_01224554
> 		vs.
> 	CO:Mitral_valve   RO:part_of   CO:Brain
> 		- this is an obviously false statement - but not easy to spot
> if opaque identifiers were used.
> 
> 	This leads to a very insidious problem, one that is difficult to
> detect.
> 
> 3. I am not sure why the following is an issue:
> 	" Is my http://experiment the same as yours?
> 	  Is my http://gene? http://study?
> 	  Does my gene http://leads_to disease make sense?"
> 
> 	- Obviously if I use "http://experiment" and you use
> "http://experiment" we both are referring to the same thing.
> 	- But instead if I use "http://medicine/experiment"  and you use
> "http://biology/experiment", we 'may' not be referring to the same thing.
> 
> 4. When using readable identifiers, it is difficult to make changes to an
> existing term (Class) - I think this is a strength as opposed to an issue.
> It raises the bar and should encourage authors (of models) to create terms
> thoughtfully after due diligence. And when there is a real need to change
> the term i.e. its meaning has changed or it was inappropriate, ontology
> patterns can be used to retire the term (if necessary, labelled as
> deprecated) or reposition it.
> 	- 'Typos' in term names are definitely not a reason for having opaque
> identifiers. Avoid them by having a good process for introducing terms. If
> and when they occur, use ontology patterns to deal with them.
> 	- Using opaque identifiers with labels makes it very easy, almost too
> easy, for the labels to be changed. Oftentimes users of a model may not be
> aware of such changes.
> 
> 
> --Sivaram
> 
> 
> 
> On Jun 20, 2011, at 4:15 PM, M. Scott Marshall wrote:
> 
> > Hi Chime,
> >
> > The main reason is that when semantics and natural language are
> > inserted into identifiers, some identifiers are doomed to become stale
> > as thinking evolves or changes about the semantic representation. Or
> > when a new 'name brand' is created for that namespace: I think that
> > the best example of this was provided by Jonathan Rees for Shared
> > Names - ever heard of 'locuslink' identifiers? I believe that Entrez
> > Gene occupies the name branding of that space now. This is precisely
> > the sort of problem that Shared Names would like to avoid by serving
> > (non-ontological) identifiers from a 'neutral namespace'. In
> > ontologies, the same principle applies (I see that Helena has supplied
> > a good example).
> >
> > I agree with Mark about proper tooling - the tools should
> > automatically display labels. It's true that I don't know of a SPARQL
> > editor that does this to a satisfying degree yet, (except for one:
> > SPARQL Assist Language-Neutral Query Composer from McCarty et al,
> > shown at SWAT4LS in Berlin :) See Mark's post.) but that is not a
> > reason to create identifiers and your knowledge representation in a
> > way that won't stand the test of time.
> >
> > Shouldn't we consider RDF to be the bytecode of knowledge? Although I
> > understand the difficulty of dealing with non-human readable
> > identifiers in SPARQL and RDF, I believe that we are now looking at
> > bytecode and complaining that it isn't human readable. It's true that,
> > until the tools are available, it is difficult to write SPARQL
> > queries. But if we applied the same logic to gene accession numbers,
> > where would we be now? The SPARQL queries will eventually be 'under
> > the hood', supplying labels to a GUI near you. :)
> >
> > Cheers,
> > Scott
> >
> > On Mon, Jun 20, 2011 at 9:34 PM, Chime Ogbuji <chimezie@gmail.com> wrote:
> >> On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:
> >>
> >> Hi,
> >> sorry to jump on this thread like this...
> >>
> >> To be honest, I'm kind of concerned by the insistence on semantic-opaque
> >> identifiers.
> >>
> >> I am as well and I have been for some time.
> >>
> >> I understand the reason for them,
> >>
> >> Actually, I would be interested in hearing the reason for them enumerated,
> >> because I have had a hard time imagining what could possibly offset the
> >> (significant) impact on readability that it has on biomedical ontologies.
> >>  The barrier is already high for non-logicians and non-semantic web
> >> aficionados to use biomedical ontologies.  Why set it any higher?
> >> -- Chime
> >>
> >
> >
> >
> > --
> > M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
> > http://staff.science.uva.nl/~marshall
> >
> 
> 
> 
> -----
> No virus found in this message.
> Checked by AVG - www.avg.com
> Version: 10.0.1382 / Virus Database: 1513/3716 - Release Date: 06/20/11
> 

Received on Monday, 20 June 2011 22:44:38 UTC