Re: My task from last week: Semantic free identifiers from Sivaram Arabandi, MD on 2011-06-20 (public-semweb-lifesci@w3.org from June 2011)

From: Sivaram Arabandi, MD <sivaram.arabandi@gmail.com>
Date: Mon, 20 Jun 2011 18:14:04 -0400
To: "M. Scott Marshall" <mscottmarshall@gmail.com>
Cc: Chime Ogbuji <chimezie@gmail.com>, Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, MMVagnoni@mdanderson.org, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>, Jonathan Rees <jar@creativecommons.org>
Message-Id: <9729B97F-AD69-41B9-9124-909A879F02DE@gmail.com>

Consider the following:

1. Readability - the former is far more readable than the latter:
	 RO:part_of      
		vs.   
	<http://purl.obolibrary.org/obo/RO_0000001>

    - this becomes even more apparent in a triple (CO = a 'Cardiology Ontology'):
	CO:Mitral_valve   RO:part_of   CO:Heart  
		vs.
	CO_01234556   RO_0000001   CO_01234554
		- the latter doesn't make much sense without tool support (which is 'practically' non-existent).

2.  Mistakes are extremely difficult to spot with opaque identifiers:
	CO_01234556   RO_0000001   CO_01224554
		vs.
	CO:Mitral_valve   RO:part_of   CO:Brain 
		- this is an obviously false statement, but it would not be easy to spot if opaque identifiers were used.
	
	This makes for a very insidious kind of error, one that can easily go undetected.

3. I am not sure why the following is an issue:
	" Is my http://experiment the same as yours?  
	  Is my http://gene? http://study? 
	  Does my gene http://leads_to disease make sense?"

	- Obviously, if I use "http://experiment" and you use "http://experiment", we are both referring to the same thing.
	- But if instead I use "http://medicine/experiment" and you use "http://biology/experiment", we 'may' not be referring to the same thing.

4. When using readable identifiers, it is difficult to make changes to an existing term (Class). I think this is a strength rather than an issue: it raises the bar and should encourage authors (of models) to create terms thoughtfully, after due diligence. And when there is a real need to change a term, i.e., its meaning has changed or it was inappropriate, ontology patterns can be used to retire the term (labelled as deprecated, if necessary) or reposition it.
	- 'Typos' in term names are definitely not a reason for having opaque identifiers. Avoid them by having a good process for introducing terms. If and when they occur, use ontology patterns to deal with them.
	- Using opaque identifiers with labels makes it very easy, almost too easy, for the labels to be changed. Often, users of a model may not be aware of such changes.
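A minimal Python sketch of the label-display "tool support" discussed in this thread. It assumes a hypothetical in-memory label table standing in for rdfs:label lookups; the CO identifiers and labels are the made-up ones from items 1 and 2, and the second release's label is invented purely for illustration. It shows that the mistyped triple from item 2 becomes obvious once labels are displayed, and, echoing item 4, that such a display is only as trustworthy as the labels themselves, so comparing two releases can at least flag silent relabelling.

```python
from typing import Dict, List, Tuple

# Hypothetical label table standing in for rdfs:label lookups. The CO
# identifiers and labels are the illustrative ones used in items 1 and 2.
LABELS: Dict[str, str] = {
    "CO_01234556": "Mitral_valve",
    "CO_01234554": "Heart",
    "CO_01224554": "Brain",
    "RO_0000001": "part_of",
}


def render(term: str, labels: Dict[str, str]) -> str:
    """Show the label for an opaque identifier; fall back to the raw ID."""
    return labels.get(term, term)


def render_triple(triple: Tuple[str, str, str], labels: Dict[str, str]) -> str:
    """Render a subject-predicate-object triple using labels where known."""
    return "   ".join(render(t, labels) for t in triple)


def changed_labels(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    """List identifiers present in both versions whose label differs."""
    return sorted(t for t in old.keys() & new.keys() if old[t] != new[t])


if __name__ == "__main__":
    # The intended statement and the mistyped one from item 2.
    print(render_triple(("CO_01234556", "RO_0000001", "CO_01234554"), LABELS))
    print(render_triple(("CO_01234556", "RO_0000001", "CO_01224554"), LABELS))
    # A silent relabelling between two hypothetical releases (item 4).
    newer = dict(LABELS, CO_01234554="Cardiac_organ")
    print(changed_labels(LABELS, newer))
```

Run as a script, it prints the two triples as "Mitral_valve   part_of   Heart" and "Mitral_valve   part_of   Brain", then ['CO_01234554']. The label table here is a plain dictionary only to keep the sketch self-contained; a real tool would read labels from the ontology itself.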

	
--Sivaram



On Jun 20, 2011, at 4:15 PM, M. Scott Marshall wrote:

> Hi Chime,
> 
> The main reason is that when semantics and natural language are
> inserted into identifiers, some identifiers are doomed to become stale
> as thinking evolves or changes about the semantic representation. Or
> when a new 'name brand' is created for that namespace: I think that
> the best example of this was provided by Jonathan Rees for Shared
> Names - ever heard of 'locuslink' identifiers? I believe that Entrez
> Gene occupies the name branding of that space now. This is precisely
> the sort of problem that Shared Names would like to avoid by serving
> (non-ontological) identifiers from a 'neutral namespace'. In
> ontologies, the same principle applies (I see that Helena has supplied
> a good example).
> 
> I agree with Mark about proper tooling - the tools should
> automatically display labels. It's true that I don't know of a SPARQL
> editor that does this to a satisfying degree yet (except for one:
> SPARQL Assist Language-Neutral Query Composer from McCarthy et al.,
> shown at SWAT4LS in Berlin :) See Mark's post.), but that is not a
> reason to create identifiers and your knowledge representation in a
> way that won't stand the test of time.
> 
> Shouldn't we consider RDF to be the bytecode of knowledge? Although I
> understand the difficulty of dealing with non-human readable
> identifiers in SPARQL and RDF, I believe that we are now looking at
> bytecode and complaining that it isn't human readable. It's true that,
> until the tools are available, it is difficult to write SPARQL
> queries. But if we applied the same logic to gene accession numbers,
> where would we be now? The SPARQL queries will eventually be 'under
> the hood', supplying labels to a GUI near you. :)
> 
> Cheers,
> Scott
> 
> On Mon, Jun 20, 2011 at 9:34 PM, Chime Ogbuji <chimezie@gmail.com> wrote:
>> On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:
>> 
>>> Hi,
>>> sorry to jump on this thread like this...
>>> 
>>> To be honest, I'm kind of concerned by the insistence on semantic-opaque
>>> identifiers.
>> 
>> I am as well and I have been for some time.
>> 
>>> I understand the reason for them,
>> 
>> Actually, I would be interested in hearing the reason for them enumerated,
>> because I have had a hard time imagining what could possibly offset the
>> (significant) impact on readability that it has on biomedical ontologies.
>>  The barrier is already high for non-logicians and non-semantic web
>> aficionados to use biomedical ontologies.  Why set it any higher?
>> -- Chime
>> 
> 
> 
> 
> -- 
> M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
> http://staff.science.uva.nl/~marshall
>

Received on Monday, 20 June 2011 22:14:45 UTC