Re: My task from last week: Semantic free identifiers from Sivaram Arabandi, MD on 2011-06-20 (public-semweb-lifesci@w3.org from June 2011)

From: Sivaram Arabandi, MD <sivaram.arabandi@gmail.com>
Date: Mon, 20 Jun 2011 19:54:04 -0400
To: Helena Deus <helenadeus@gmail.com>
Cc: Michel_Dumontier <Michel_Dumontier@carleton.ca>, "M. Scott Marshall" <mscottmarshall@gmail.com>, Chime Ogbuji <chimezie@gmail.com>, Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, "MMVagnoni@mdanderson.org" <MMVagnoni@mdanderson.org>, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>, Jonathan Rees <jar@creativecommons.org>
Message-Id: <29E2472B-3C42-4B38-8D6F-E8A6678F84FC@gmail.com>
Am I correct in assuming that we are talking about ontology models that are used to describe a domain, and not the instance data? At the instance data level, using opaque identifiers, preferably generated, would certainly be the way to go.

Excellent point about databases, but I am not sure the analogy applies here. The identifiers are indeed internal to the technology and information systems. Databases enforce uniqueness of the primary keys; however, this doesn't prevent duplicate 'labels' from being introduced into the system, each with its own primary key. The unique identifiers don't mean a thing then: some poor human has to do the cleanup manually.
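
To illustrate the point about primary keys, here is a minimal sketch using Python's built-in sqlite3 module. The `term` table, its columns, and the helper `find_duplicate_labels` are hypothetical names made up for this example, not taken from any real system. It only shows that a generated primary key happily accepts the same label twice, and that finding the duplicates needs a separate query.

```python
import sqlite3


def find_duplicate_labels(labels):
    """Store labels in a hypothetical 'term' table and report repeats.

    Each row gets its own generated primary key, so the uniqueness
    constraint on the key never rejects a repeated label.
    """
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema: an opaque generated key plus a free-text label.
    conn.execute(
        "CREATE TABLE term (id INTEGER PRIMARY KEY, label TEXT NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO term (label) VALUES (?)",
        [(label,) for label in labels],
    )
    # The duplicates only show up when we ask for them explicitly.
    rows = conn.execute(
        "SELECT label, COUNT(*) FROM term "
        "GROUP BY label HAVING COUNT(*) > 1 ORDER BY label"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(find_duplicate_labels(["Mitral valve", "Heart", "Mitral valve"]))
    # prints [('Mitral valve', 2)]
```

A UNIQUE constraint on the label column would reject exact repeats, but it would not catch near-duplicates such as different spellings or capitalization, so some manual curation would likely still be needed.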

--Sivaram


On Jun 20, 2011, at 7:12 PM, Helena Deus wrote:

> 
> 
> On Mon, Jun 20, 2011 at 11:44 PM, Michel_Dumontier <Michel_Dumontier@carleton.ca> wrote:
> IMHO, if you're still coding the content of an information system by hand, then you're going to introduce errors. A database curator should never assign their own identifier - this is internal to the technology and the information system. If you're a programmer, you should query the resource (ontology) for the identifiers based on the labels.  Be more sophisticated. Do it right. Build useable APIs/UIs for people.
> 
> +1! 
> 
> Best,
> 
> m.
> 
> 
> 
> > -----Original Message-----
> > From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-
> > request@w3.org] On Behalf Of Sivaram Arabandi, MD
> > Sent: Monday, June 20, 2011 6:14 PM
> > To: M. Scott Marshall
> > Cc: Chime Ogbuji; Andrea Splendiani; MMVagnoni@mdanderson.org; James
> > Malone; HCLS; Jonathan Rees
> > Subject: Re: My task from last week: Semantic free identifiers
> >
> > Consider the following:
> >
> > 1. Readability - the former is far more readable than the latter:
> >        RO:part_of
> >               vs.
> >       <http://purl.obolibrary.org/obo/RO_0000001>
> >
> >     - this becomes even more apparent in a triple (CO = a 'Cardiology Ontology'):
> >       CO:Mitral_valve   RO:part_of   CO:Heart
> >               vs.
> >       CO_01234556   RO_0000001   CO_01234554
> >               - doesn't make much sense (without tool support, which is 'practically' non-existent).
> >
> > 2.  Mistakes are extremely difficult to spot with opaque identifiers:
> >       CO_01234556   RO_0000001   CO_01224554
> >               vs.
> >       CO:Mitral_valve   RO:part_of   CO:Brain
> >               - this is an obviously false statement - but not easy to spot if opaque identifiers were used.
> >
> >       This leads to a very insidious problem, one that is difficult to
> > detect.
> >
> > 3. I am not sure why the following is an issue:
> >       " Is my http://experiment the same as yours?
> >         Is my http://gene? http://study?
> >         Does my gene http://leads_to disease make sense?"
> >
> >       - Obviously if I use "http://experiment" and you use
> > "http://experiment" we both are referring to the same thing.
> >       - But instead if I use "http://medicine/experiment"  and you use
> > "http://biology/experiment", we 'may' not be referring to the same thing.
> >
> > 4. When using readable identifiers, it is difficult to make changes to an
> > existing term (Class) - I think this is a strength as opposed to an issue.
> > It raises the bar and should encourage authors (of models) to create terms
> > thoughtfully after due diligence. And when there is a real need to change
> > the term i.e. its meaning has changed or it was inappropriate, ontology
> > patterns can be used to retire the term (if necessary, labelled as
> > deprecated) or reposition it.
> >       - 'Typos' in term names are definitely not a reason for having opaque
> > identifiers. Avoid them by having a good process for introducing terms. If
> > and when they occur, use ontology patterns to deal with them.
> >       - Using opaque identifiers with labels makes it very easy, almost too
> > easy, for the labels to be changed. Oftentimes users of a model may not be
> > aware of such changes.
> >
> >
> > --Sivaram
> >
> >
> >
> > On Jun 20, 2011, at 4:15 PM, M. Scott Marshall wrote:
> >
> > > Hi Chime,
> > >
> > > The main reason is that when semantics and natural language are
> > > inserted into identifiers, some identifiers are doomed to become stale
> > > as thinking evolves or changes about the semantic representation. Or
> > > when a new 'name brand' is created for that namespace: I think that
> > > the best example of this was provided by Jonathan Rees for Shared
> > > Names - ever heard of 'locuslink' identifiers? I believe that Entrez
> > > Gene occupies the name branding of that space now. This is precisely
> > > the sort of problem that Shared Names would like to avoid by serving
> > > (non-ontological) identifiers from a 'neutral namespace'. In
> > > ontologies, the same principle applies (I see that Helena has supplied
> > > a good example).
> > >
> > > I agree with Mark about proper tooling - the tools should
> > > automatically display labels. It's true that I don't know of a SPARQL
> > > editor that does this to a satisfying degree yet, (except for one:
> > > SPARQL Assist Language-Neutral Query Composer from McCarty et al,
> > > shown at SWAT4LS in Berlin :) See Mark's post.) but that is not a
> > > reason to create identifiers and your knowledge representation in a
> > > way that won't stand the test of time.
> > >
> > > Shouldn't we consider RDF to be the bytecode of knowledge? Although I
> > > understand the difficulty of dealing with non-human readable
> > > identifiers in SPARQL and RDF, I believe that we are now looking at
> > > bytecode and complaining that it isn't human readable. It's true that,
> > > until the tools are available, it is difficult to write SPARQL
> > > queries. But if we applied the same logic to gene accession numbers,
> > > where would we be now? The SPARQL queries will eventually be 'under
> > > the hood', supplying labels to a GUI near you. :)
> > >
> > > Cheers,
> > > Scott
> > >
> > > On Mon, Jun 20, 2011 at 9:34 PM, Chime Ogbuji <chimezie@gmail.com> wrote:
> > >> On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:
> > >>
> > >> Hi,
> > >> sorry to jump on this thread like this...
> > >>
> > >> To be honest, I'm kind of concerned by the insistence on semantic-opaque
> > >> identifiers.
> > >>
> > >> I am as well and I have been for some time.
> > >>
> > >> I understand the reason for them,
> > >>
> > >> Actually, I would be interested in hearing the reason for them enumerated,
> > >> because I have had a hard time imagining what could possibly offset the
> > >> (significant) impact on readability that it has on biomedical ontologies.
> > >>  The barrier is already high for non-logicians and non-semantic web
> > >> aficionados to use biomedical ontologies.  Why set it any higher?
> > >> -- Chime
> > >>
> > >
> > >
> > >
> > > --
> > > M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
> > > http://staff.science.uva.nl/~marshall
> > >
> >
> >
> >
> > -----
> > No virus found in this message.
> > Checked by AVG - www.avg.com
> > Version: 10.0.1382 / Virus Database: 1513/3716 - Release Date: 06/20/11
> >
> 
> 
> 
> 
> -- 
> Helena F. Deus
> Post-Doctoral Researcher at DERI/NUIG
> http://lenadeus.info/
>
Received on Monday, 20 June 2011 23:54:36 UTC