Re: My task from last week: Semantic free identifiers from Matt Vagnoni on 2011-06-21 (public-semweb-lifesci@w3.org from June 2011)

From: Matt Vagnoni <matthew.vagnoni@uth.tmc.edu>
Date: Tue, 21 Jun 2011 12:28:44 -0500
To: Michel_Dumontier <Michel_Dumontier@carleton.ca>
Cc: "Sivaram Arabandi, MD" <sivaram.arabandi@gmail.com>, "M. Scott Marshall" <mscottmarshall@gmail.com>, Chime Ogbuji <chimezie@gmail.com>, Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, "MMVagnoni@mdanderson.org" <MMVagnoni@mdanderson.org>, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>, Jonathan Rees <jar@creativecommons.org>
Message-ID: <BANLkTinjFO=ztfMP9umD0mjW0q-XYZ_6ZA@mail.gmail.com>
Here's my thinking: The whole point of the semantic web is to get away from
relying on terms.  Why would you intentionally want to become dependent upon
labels (terms)?

Labels are not identifiers; they are annotations.  There is no uniqueness
guarantee.  A concept can have many labels and many concepts can have the
same label.  Labels should not be relied upon for a definition of a concept.
 That is what a formal definition is for.  If we are relying upon a
formal definition, humans need an easy to remember (mnemonic) means of
uniquely and unambiguously referring to that formally defined concept.  If
two concepts have the same label, they are not the same.  If two concepts
have the same URI they are.
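
A minimal sketch of that point in Python. The example.org URIs and the
labels are invented for illustration and come from no real ontology; the
function names are likewise hypothetical. It shows one URI carrying several
labels and one label attached to several URIs, so only the URI can serve as
the key.

```python
from collections import defaultdict

# Hypothetical (URI, label) annotations, invented for illustration only.
ANNOTATIONS = [
    ("http://example.org/onto/C_0001", "cold"),         # the illness
    ("http://example.org/onto/C_0001", "common cold"),  # second label, same concept
    ("http://example.org/onto/C_0002", "cold"),         # low temperature
]


def build_indexes(annotations):
    """Index the annotations in both directions: URI -> labels, label -> URIs."""
    labels_by_uri = defaultdict(set)
    uris_by_label = defaultdict(set)
    for uri, label in annotations:
        labels_by_uri[uri].add(label)
        uris_by_label[label].add(uri)
    return labels_by_uri, uris_by_label


def ambiguous_labels(uris_by_label):
    """Return the labels that are attached to more than one URI."""
    return sorted(label for label, uris in uris_by_label.items() if len(uris) > 1)


labels_by_uri, uris_by_label = build_indexes(ANNOTATIONS)

# The label "cold" points at two different concepts, so it cannot identify either.
print(ambiguous_labels(uris_by_label))
# One URI can carry several labels and still be a single concept.
print(sorted(labels_by_uri["http://example.org/onto/C_0001"]))
```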

I would much rather find out one day that a URI I point to is now
deprecated than find out that a URI I *thought* meant one thing now
means another because the label changed.  I want a URI to be deprecated if
its concept is no longer relevant but new concepts are.  I do not want the
meaning of a concept that I relied upon to change without notice.  I would
rather have a URI that once meant one thing but is now deprecated and
points to something else.  Establishing a transitive property makes it easy
to get from the deprecated concept to the new, non-deprecated concepts, and
perhaps to the preferred (default) one.
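
A rough sketch of how such a transitive lookup could work, in Python. The
"replaced by" table, the ex: identifiers, and the convention that the first
successor listed is the preferred one are all assumptions made for this
example, not part of any existing vocabulary.

```python
# Hypothetical "replaced by" links: deprecated URI -> its successors.
# Assumption for this sketch: the first successor listed is the preferred one.
REPLACED_BY = {
    "ex:C_0001": ["ex:C_0007", "ex:C_0008"],
    "ex:C_0007": ["ex:C_0012"],
}


def current_concepts(uri, replaced_by):
    """Follow the links transitively and return the non-deprecated URIs.

    A URI with no successors is treated as current. The result keeps
    preferred-first order, and already visited URIs are skipped so that a
    cycle in the data cannot cause an endless loop.
    """
    found = []
    seen = set()

    def visit(current):
        if current in seen:
            return
        seen.add(current)
        successors = replaced_by.get(current)
        if not successors:
            found.append(current)
            return
        for successor in successors:
            visit(successor)

    visit(uri)
    return found


replacements = current_concepts("ex:C_0001", REPLACED_BY)
print(replacements)     # ['ex:C_0012', 'ex:C_0008']
print(replacements[0])  # preferred (default) replacement: ex:C_0012
```

A URI that was never deprecated simply maps to itself, so callers can run
every URI they hold through the same lookup.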

As for offending people by choice of URI: I am offended by skos:narrower,
because the first time I used it I thought it meant "is narrower" but it
means "has narrower".  skos:narrower has an English label that does clarify
that, but they use the URI as well.

On Mon, Jun 20, 2011 at 5:44 PM, Michel_Dumontier <Michel_Dumontier@carleton.ca> wrote:

> IMHO, if you're still coding the content of an information system by hand,
> then you're going to introduce errors. A database curator should never
> assign their own identifier - this is internal to the technology and the
> information system. If you're a programmer, you should query the resource
> (ontology) for the identifiers based on the labels.  Be more sophisticated.
> Do it right. Build useable APIs/UIs for people.
>
> Best,
>
> m.
>
>
>
> > -----Original Message-----
> > From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Sivaram Arabandi, MD
> > Sent: Monday, June 20, 2011 6:14 PM
> > To: M. Scott Marshall
> > Cc: Chime Ogbuji; Andrea Splendiani; MMVagnoni@mdanderson.org; James
> > Malone; HCLS; Jonathan Rees
> > Subject: Re: My task from last week: Semantic free identifiers
> >
> > Consider the following:
> >
> > 1. Readability - the former is far more readable than the latter:
> >        RO:part_of
> >               vs.
> >       <http://purl.obolibrary.org/obo/RO_0000001>
> >
> >     - this becomes even more apparent in a triple (CO = a 'Cardiology
> > Ontology'):
> >       CO:Mitral_valve   RO:part_of   CO:Heart
> >               vs.
> >       CO_01234556   RO_0000001   CO_01234554
> >               - doesn't make much sense (without tool support, which is
> > 'practically' non-existent).
> >
> > 2.  Mistakes are extremely difficult to spot with opaque identifiers:
> >       CO_01234556   RO_0000001   CO_01224554
> >               vs.
> >       CO:Mitral_valve   RO:part_of   CO:Brain
> >               - this is an obviously false statement - but not easy to
> > spot if opaque identifiers were used.
> >
> >       This leads to a very insidious problem, one that is difficult to
> > detect.
> >
> > 3. I am not sure why the following is an issue:
> >       " Is my http://experiment the same as yours?
> >         Is my http://gene? http://study?
> >         Does my gene http://leads_to disease make sense?"
> >
> >       - Obviously if I use "http://experiment" and you use
> > "http://experiment" we both are referring to the same thing.
> >       - But instead if I use "http://medicine/experiment" and you use
> > "http://biology/experiment", we 'may' not be referring to the same
> > thing.
> >
> > 4. When using readable identifiers, it is difficult to make changes to an
> > existing term (Class) - I think this is a strength as opposed to an
> > issue. It raises the bar and should encourage authors (of models) to
> > create terms thoughtfully after due diligence. And when there is a real
> > need to change the term, i.e. its meaning has changed or it was
> > inappropriate, ontology patterns can be used to retire the term (if
> > necessary, labelled as deprecated) or reposition it.
> >       - 'Typos' in term names are definitely not a reason for having
> > opaque identifiers. Avoid them by having a good process for introducing
> > terms. If and when they occur, use ontology patterns to deal with them.
> >       - Using opaque identifiers with labels makes it very easy, almost
> > too easy, for the labels to be changed. Often users of a model may not
> > be aware of such changes.
> >
> >
> > --Sivaram
> >
> >
> >
> > On Jun 20, 2011, at 4:15 PM, M. Scott Marshall wrote:
> >
> > > Hi Chime,
> > >
> > > The main reason is that when semantics and natural language are
> > > inserted into identifiers, some identifiers are doomed to become stale
> > > as thinking evolves or changes about the semantic representation. Or
> > > when a new 'name brand' is created for that namespace: I think that
> > > the best example of this was provided by Jonathan Rees for Shared
> > > Names - ever heard of 'locuslink' identifiers? I believe that Entrez
> > > Gene occupies the name branding of that space now. This is precisely
> > > the sort of problem that Shared Names would like to avoid by serving
> > > (non-ontological) identifiers from a 'neutral namespace'. In
> > > ontologies, the same principle applies (I see that Helena has supplied
> > > a good example).
> > >
> > > I agree with Mark about proper tooling - the tools should
> > > automatically display labels. It's true that I don't know of a SPARQL
> > > editor that does this to a satisfying degree yet, (except for one:
> > > SPARQL Assist Language-Neutral Query Composer from McCarty et al,
> > > shown at SWAT4LS in Berlin :) See Mark's post.) but that is not a
> > > reason to create identifiers and your knowledge representation in a
> > > way that won't stand the test of time.
> > >
> > > Shouldn't we consider RDF to be the bytecode of knowledge? Although I
> > > understand the difficulty of dealing with non-human readable
> > > identifiers in SPARQL and RDF, I believe that we are now looking at
> > > bytecode and complaining that it isn't human readable. It's true that,
> > > until the tools are available, it is difficult to write SPARQL
> > > queries. But if we applied the same logic to gene accession numbers,
> > > where would we be now? The SPARQL queries will eventually be 'under
> > > the hood', supplying labels to a GUI near you. :)
> > >
> > > Cheers,
> > > Scott
> > >
> > > On Mon, Jun 20, 2011 at 9:34 PM, Chime Ogbuji <chimezie@gmail.com> wrote:
> > >> On Monday, June 20, 2011 at 3:08 PM, Andrea Splendiani wrote:
> > >>
> > >> Hi,
> > >> sorry to jump on this thread like this...
> > >>
> > >> To be honest, I'm kind of concerned by the insistence on
> > >> semantic-opaque identifiers.
> > >>
> > >> I am as well and I have been for some time.
> > >>
> > >> I understand the reason for them,
> > >>
> > >> Actually, I would be interested in hearing the reason for them
> > >> enumerated, because I have had a hard time imagining what could
> > >> possibly offset the (significant) impact on readability that it has on
> > >> biomedical ontologies.  The barrier is already high for non-logicians
> > >> and non-semantic web aficionados to use biomedical ontologies.  Why set
> > >> it any higher?
> > >> -- Chime
> > >>
> > >
> > >
> > >
> > > --
> > > M. Scott Marshall, W3C HCLS IG co-chair, http://www.w3.org/blog/hcls
> > > http://staff.science.uva.nl/~marshall
> > >
> >
> >
> >
> > -----
> > No virus found in this message.
> > Checked by AVG - www.avg.com
> > Version: 10.0.1382 / Virus Database: 1513/3716 - Release Date: 06/20/11
> >
>
>


-- 
Best,
Matthew Vagnoni, MS
Senior Scientific Programmer and PhD Student
The Center for Biosecurity and Public Health Informatics Research
The School of Biomedical Informatics
The University of Texas Health Science Center at Houston
Tel: (713) 500-3952
Received on Tuesday, 21 June 2011 17:29:23 UTC