Re: My task from last week: Semantic free identifiers

From: Chime Ogbuji <chimezie@gmail.com>
Date: Mon, 20 Jun 2011 17:40:14 -0400
To: "M. Scott Marshall" <mscottmarshall@gmail.com>
Cc: Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>, MMVagnoni@mdanderson.org, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>, Jonathan Rees <jar@creativecommons.org>
Message-ID: <B7FBD84DD7454600A463F79DAD4BFB21@gmail.com>
Hey Scott. Thanks for the detailed reply. See my responses inline below.
On Monday, June 20, 2011 at 4:15 PM, M. Scott Marshall wrote: 
> Hi Chime,
> The main reason is that when semantics and natural language are
> inserted into identifiers, some identifiers are doomed to become stale
> as thinking evolves or changes about the semantic representation. 
Ok. But that seems like throwing the baby out with the bath water. I understand that inserting meaning into identifiers can open the door for a disconnect between the (possibly evolving) meaning of a term and what the natural language sense of the identifier suggests is the meaning. However, it seems much less draconian to (instead) promulgate best practices that emphasize that the meaning of the term comes from the definitions in the ontology rather than the natural language sense of the identifier. After all, the web architecture axiom that URIs should be opaque is a best practice about not deriving meaning *from* them rather than a mandate that they are truly opaque, and it addresses the same issue. 

The use of ontology annotations (puns?) is also well suited to this kind of issue (where some meta information is needed about the terms). Consider SKOS, for example. In addition (and this is perhaps a bit off topic), ontologies should strive to ensure a large proportion of the terms in them have full definitions (or at least, non-empty definitions - like so many in the biomedical domain do), so users of the ontology do not feel compelled to discern the meaning of the terms from their names. 

My main issue with this approach is the tremendous impact on the readability of ontology content. As an example of that impact, consider the rendering of terms via a well-engineered ontology syntax such as the Manchester OWL syntax. I have been repeatedly amazed at how readable automatically generated Manchester OWL syntax can be, and the restriction we are discussing almost rules out such a quick win. 
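
To make the readability point concrete, here is a minimal sketch that prints the same axiom twice in a Manchester-like frame, once with readable names and once with opaque ones. Every term name below (BacterialPneumonia, hasCausativeAgent, Bacterium, and the EX_ numbers) is made up for illustration and is not taken from any real ontology, and the helper render_subclass is hypothetical.

```python
def render_subclass(cls, prop, filler):
    """Render an existential restriction as a Manchester-like class frame."""
    return f"Class: {cls}\n    SubClassOf: {prop} some {filler}"


# Hypothetical readable identifiers (not from a real ontology).
readable = render_subclass("BacterialPneumonia", "hasCausativeAgent", "Bacterium")

# The same axiom with hypothetical semantic-free identifiers.
opaque = render_subclass("EX_0000123", "EX_0000456", "EX_0000789")

print(readable)
print(opaque)
```

Both frames say exactly the same thing to a reasoner; only the first can be read by a person without a label lookup.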

> Or when a new 'name brand' is created for that namespace: I think that
> the best example of this was provided by Jonathan Rees for Shared
> Names - ever heard of 'locuslink' identifiers? I believe that Entrez
> Gene occupies the name branding of that space now. This is precisely
> the sort of problem that Shared Names would like to avoid by serving
> (non-ontological) identifiers from a 'neutral namespace'. 
But how is this different from the general problem of identifiers changing over time? And can't you use semantic identifiers in 'neutral namespaces' such as purl.org?
> In
> ontologies, the same principle applies (I see that Helena has supplied
> a good example).
> I agree with Mark about proper tooling - the tools should
> automatically display labels. 
It seems to me that this only shifts the complexity (and burden) brought about by the mandate onto tools that already have their work cut out for them in making interactions with semantic web content user-friendly and intuitive. If there were broad community agreement on which properties provide authoritative human-readable labels, then such tools could easily show these labels. Vocabularies such as SKOS have gone a long way in this direction. However, the decision to rule out the use of semantic identifiers seems to make this capability a prerequisite to using them in a user-friendly way.
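
As a rough sketch of what such a tool would have to do, assuming the community had settled on a preference order of label properties: the order below (skos:prefLabel, then rdfs:label) is my assumption, not an agreed standard, and the example.org terms and the display_label helper are hypothetical.

```python
SKOS_PREF_LABEL = "http://www.w3.org/2004/02/skos/core#prefLabel"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

# Assumed preference order; no such community-wide agreement exists yet.
LABEL_PROPERTIES = [SKOS_PREF_LABEL, RDFS_LABEL]


def display_label(term, triples, label_properties=LABEL_PROPERTIES):
    """Return the first label found for term, trying properties in order."""
    for prop in label_properties:
        for subject, predicate, obj in triples:
            if subject == term and predicate == prop:
                return obj
    # No label at all: fall back to the local name of the URI.
    return term.rsplit("#", 1)[-1].rsplit("/", 1)[-1]


# Hypothetical data: plain (subject, predicate, object) string tuples.
triples = [
    ("http://example.org/onto/EX_0000123", RDFS_LABEL, "pneumonia (bacterial)"),
    ("http://example.org/onto/EX_0000123", SKOS_PREF_LABEL, "bacterial pneumonia"),
]

print(display_label("http://example.org/onto/EX_0000123", triples))
print(display_label("http://example.org/onto/EX_0000999", triples))
```

With an opaque identifier and no label, the fallback gives the user nothing to go on, which is the burden I am describing.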
> It's true that I don't know of a SPARQL
> editor that does this to a satisfying degree yet, (except for one:
> SPARQL Assist Language-Neutral Query Composer from McCarty et al,
> shown at SWAT4LS in Berlin :) See Mark's post.) but that is not a
> reason to create identifiers and your knowledge representation in a
> way that won't stand the test of time.
I think this is part of the larger 'ontology evolution problem'. Requiring semantic-free identifiers not only fails to solve the general problem but also introduces difficulty in the one area where the SW cannot afford any more challenges: accessibility and user-friendliness.
> Shouldn't we consider RDF to be the bytecode of knowledge? 

Yes, but that shouldn't require that we limit the form of the identifiers. Besides, it already behaves that way, since reasoners treat identifiers opaquely. The problem being addressed only applies to humans, which is why I'm inclined to think that what is needed is best practices. 
> Although I
> understand the difficulty of dealing with non-human readable
> identifiers in SPARQL and RDF, I believe that we are now looking at
> bytecode and complaining that it isn't human readable.
You can think of it that way, but consider the earlier comment about readability of computer programs. Regardless of whether or not they all end up as machine code, readability *is* still a distinguishing factor between computer programming languages.

>  It's true that,
> until the tools are available, it is difficult to write SPARQL
> queries. But if we applied the same logic to gene accession numbers,
> where would we be now? 
I certainly agree that there are domains where the use of human-readable identifiers exacerbates the ontology evolution problem (and gene data is the prime example), but I do not think this is representative of all uses of human-readable identifiers. Thus I think the approach of educating developers and consumers about the nuances that make one scenario more of an issue than another is much more appropriate.

General observation: This seems like another neat vs. scruffy thread, and there seem to be many of these playing out in the various semantic web communities at this time: httpRange-14 vs. ontology-determined meaning of resources, dereferenceability of RDF URIs, etc.

Chime Ogbuji
Received on Monday, 20 June 2011 21:40:55 UTC