Re: My task from last week: Semantic free identifiers from Andrea Splendiani on 2011-06-20 (public-semweb-lifesci@w3.org from June 2011)

From: Andrea Splendiani <andrea.splendiani@bbsrc.ac.uk>
Date: Mon, 20 Jun 2011 21:46:55 +0000
To: Chime Ogbuji <chimezie@gmail.com>
Cc: "M. Scott Marshall" <mscottmarshall@gmail.com>, "andrea splendiani (RRes-Roth)" <andrea.splendiani@rothamsted.ac.uk>, "MMVagnoni@mdanderson.org" <MMVagnoni@mdanderson.org>, James Malone <malone@ebi.ac.uk>, HCLS <public-semweb-lifesci@w3.org>, Jonathan Rees <jar@creativecommons.org>
Message-ID: <339406500966331@jngomktg.net>
On 20 Jun 2011, at 22:40, Chime Ogbuji wrote:

> Hey Scott.  Thanks for the detailed reply.  See my response inline below
> On Monday, June 20, 2011 at 4:15 PM, M. Scott Marshall wrote:
> 
>> Hi Chime,
>> 
>> The main reason is that when semantics and natural language are
>> inserted into identifiers, some identifiers are doomed to become stale
>> as thinking evolves or changes about the semantic representation.
> Ok.  But that seems like throwing the baby out with the bath water.  I
> understand that inserting meaning into identifiers can open the door for a
> disconnect between the (possibly evolving) meaning of a term and what the
> natural language sense of the identifier suggests is the meaning.  However,
> it seems much less draconian to (instead) promulgate best practices that
> emphasize that the meaning of the term comes from the definitions in the
> ontology rather than the natural language sense of the identifier.
+1
The problem is missing (good) definitions, not 'opaque' identifiers.
You can't do much different if the URI is xxx:0001... and the definition is
'a Protein'.

best,
Andrea

> After all, the web architecture axiom that URIs should be opaque is a
> best practice about not deriving meaning *from* them rather than a mandate
> that they are truly opaque, and it addresses the same issue.
> 
> The use of ontology annotations (puns?) is also well suited to
> this kind of issue (where some meta information is needed about the terms).
> Consider SKOS, for example.  In addition (and this is perhaps a bit off
> topic) ontologies should strive to ensure a large proportion of the terms in
> them have full definitions (or at least, non-empty definitions - like so
> many in the biomedical domain do), so users of the ontology do not feel
> compelled to discern the meaning of the terms from their name.
> 
> My main issue with this approach is the tremendous impact on readability
> of ontology content.  As an example of the impact on readability, consider
> the rendering of terms via a well-engineered ontology syntax such as the
> Manchester OWL syntax.  I have been repeatedly amazed at how readable
> automatically-generated Manchester OWL syntax can be and this restriction we
> are discussing almost rules out such a quick win.
> 
>> Or when a new 'name brand' is created for that namespace: I think that
>> the best example of this was provided by Jonathan Rees for Shared
>> Names - ever heard of 'locuslink' identifiers? I believe that Entrez
>> Gene occupies the name branding of that space now. This is precisely
>> the sort of problem that Shared Names would like to avoid by serving
>> (non-ontological) identifiers from a 'neutral namespace'.
> But how is this different from the general problem of identifiers changing
> over time and can't you use semantic identifiers in 'neutral namespaces'
> such as purl.org?
>> In
>> ontologies, the same principle applies (I see that Helena has supplied
>> a good example).
>> 
>> I agree with Mark about proper tooling - the tools should
>> automatically display labels.
> It seems to me that this only shifts the complexity (and burden) brought
> about by the mandate to tools that already have their work - in making
> interactions with semantic web content user friendly and intuitive - cut out
> for them.  If there were broad community agreement on which properties
> provide authoritative human-readable labels, then such tools could easily
> show these labels.  Vocabularies such as SKOS have gone a long way in this
> direction.  However, the decision to rule out the use of semantic
> identifiers seems to make this capability a prerequisite to using them in a
> user-friendly way.
>> It's true that I don't know of a SPARQL
>> editor that does this to a satisfying degree yet, (except for one:
>> SPARQL Assist Language-Neutral Query Composer from McCarty et al,
>> shown at SWAT4LS in Berlin :) See Mark's post.) but that is not a
>> reason to create identifiers and your knowledge representation in a
>> way that won't stand the test of time.
> I think this is part of the larger 'ontology evolution problem' and
> requiring semantic-free identifiers not only doesn't solve the general
> problem but it also introduces difficulty in the one area the SW cannot
> afford any more challenges: accessibility and user friendliness.
>> Shouldn't we consider RDF to be the bytecode of knowledge?
> Yes, but that shouldn't require that we limit the form of the identifiers.
> Besides, it already behaves in that way since reasoners already treat
> identifiers opaquely.  The problem being addressed only applies to humans,
> which is why I'm inclined to think that what is needed is best practices.
>> Although I
>> understand the difficulty of dealing with non-human readable
>> identifiers in SPARQL and RDF, I believe that we are now looking at
>> bytecode and complaining that it isn't human readable.
> You can think of it that way, but consider the earlier comment about
> readability of computer programs.  Regardless of whether or not they all end
> up as machine code, readability *is* still a distinguishing factor between
> computer programming languages.
>> It's true that,
>> until the tools are available, it is difficult to write SPARQL
>> queries. But if we applied the same logic to gene accession numbers,
>> where would we be now?
> I certainly agree that there are domains where the use of human readable
> identifiers exacerbates the ontology evolution problem (and gene data is the
> prime example), but I do not think this is completely representative of all
> uses of human readable identifiers and thus I think the approach of
> educating developers and consumers about the nuances that make one scenario
> more of an issue than another is much more appropriate.
> General observation: This seems like another neat vs. scruffy thread and
> there seem to be many of these playing out in the various semantic web
> communities at this time: httpRange-14 vs. ontology-determined meaning of
> resources, dereferenceability of RDF URIs, etc.
>  
> -- 
> Chime Ogbuji
> 

Andrea Splendiani
Senior Bioinformatics Scientist
Centre for Mathematical and Computational Biology
+44(0)1582 763133 ext 2004
andrea.splendiani@bbsrc.ac.uk
Received on Monday, 20 June 2011 21:47:53 UTC