Re: blog: semantic dissonance in uniprot from David Booth on 2009-03-26 (public-semweb-lifesci@w3.org from March 2009)

From: David Booth <david@dbooth.org>
Date: Thu, 26 Mar 2009 07:01:04 -0400
To: Phillip Lord <phillip.lord@newcastle.ac.uk>
Cc: W3C HCLSIG hcls <public-semweb-lifesci@w3.org>
Message-Id: <1238065264.27539.3276.camel@nc6000.w3.org>
On Wed, 2009-03-25 at 21:34 -0400, David Booth wrote:
> [ . . . ] the important criteria for using owl:sameAs are: (a)
> in *your* RDF graph the two terms are intended to denote the *same*
> individual; and (b) your RDF graph is consistent with their
> definitions.  [ . . . . ]

After writing the above I realized that it may sound like I am saying
that it is okay to use owl:sameAs indiscriminately in cases where a
weaker assertion would do.  As many have pointed out, owl:sameAs is
likely to cause problems when graphs are merged, as I illustrated here:
http://lists.w3.org/Archives/Public/public-semweb-lifesci/2009Mar/0169.html
In that example, graphs G1 and G2 each use owl:sameAs without problem
on their own, but they are inconsistent when merged.
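
To make the merge problem concrete, here is a minimal Python sketch.  It
does not reproduce the graphs from the message linked above: the triples,
the use of owl:differentFrom as the conflicting constraint, and the helper
names find() and is_consistent() are all assumptions for illustration.
The check only looks at owl:sameAs and owl:differentFrom, so it is far
weaker than a real OWL reasoner.

```python
SAME_AS = "owl:sameAs"
DIFFERENT_FROM = "owl:differentFrom"


def find(parent, term):
    """Return the representative of term's owl:sameAs equivalence class."""
    parent.setdefault(term, term)
    while parent[term] != term:
        parent[term] = parent[parent[term]]  # path halving
        term = parent[term]
    return term


def is_consistent(graph):
    """True unless two terms are both sameAs-connected and differentFrom."""
    parent = {}
    # First pass: merge every pair of terms linked by owl:sameAs.
    for s, p, o in graph:
        if p == SAME_AS:
            parent[find(parent, s)] = find(parent, o)
    # Second pass: no owl:differentFrom pair may share a class.
    return all(
        find(parent, s) != find(parent, o)
        for s, p, o in graph
        if p == DIFFERENT_FROM
    )


# Hypothetical graphs: each one is fine on its own.
g1 = {(":apple1", SAME_AS, ":apple2")}
g2 = {
    (":apple1", SAME_AS, ":apple3"),
    (":apple2", DIFFERENT_FROM, ":apple3"),
}
merged = g1 | g2

print(is_consistent(g1), is_consistent(g2), is_consistent(merged))
```

In the merge, :apple2 and :apple3 become equal through :apple1, which
contradicts the owl:differentFrom statement that g2 could make safely
when considered alone.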

So, some amended points:

1. This problem is not unique to owl:sameAs.  It is rooted in the fact
that terms are ambiguous and different graphs may constrain their
interpretations in different, mutually exclusive ways.  However, it is
much more likely to show up when using owl:sameAs than with other
predicates.  Thus, the cautions you have heard about using owl:sameAs
are well founded.

2. To increase the chances of your graph being merged with other graphs
without inconsistency, you should consider not only whether two terms
are intended to denote the same individual in *your* graph, but whether
they are also likely to denote the same individual in the merge of your
graph with someone else's graph.  Admittedly, this may be impossible to
predict.  So in choosing to use owl:sameAs, you should recognize that
you are making a decision that will limit the scope of graphs that can
be painlessly combined with yours.  As pointed out in #1 above,
owl:sameAs is not the only assertion in your graph with this effect, but
owl:sameAs happens to do it much more powerfully than most other
assertions, so you should use a weaker assertion if you can.

3. I think a weaker version of owl:sameAs that is scoped to a particular
named graph may help -- an owl:scopedSameAs, for example.  So if one
says:

  :g1 owl:scopedSameAs ( :apple1 :apple2 ) .
  :g2 owl:scopedSameAs ( :apple1 :apple3 ) .

then :apple1 and :apple2 denote the same resource in the context of
named graph :g1, and :apple1 and :apple3 denote the same resource in the
context of named graph :g2, but if :g3 is the merge of :g1 and :g2, then
we effectively get two new blank nodes:

  _:apple12 s:isNarrowerThanResource :apple1 .
  _:apple12 s:isNarrowerThanResource :apple2 .

  _:apple13 s:isNarrowerThanResource :apple1 .
  _:apple13 s:isNarrowerThanResource :apple3 .

plus all the triples of :g1 and :g2.  s:isNarrowerThanResource is
defined here
http://dbooth.org/2007/splitting/#isNarrowerThanResource
and it may not be *quite* what we want, but I think something along
those lines would be.  The idea is that in a statement like

  _:apple12 s:isNarrowerThanResource :apple1 .

the s:isNarrowerThanResource predicate should indicate that the set of
interpretations for _:apple12 is a *subset* of the set of
interpretations for :apple1.  Thus, the pair of statements

  _:apple12 s:isNarrowerThanResource :apple1 .
  _:apple12 s:isNarrowerThanResource :apple2 .

has an effect that is similar to saying

  :apple1 owl:sameAs :apple2 .

except that it is more limited in scope.  This idea is illustrated in my
slides 16-18 from last year's workshop on identity and reference:
http://dbooth.org/2008/irsw/slides.ppt
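
As a rough sketch of the expansion described in this point, the Python
below turns scoped statements into s:isNarrowerThanResource triples.
Note that owl:scopedSameAs is only the proposal made above, not an
existing OWL term, and the function name expand_scoped_same_as() and the
generated blank node labels (_:b0, _:b1 rather than _:apple12, _:apple13)
are assumptions made for illustration.

```python
from itertools import count

NARROWER = "s:isNarrowerThanResource"


def expand_scoped_same_as(scoped_statements):
    """Expand (graph, terms) pairs into triples about fresh blank nodes.

    Each scoped statement gets its own blank node, which is asserted to be
    narrower than every term in that statement.  No owl:sameAs triple is
    produced, so terms from different scopes are never equated.
    """
    fresh = count()
    triples = []
    for _graph, terms in scoped_statements:
        bnode = f"_:b{next(fresh)}"
        triples.extend((bnode, NARROWER, term) for term in terms)
    return triples


# The two statements from the example above.
statements = [
    (":g1", (":apple1", ":apple2")),
    (":g2", (":apple1", ":apple3")),
]

# Extra triples for :g3, to be added to the triples of :g1 and :g2.
g3_extra = expand_scoped_same_as(statements)
for triple in g3_extra:
    print(*triple, ".")
```

Because each scope gets its own blank node, nothing in the result forces
:apple2 and :apple3 to denote the same resource.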

4. Notice that this issue is far more likely to arise when you are talking
about *individuals* -- not classes.  In essence, you want to be able to
restrict the set of interpretations to a particular subset.  But if
you're talking about a class, then you already have an easy way to do
that: rdfs:subClassOf.  This is one of the reasons why some people say
that you should model almost everything as classes rather than
individuals.
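
A small Python sketch of why the class case is easier: each graph can
narrow a shared class with rdfs:subClassOf, and the merge does not equate
the narrower classes.  The class names (:RedApple, :GreenApple, :Apple,
:Fruit) and the helper superclasses() are hypothetical, and the helper
only follows rdfs:subClassOf transitively; it is not a full RDFS
reasoner.

```python
SUBCLASS_OF = "rdfs:subClassOf"


def superclasses(graph, cls):
    """Return every class reachable from cls via rdfs:subClassOf."""
    found, stack = set(), [cls]
    while stack:
        current = stack.pop()
        for s, p, o in graph:
            if s == current and p == SUBCLASS_OF and o not in found:
                found.add(o)
                stack.append(o)
    return found


# Hypothetical graphs: each narrows :Apple in its own way.
g1 = {(":RedApple", SUBCLASS_OF, ":Apple")}
g2 = {
    (":GreenApple", SUBCLASS_OF, ":Apple"),
    (":Apple", SUBCLASS_OF, ":Fruit"),
}
merged = g1 | g2

print(sorted(superclasses(merged, ":RedApple")))
```

After the merge, :RedApple and :GreenApple are both narrower than :Apple,
but neither is a superclass of the other and nothing identifies them.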

David Booth
Received on Thursday, 26 March 2009 11:01:41 UTC