Re: blog: semantic dissonance in uniprot

On Wed, 2009-03-25 at 10:41 +0000, Phillip Lord wrote:
> [ . . . ]
> We need some tags which say "these two database records
> are about the same protein, well, sort of, at least in this case, for
> the purposes of what I am doing".

Well, if you have URIs for both of these proteins, such as :protein1
and :protein2, then that is exactly what owl:sameAs is for:

	@prefix : <http://example#> .
	@prefix owl: <http://www.w3.org/2002/07/owl#> .
	:protein1 owl:sameAs :protein2 .

(I'm assuming you are treating these proteins as individuals, since
owl:sameAs is for individuals rather than classes.)  It doesn't matter
that the assertion may not hold in the context of someone else's RDF
graph.  What matters is that the assertion is consistent with the
identities of :protein1 and :protein2.  But what *are* the identities
of :protein1 and :protein2?  In general they will be ambiguous, which
means that they will admit multiple interpretations.

Remember that in RDF semantics, an interpretation provides a mapping
from URIs to resources, and in general there may be many interpretations
for a given RDF graph, each potentially mapping the same URI to a
different thing in your domain.  

It is not usually possible to nail down the definition of a term such
that there is only one possible interpretation.  Therefore, if you think
of resource identity in terms of sets of possible interpretations (i.e.,
interpretations that are consistent with a given graph), then the effect
of asserting owl:sameAs is to restrict the set of possible
interpretations such that :protein1 and :protein2 denote the same
resource.  
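
To make the "set of possible interpretations" idea concrete, here is a
minimal sketch in Python.  It is only a toy model, not real RDF
machinery: the two-element domain and the names DOMAIN, URIS,
all_interpretations and satisfies_same_as are assumptions made up for
illustration.  It enumerates every mapping from the two URIs to the
domain and then keeps only those mappings in which both URIs denote the
same resource.

```python
from itertools import product

# Hypothetical toy domain and URIs, chosen only for illustration.
DOMAIN = ["resource_a", "resource_b"]
URIS = [":protein1", ":protein2"]

def all_interpretations(uris, domain):
    """Return every possible mapping from the given URIs to domain resources."""
    return [dict(zip(uris, combo)) for combo in product(domain, repeat=len(uris))]

def satisfies_same_as(interp, a, b):
    """True if the interpretation maps both URIs to the same resource."""
    return interp[a] == interp[b]

interps = all_interpretations(URIS, DOMAIN)

# Asserting ":protein1 owl:sameAs :protein2" discards every interpretation
# in which the two URIs denote different resources.
restricted = [i for i in interps if satisfies_same_as(i, ":protein1", ":protein2")]

print(len(interps), len(restricted))  # 4 2
```

Note that the assertion does not pick out a single interpretation.  Two
remain, so the terms are still ambiguous, just less so.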

Bottom line: If you are using :protein1 and :protein2 in a manner that
is consistent with their definitions, and you want to say that in *your*
data, they denote the same individual, then owl:sameAs is exactly what
you want.  Of course, this raises the question: What *are* the definitions
of :protein1 and :protein2?

Unfortunately, semantic web architecture does not yet have a universally
accepted way to indicate such definitions, though one might contend that
rdfs:isDefinedBy is a start:
http://www.w3.org/TR/rdf-schema/#ch_isdefinedby 
The notion of a "URI declaration"
http://dbooth.org/2007/uri-decl/
attempts to address this need by recognizing a mechanism by which a URI
can be associated with a set of "core assertions" whose purpose is to
precisely constrain the set of interpretations that are permissible when
using that URI, thus effectively "defining" that URI.

But regardless of how :protein1 and :protein2 are defined, if you assume
that multiple interpretations will generally be consistent with their
definitions, then the important criteria for using owl:sameAs are: (a)
in *your* RDF graph the two terms are intended to denote the *same*
individual; and (b) your RDF graph is consistent with their
definitions.  In particular, even if my graph is also (by itself)
consistent with the term definitions, there is no requirement that the
*merge* of your graph and my graph also be consistent with the term
definitions, because your graph and my graph may restrict the set of
possible interpretations in different and mutually exclusive ways.
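
The point about merging can be sketched the same way.  Again this is a
toy model under stated assumptions: the domain, the models helper, and
the choice of owl:differentFrom as the conflicting assertion in "my"
graph are all hypothetical, picked only because they make the conflict
easy to see.  Each graph is represented as a list of constraints on an
interpretation.

```python
from itertools import product

# Hypothetical toy domain and URIs, chosen only for illustration.
DOMAIN = ["resource_a", "resource_b"]
URIS = [":protein1", ":protein2"]

def models(constraints):
    """Return the interpretations (URI -> resource) that satisfy every constraint."""
    candidates = (dict(zip(URIS, combo)) for combo in product(DOMAIN, repeat=len(URIS)))
    return [i for i in candidates if all(c(i) for c in constraints)]

# Your graph (assumed):  :protein1 owl:sameAs :protein2 .
your_graph = [lambda i: i[":protein1"] == i[":protein2"]]

# My graph (assumed):    :protein1 owl:differentFrom :protein2 .
my_graph = [lambda i: i[":protein1"] != i[":protein2"]]

print(len(models(your_graph)))             # 2
print(len(models(my_graph)))               # 2
print(len(models(your_graph + my_graph)))  # 0
```

Each graph on its own leaves some interpretations standing, but the
merge leaves none, because the two graphs restrict the interpretations
in mutually exclusive ways.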

David Booth

Received on Thursday, 26 March 2009 01:34:50 UTC