Re: Managing Co-reference (Was: A Semantic Elephant?) from Hugh Glaser on 2008-05-18 (semantic-web@w3.org from May 2008)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sun, 18 May 2008 10:57:29 +0100
To: Story Henry <henry.story@bblfish.net>, Semantic Web Interest Group <semantic-web@w3.org>
CC: Ian Millard <icm@ecs.soton.ac.uk>
Message-ID: <C455BC19.249EE%hg@ecs.soton.ac.uk>
Henry's comments on canons bring us to some pragmatic maintenance issues
that seem to have gone unremarked so far.

How do you manage a load of sameAs URIs?
That is, what does an "owl:sameAs group" look like?
So if I have 100 URIs in my "owl:sameAs group", what do I do when I add
another one?
If I established the similarity by looking at the properties by querying
using one of the Group URIs, do I assert sameness with that URI?
But then what happens if someone retracts a URI from the Group that was part
of the sameness linkage?
i.e.
A=B=C=D
I query for D and find so much property similarity with my new URI E that I
do D=E, where the similar properties are with A.
B is then lost, for whatever reason.
E no longer = A
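
The scenario above can be sketched in a few lines of Python. This is only an illustration: it assumes the sameness links are stored as undirected pairs, and that retracting B removes every link that mentions B. The function name `connected` and the single-letter URIs are placeholders, not anything from a real store.

```python
from collections import defaultdict

def connected(links, start, goal):
    """Return True if goal is reachable from start over undirected sameAs links."""
    graph = defaultdict(set)
    for a, b in links:
        graph[a].add(b)
        graph[b].add(a)
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        if node == goal:
            return True
        for nxt in graph[node] - seen:
            seen.add(nxt)
            stack.append(nxt)
    return False

# A=B=C=D recorded as a chain, then D=E asserted for the new URI.
chain = [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]
before = connected(chain, "E", "A")

# B is retracted: every link that mentions B disappears with it.
without_b = [(a, b) for a, b in chain if "B" not in (a, b)]
after = connected(without_b, "E", "A")

print(before, after)
```

E can still reach C and D after the retraction, but the path to A went through B, so that sameness is lost.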

So should I have asserted E=A, B=A, C=A, D=A, ...?
This would require 100 sameness assertions for the one new URI, and O(n
squared) assertions for the group in general.
In addition, if I don't do the n squared, then the graph I have is sensitive
to the order in which I find things, which I find rather unclean.
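
A rough count makes the trade-off concrete. The sketch below assumes that "n squared" means one assertion per unordered pair of URIs, and compares that with asserting everything against a single chosen member; both function names are made up for the illustration.

```python
def pairwise_assertions(n):
    """One sameAs assertion for every unordered pair among n URIs."""
    return n * (n - 1) // 2

def single_target_assertions(n):
    """Every URI asserted the same as one chosen member of the group."""
    return n - 1

# A group of 100 URIs plus the newly added one.
n = 101
print(pairwise_assertions(n), single_target_assertions(n))
```

The pairwise graph does not depend on the order of discovery and survives any single retraction, but it grows quadratically; asserting against one member stays linear but needs an agreed member to assert against.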

The problem is that there is not a special URI to use against which to
assert the n sameness properties.
One solution is to mint a new URI (an authority?), and use that. This is
absolutely unacceptable - the problem of having too many URIs for something
is not solved by creating yet another new one!
So the answer is to identify one of the group (we use the term Bundle)
as a Canon.
Then there is a relatively simple management process that can be undertaken
for merging Bundles (we consider that adding a URI is creating a singleton
Bundle and then merging).
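
A minimal sketch of that management process, under stated assumptions: the class name `BundleStore`, the rule that the larger Bundle's Canon survives a merge, and the use of `min` to pick a replacement Canon are all choices made for this illustration, not a description of the actual CRS.

```python
class BundleStore:
    """Toy store: every URI belongs to one Bundle, named by its Canon."""

    def __init__(self):
        self.canon_of = {}  # URI -> Canon of the Bundle it belongs to
        self.members = {}   # Canon -> set of all URIs in that Bundle

    def add(self, uri):
        """Adding a URI creates a singleton Bundle with itself as Canon."""
        if uri not in self.canon_of:
            self.canon_of[uri] = uri
            self.members[uri] = {uri}
        return self.canon_of[uri]

    def merge(self, uri_a, uri_b):
        """Merge the Bundles of two URIs and return the surviving Canon."""
        keep, drop = self.add(uri_a), self.add(uri_b)
        if keep == drop:
            return keep
        # Assumed policy: the Canon of the larger Bundle survives.
        if len(self.members[keep]) < len(self.members[drop]):
            keep, drop = drop, keep
        for uri in self.members.pop(drop):
            self.canon_of[uri] = keep
            self.members[keep].add(uri)
        return keep

    def retract(self, uri):
        """Remove a URI; if it was the Canon, promote another member."""
        canon = self.canon_of.pop(uri)
        rest = self.members.pop(canon) - {uri}
        if rest:
            new_canon = canon if canon in rest else min(rest)
            self.members[new_canon] = rest
            for member in rest:
                self.canon_of[member] = new_canon


store = BundleStore()
for left, right in [("A", "B"), ("B", "C"), ("C", "D"), ("D", "E")]:
    store.merge(left, right)
store.retract("B")
print(store.canon_of["E"] == store.canon_of["A"])
```

Because membership is recorded against the Canon rather than as a chain of pairwise links, retracting B leaves E and A in the same Bundle, and the result does not depend on the order in which the URIs were found.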
[There is the problem that it is possible to see a current URI of a Bundle,
when the data is seen externally as RDF, but this is a packaging issue, and
the URI is not guaranteed to remain. The contract of the CRS says that it is
given a URI, and will return a CanonURI, with a bunch of sameness relations
to other URIs, one of which is the URI given.
For example:
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
<http://dblp.rkbexplorer.com/id/conf/semweb/OHaraAKS04> owl:sameAs
    <http://southampton.rkbexplorer.com/id/eprints-10029> ,
    <http://dblp.L3S.de/d2r/resource/publications/conf/semweb/OHaraAKS04> .
]
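
The contract in the bracketed note can be sketched as a lookup followed by a Turtle rendering. Everything here is hypothetical: `crs_lookup`, `to_turtle` and the in-memory `bundles` dictionary are stand-ins for illustration, and only the three URIs come from the example above.

```python
OWL_PREFIX = "@prefix owl:  <http://www.w3.org/2002/07/owl#> ."

def crs_lookup(bundles, uri):
    """Given a URI, return (canon, other URIs) for the Bundle that holds it.

    bundles maps each Canon to the set of other URIs in its Bundle.
    An unknown URI is treated as a singleton Bundle (an assumption).
    """
    for canon, others in bundles.items():
        if uri == canon or uri in others:
            return canon, sorted(others)
    return uri, []

def to_turtle(canon, others):
    """Render the Canon and its sameness relations as Turtle text."""
    lines = [OWL_PREFIX]
    if others:
        objects = " ,\n    ".join(f"<{u}>" for u in others)
        lines.append(f"<{canon}> owl:sameAs\n    {objects} .")
    return "\n".join(lines)


bundles = {
    "http://dblp.rkbexplorer.com/id/conf/semweb/OHaraAKS04": {
        "http://southampton.rkbexplorer.com/id/eprints-10029",
        "http://dblp.L3S.de/d2r/resource/publications/conf/semweb/OHaraAKS04",
    }
}
given = "http://southampton.rkbexplorer.com/id/eprints-10029"
canon, others = crs_lookup(bundles, given)
print(to_turtle(canon, others))
```

The caller asks with any URI in the Bundle and gets back the Canon plus the sameness relations, one of which points at the URI that was given.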
Provenance (history of merging) can be kept (for unwinding errors, for
example), but that does not need to be part of or visible in the normal
usage.
We in fact take the provenance out of the RDF interface to the CRS
altogether at the moment, but it can be seen in a graphical format (Jim
Hendler in dblp again):
http://dblp.rkbexplorer.com/crs/visualise/?type=bundle&term=http%3A%2F%2Fdblp.rkbexplorer.com%2Fcrs%2Fbundle-2013958

Again, all this is irrespective of the actual property used to record
sameness; it is an engineering solution to the problem of managing large
numbers of sameness assertions, while avoiding the requirement of creating
yet another URI.

Best
Hugh

On 17/05/2008 19:03, "Story Henry" <henry.story@bblfish.net> wrote:

>
> Finally on the topic of owl:sameAs slowing things down, I was
> wondering how rdf databases can be built to do efficient reasoning
> over these things? Should they have special rules for owl:sameAs by
> for example deciding for every owl:sameAs group a canonical
> identifier that collects all the merged relationships?
>
> "http://www.w3.org/People/Berners-Lee/card#i" is canonicalURI of
>     <http://www.w3.org/People/Berners-Lee/card#i> ,
>     <http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee> ,
>     <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
>
> So that the DB just puts all the relations on
> <http://www.w3.org/People/Berners-Lee/card#i> .
>
> Should that be built right into triple or quad stores? (Does it work
> with quad stores?)
> Is it more efficient?
>
> Henry
>
>
Received on Sunday, 18 May 2008 09:59:13 UTC