- From: Hugh Glaser <hg@ecs.soton.ac.uk>
- Date: Sun, 18 May 2008 10:57:29 +0100
- To: Story Henry <henry.story@bblfish.net>, Semantic Web Interest Group <semantic-web@w3.org>
- CC: Ian Millard <icm@ecs.soton.ac.uk>
Henry's comments on canons bring us to some pragmatic maintenance issues that seem to have gone unremarked so far.

How do you manage a load of sameAs URIs? That is, what does an "owl:sameAs group" look like? If I have 100 URIs in my "owl:sameAs group", what do I do when I add another one? If I established the similarity by querying with one of the group's URIs and comparing properties, do I assert sameness with that URI? But then what happens if someone retracts a URI from the group that was part of the sameness linkage?

For example, suppose A=B=C=D. I query for D and find so much property similarity with my new URI E that I assert D=E, where the similar properties are really those of A. B is then lost, for whatever reason. E no longer = A.

So should I have asserted E=A, B=A, C=A, D=A, ...? This would require 100 sameness assertions, and O(n squared) assertions in general. In addition, if I don't do the n squared, then the graph I have is sensitive to the order in which I find things, which I find rather unclean.

The problem is that there is no special URI against which to assert the n sameness properties. One solution is to mint a new URI (an authority?) and use that. This is absolutely unacceptable: the problem of having too many URIs for something is not solved by creating yet another new one!

So the answer is to identify one member of the group (we use the term Bundle) as a Canon. Then there is a relatively simple management process that can be undertaken for merging Bundles (we treat adding a URI as creating a singleton Bundle and then merging).

[There is the problem that it is possible to see a current URI for a Bundle when the data is seen externally as RDF, but this is a packaging issue, and that URI is not guaranteed to remain. The contract of the CRS is that it is given a URI and will return a CanonURI, with a bunch of sameness relations to other URIs, one of which is the URI given. For example:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
<http://dblp.rkbexplorer.com/id/conf/semweb/OHaraAKS04> owl:sameAs
    <http://southampton.rkbexplorer.com/id/eprints-10029> ,
    <http://dblp.L3S.de/d2r/resource/publications/conf/semweb/OHaraAKS04> .
]

Provenance (the history of merging) can be kept (for unwinding errors, for example), but it does not need to be part of, or visible in, normal usage. In fact we take the provenance out of the RDF interface to the CRS altogether at the moment, but it can be seen in a graphical format (Jim Hendler in dblp again):
http://dblp.rkbexplorer.com/crs/visualise/?type=bundle&term=http%3A%2F%2Fdblp.rkbexplorer.com%2Fcrs%2Fbundle-2013958

Again, all this is irrespective of the actual property used to record sameness; it is an engineering solution to the problem of managing large numbers of sameness assertions, while avoiding the need to create yet another URI.

Best
Hugh

On 17/05/2008 19:03, "Story Henry" <henry.story@bblfish.net> wrote:

> Finally on the topic of owl:sameAs slowing things down, I was
> wondering how rdf databases can be built to do efficient reasoning
> over these things? Should they have special rules for owl:sameAs by
> for example deciding for every owl:sameAs group a canonical
> identifier that collects all the merged relationships?
>
> "http://www.w3.org/People/Berners-Lee/card#i" is canonicalURI of
>     <http://www.w3.org/People/Berners-Lee/card#i>,
>     <http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee>,
>     <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
>
> So that the DB just puts all the relations on
> <http://www.w3.org/People/Berners-Lee/card#i>.
>
> Should that be built right into triple or quad stores? (Does it work
> with quad stores?)
> Is it more efficient?
>
> Henry
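The Bundle/Canon bookkeeping described in the message behaves much like a union-find (disjoint-set) structure whose representative is always one of the member URIs, never a newly minted one. What follows is a minimal in-memory sketch, assuming a single process and plain strings for URIs. `BundleStore`, `merge`, `lookup`, `to_turtle`, `rewrite`, `assertions_needed` and the rule that the larger Bundle's Canon survives a merge are all hypothetical illustrations, not the CRS's actual interface. `rewrite` shows Henry's suggestion of a store that moves every statement onto the Canon.

```python
from dataclasses import dataclass, field


@dataclass
class BundleStore:
    """Hypothetical in-memory co-reference store (not the CRS API).

    Every URI belongs to exactly one Bundle; each Bundle is represented by a
    Canon chosen from its own members, so no new URI is ever minted.
    """
    canon_of: dict = field(default_factory=dict)  # URI -> Canon of its Bundle
    members: dict = field(default_factory=dict)   # Canon -> set of member URIs
    history: list = field(default_factory=list)   # provenance: (kept Canon, absorbed Canon)

    def add(self, uri):
        """Adding a URI creates a singleton Bundle whose Canon is the URI itself."""
        if uri not in self.canon_of:
            self.canon_of[uri] = uri
            self.members[uri] = {uri}
        return self.canon_of[uri]

    def merge(self, a, b):
        """Merge the Bundles containing a and b; return the surviving Canon."""
        ca, cb = self.add(a), self.add(b)
        if ca == cb:
            return ca
        # Assumed policy: the larger Bundle's Canon survives (ties keep a's Canon).
        if len(self.members[ca]) < len(self.members[cb]):
            ca, cb = cb, ca
        for uri in self.members.pop(cb):
            self.canon_of[uri] = ca
            self.members[ca].add(uri)
        # Merge history is recorded for unwinding errors but not exposed by lookup().
        self.history.append((ca, cb))
        return ca

    def lookup(self, uri):
        """Given a URI, return (Canon, the other URIs in its Bundle)."""
        canon = self.canon_of.get(uri, uri)
        others = sorted(self.members.get(canon, {canon}) - {canon})
        return canon, others


def to_turtle(canon, others):
    """Render a lookup result as one owl:sameAs statement (assumes the owl: prefix is declared)."""
    if not others:
        return ""
    return f"<{canon}> owl:sameAs " + " , ".join(f"<{u}>" for u in others) + " ."


def rewrite(triples, store):
    """Move every (s, p, o) statement onto Canons, as Henry suggests a store might."""
    return {(store.lookup(s)[0], p, store.lookup(o)[0]) for s, p, o in triples}


def assertions_needed(n):
    """sameAs triples to link n URIs: every pair vs. each non-Canon to the Canon."""
    return n * (n - 1) // 2, n - 1
```

If A, B, C and D have already been merged with A as the Canon, adding E is a single `merge("D", "E")`: E joins A's Bundle, and `lookup("E")` returns A together with B, C, D and E, so the link to A survives even if B is later dropped. A merge relabels only the absorbed Bundle's members, and `assertions_needed(100)` returns `(4950, 99)`, which contrasts the pairwise approach with linking each URI to the Canon.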
Received on Sunday, 18 May 2008 09:59:13 UTC