Re: Managing Co-reference (Was: A Semantic Elephant?) from Hugh Glaser on 2008-05-17 (semantic-web@w3.org from May 2008)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Sat, 17 May 2008 23:48:47 +0100
To: Story Henry <henry.story@bblfish.net>, Semantic Web Interest Group <semantic-web@w3.org>
Message-ID: <C4551F5F.249DF%hg@ecs.soton.ac.uk>
On 17/05/2008 19:03, "Story Henry" <henry.story@bblfish.net> wrote:

>
> On 15 May 2008, at 21:30, Aldo Gangemi wrote:
>> Issue 1: managing to suggest the rationale of owl:sameAs
>> appropriately, i.e. in a harmless way for future usages (Aldo,
>> Michael)
>> Issue 2: distinguishing "data provision" vs. "representational"
>> usages of owl:sameAs (Yves)
>> Issue 3: need for another operator, e.g. representing equality under
>> a closed set of properties (Geoff, Harry), or some relaxed
>> rdfs:sameAs (Jim)
>>  Issue 3a: using another existing relation, such as skos:related or
>> rdfs:seeAlso, but these are either too weak (rdfs:seeAlso), or
>> constrained (skos:related)
>> Issue 4: need for a semiotic grasp over co-reference, maybe outside
>> formal semantics (Bernard, Peter)
>
> I think you missed Jeremy Carroll's suggestion that at times you need
> to decide which owl:sameAs you trust and apply different ones under
> different circumstances. So if you know someone is publishing data,
> and they can't conceptually distinguish between Spain the geopolitical
> region and Spain the political entity then you can apply certain
> relations. In fact sometimes you may even have to decide that some
> people confuse two relations and decide to apply some rule to their
> utterances.
>
> CONSTRUCT { ?a rdfs:seeAlso ?b }
> WHERE {
>     GRAPH ?g { ?a owl:sameAs ?b . }
>     ?g said:by :george .
> }
>
> It is I think inescapable that we will need such rules. But of course
> we should try as much as possible to develop logics that avoid the
> need for them.
This is the logic behind separating out the coreference information
(whatever relation you decide to use) into separate services (in our case,
CRSes), rather than embedding it in the same triplestore. Applications can
then choose which service to use, based on criteria associated with the CRS,
such as what the policy was for asserting coreference.
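
As a rough illustration of that separation, here is a minimal Python sketch.
The class name CRS, the policy labels, and the ex: URIs are all assumptions
made up for the example; they are not the interface of the actual services.

```python
from dataclasses import dataclass, field


@dataclass
class CRS:
    """Hypothetical stand-in for a coreference service kept outside the triplestore."""
    name: str
    policy: str  # how coreference was asserted, e.g. "manual" or "heuristic" (assumed labels)
    bundles: list = field(default_factory=list)  # each bundle is a set of coreferent URIs

    def equivalents(self, uri):
        """Return every URI this service considers coreferent with `uri`."""
        for bundle in self.bundles:
            if uri in bundle:
                return set(bundle)
        return {uri}  # unknown URIs are only equivalent to themselves


def choose_services(services, acceptable_policies):
    """Keep only the services whose assertion policy the application accepts."""
    return [s for s in services if s.policy in acceptable_policies]


# Usage: an application that only trusts manually asserted coreference.
services = [
    CRS("manual-crs", "manual", [{"ex:a", "ex:b"}]),
    CRS("heuristic-crs", "heuristic", [{"ex:a", "ex:c"}]),
]
trusted = choose_services(services, {"manual"})
```

The point of the design is that the triples themselves are untouched: a
different application could pass a different set of acceptable policies and
see a different set of equivalences over the same data.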
>
> It would be good to have a few clear examples. I like the Berlin one.
> How should one relate the following things
>
> <http://sws.geonames.org/2950159/>
> <http://sws.geonames.org/2950157/>
> <http://dbpedia.org/resource/Berlin>
>
> Are rules the only way to go about it?
>
> The criticism of dbpedia is I think wrong at two points:
>    - they don't confuse documents and resources
>    - if they make mistakes linking resources from different vocabs
> that is something that can be corrected. (It would help if dbpedia
> were editable like wikipedia)
>
> Finally on the topic of owl:sameAs slowing things down, I was
> wondering how rdf databases can be built to do efficient reasoning
> over these things? Should they have special rules for owl:sameAs by
> for example deciding for every owl:sameAs group a canonical
> identifier that collects all the merged relationships?
>
> "http://www.w3.org/People/Berners-Lee/card#i" is canonicalURI of
>     <http://www.w3.org/People/Berners-Lee/card#i>,
>     <http://www4.wiwiss.fu-berlin.de/bookmashup/persons/Tim+Berners-Lee>,
>     <http://www4.wiwiss.fu-berlin.de/dblp/resource/person/100007> .
>
> So that the DB just puts all the relations on
> <http://www.w3.org/People/Berners-Lee/card#i> .
We use such a canon in the CRS (using coref:hasCanon for the bundle).
So
http://dblp.rkbexplorer.com/crs/export/?term=http%3A%2F%2Fdblp.rkbexplorer.com%2Fid%2Fpeople-781c2b6bcc6be8cab265a0c30c0129c0-e301b92531a624f74c49f50881f118f6&type=uri
which is the CRS entry for Jim Hendler in our dblp, has many URIs, because
we need to assert separate URIs initially in the store, and then use the
knowledge in the store to do all our coreferent tool work.
(Doing coreferent work using the knowledge in the triplestore is much more
effective than trying to do it with the limited knowledge that is available
during assertion, but this requires a huge number of URIs.)
Of course, once this is done, it makes sense to feed this back to the
assertion process, to collapse them all to the canon for this store, or even
a canon for another store.
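
One common way to maintain such bundles, each with a single canon, is a
union-find structure. The sketch below is only an illustration of that
general technique, not a description of how the CRS is implemented; the
class name Bundles and the ex: URIs are hypothetical.

```python
class Bundles:
    """Union-find over URIs: every bundle has exactly one canon (its root)."""

    def __init__(self):
        self.parent = {}  # URI -> parent URI; a canon is its own parent

    def find(self, uri):
        """Return the canon of `uri`, registering the URI if it is new."""
        self.parent.setdefault(uri, uri)
        root = uri
        while self.parent[root] != root:
            root = self.parent[root]
        # Path compression: point every URI on the walked path at the canon.
        while self.parent[uri] != root:
            self.parent[uri], uri = root, self.parent[uri]
        return root

    def assert_coreferent(self, a, b):
        """Merge the bundles of `a` and `b`; the canon of `a` is kept (an arbitrary policy)."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a

    def canon(self, term):
        """Canon for known URIs; literals and unknown terms pass through unchanged."""
        return self.find(term) if term in self.parent else term

    def collapse(self, triples):
        """Rewrite subjects and objects to their canons, merging duplicate triples."""
        return {(self.canon(s), p, self.canon(o)) for s, p, o in triples}


# Usage: three URIs asserted separately, later found to be coreferent.
bundles = Bundles()
bundles.assert_coreferent("ex:jim-1", "ex:jim-2")
bundles.assert_coreferent("ex:jim-2", "ex:jim-3")
merged = bundles.collapse({
    ("ex:jim-2", "ex:name", "Jim"),
    ("ex:jim-3", "ex:name", "Jim"),
})
```

Which member becomes the canon is a policy decision; here it is simply the
canon of the first argument.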
So Jim's bundle in the CRS associated with another of our stores
http://citeseer.rkbexplorer.com/crs/export/?term=http%3A%2F%2Fciteseer.rkbexplorer.com%2Fid%2Fresource-CSP127882&type=uri
has a citeseer URI for the canon.
Choice of the canon is another aspect of the policy of a CRS, but by asking
the dblp CRS what is the canon for a citeseer URI, the citeseer store could
choose to use the dblp canon.
It is thus possible for an accepted URI canon to gradually gain agreement,
without any need for centralised authority, avoiding the associated social
and performance problems.
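
To make the canon adoption concrete, here is a small Python sketch. It
assumes, purely for illustration, that each CRS can be viewed as a mapping
from a URI to the canon of its bundle; the function name adopt_canon and the
example.org URIs are hypothetical and do not come from the real services.

```python
def adopt_canon(local_canon, remote_crs):
    """Return the remote canon if the remote CRS knows `local_canon`, else keep the local one."""
    return remote_crs.get(local_canon, local_canon)


# Hypothetical view of the dblp CRS: it has seen the citeseer URI for a person
# and bundles it under its own canon.
dblp_crs = {
    "http://example.org/citeseer/person-1": "http://example.org/dblp/person-1",
    "http://example.org/dblp/person-1": "http://example.org/dblp/person-1",
}

# The citeseer store asks the dblp CRS about its own canon and adopts the answer.
citeseer_canon = "http://example.org/citeseer/person-1"
shared_canon = adopt_canon(citeseer_canon, dblp_crs)

# A URI the dblp CRS has never seen keeps its local canon.
unchanged = adopt_canon("http://example.org/citeseer/person-2", dblp_crs)
```

No store is forced to agree: each one decides for itself whether to look up
and adopt another store's canon.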
>
> Should that be built right into triple or quad stores? (Does it work
> with quad stores?)
> Is it more efficient?
>
> Henry
>
>
Received on Saturday, 17 May 2008 22:50:32 UTC