Re: {Disarmed} Re: Managing Co-reference (Was: A Semantic Elephant?) from Hugh Glaser on 2008-05-15 (semantic-web@w3.org from May 2008)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Thu, 15 May 2008 20:49:57 +0100
To: Michael F Uschold <uschold@gmail.com>, Aldo Gangemi <aldo.gangemi@cnr.it>
CC: Tim Berners-Lee <timbl@w3.org>, Sören Auer <auer@informatik.uni-leipzig.de>, Semantic Web Interest Group <semantic-web@w3.org>, Chris Bizer <chris@bizer.de>, Frank van Harmelen <Frank.van.Harmelen@cs.vu.nl>, Kingsley Idehen <kidehen@openlinksw.com>, "Fabian M. Suchanek" <f.m.suchanek@gmail.com>, Tim Berners-Lee <timbl@csail.mit.edu>, Jim Hendler <hendler@cs.rpi.edu>, Mark Greaves <markg@vulcan.com>, "georgi.kobilarov@gmx.de" <georgi.kobilarov@gmx.de>, Jens Lehmann <lehmann@informatik.uni-leipzig.de>, Richard Cyganiak <richard@cyganiak.de>, Frederick Giasson <fred@fgiasson.com>, Michael Bergman <mike@mkbergman.com>, Conor Shankey <cshankey@reinvent.com>, Kira Oujonkova <koujonkova@reinvent.com>, "a.o.jaffri@ecs.soton.ac.uk" <a.o.jaffri@ecs.soton.ac.uk>, "icm@ecs.soton.ac.uk" <icm@ecs.soton.ac.uk>
Message-ID: <C4525275.2493B%hg@ecs.soton.ac.uk>

And then I hit the wrong button and sent early - sorry.

Michael,
Many thanks for asking the question.
It is very exciting to see this discussion so active.
I have been trying to get to the front of the messages to say something, but
they just keep coming in!
To answer your email:
Yes, we have an infrastructure (the Consistent Reference Service, CRS) with
which we have been trying to manage co-reference between a bunch of
independent SW sites to allow applications to do what they need. It has gone
through quite a few revisions over the last few years.

Essentially we consider coreference as more knowledge about things, which
can be represented in the SW, and can be used by applications if and when
they see fit. And as someone said, there is no truth, only opinions.
So we need an infrastructure for opinions, but that is the SW.
To answer your specific questions.


On 15/05/2008 00:25, "Michael F Uschold" <uschold@gmail.com> wrote:

> Aldo notes the problems with using owl:sameAs to mean similarity. Such uses
> are often incorrect, and Aldo suggests using something like rdfs:seeAlso,
> skos:related, instead. These relations are too weak, unfortunately.
>
> There is an interesting proposal for managing URI synonyms that attempts to
> have a middle ground, weaker than owl:sameAs, but much stronger than
> rdfs:seeAlso or skos:related.   They suggest an infrastructural approach
> [apparently] outside the logic for managing URI synonyms. It is a quite clever
> approach, but still has some challenges.  Here are portions of a note I just
> sent the authors of a paper, which relates to this question.
>
> Afraz, Hugh and Ian:
>
> I just read your workshop paper:
> Managing URI Synonymity to Enable Consistent Reference on the Semantic Web
> <http://eprints.ecs.soton.ac.uk/15614/1/camera-ready.pdf>
>
> 1. I wholeheartedly agree that owl:sameAs is too strong in many cases. A
> weaker relation is needed. However, you don't offer a weaker relation and give
> it semantics. Instead, you do a kind of sleight of hand and remove it from the
> logic. Without a semantics, what is a system developer to do with the fact
> that two URIs are in the same bundle?  What are the inferential implications?
The semantics of coref:duplicate, for example, is defined in the same way as
most other things on the SW: what does a system developer do with
foaf:knows?
So we have removed it from the logic, but kept it in RDF (because it is
knowledge).
One example of what a system developer might do is use the bundle to assert
owl:sameAs into their RDF cache, so that they can do the inference they
want. So we have separated out the statements that someone (anyone of
course) can make about URIs, from what a particular system might choose to
do with the knowledge.
For example, if you resolve
http://dblp.rkbexplorer.com/id/people-d7ea883648d513828ecea43556f1848a-15f88dc6d4eaf3b08d1de482b480c4c9
You get (303ed to) a browsing page, where you can see the
resist:coreferenceData (available in the resolved URI) for this URI by
clicking "View in CRS".
(We should be doing 303 for the CRS as well, but haven't had time yet.)
At the bottom you will see that an agent can import this as ntriples
owl:sameAs - we have nothing against owl:sameAs, if that is what the agent
wants to do, but the inference decision can be up to them.
When I put owl:sameAs in my KB, I mean something much stronger.
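
A minimal sketch of that import step, for an agent that has decided to treat a
bundle as identity. It assumes a bundle is simply a set of URI strings; the
function name and the example.org URIs are hypothetical and are not taken from
the CRS itself. Linking one member to all the others (a star) rather than
every pair is just one possible choice.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def bundle_to_sameas_ntriples(bundle):
    """Expand a bundle of URIs into owl:sameAs N-Triples lines.

    The lexicographically smallest URI is used as a hub and linked to
    every other member, so n members yield n - 1 statements.
    """
    members = sorted(set(bundle))
    if len(members) < 2:
        return []  # nothing to assert for an empty or single-member bundle
    hub = members[0]
    return [f"<{hub}> <{OWL_SAME_AS}> <{other}> ." for other in members[1:]]


# Hypothetical bundle: placeholder URIs, not real CRS data.
bundle = {
    "http://example.org/kb-a/person-1",
    "http://example.org/kb-b/author-42",
}
for line in bundle_to_sameas_ntriples(bundle):
    print(line)
```

The resulting lines could be loaded into the agent's own RDF cache, which keeps
the decision to infer identity with the agent rather than with the publisher
of the bundle.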
> 2.
> 3. Example: IMHO it is a bad idea to say that Spain the political entity is
> the same as Spain the geopolitical region. This ontological distinction has
> been clearly documented in DOLCE, for example. They are different, and should
> have different URIs.  Conflating them will cause problems.  Of course, making
> this and many other ontologically 'sound' distinctions can cause its own
> problems, by adding complexity -- a tradeoff. Without any semantics of
> inCRS_Bundle, there is no way to tell if it is semantically correct.
I suspect we have a pretty strong agreement here.
I am trying to decide if it is a bad example, and how to elaborate.
There is no absolute truth about whether two things are coreferent. It will
usually be a Bad Thing to conflate such URIs. But if I was providing a SW
application that was giving a unified view of the source of fruit from a
large number of sources, and some sites were using geonames, and others
dbpedia, perhaps it would be appropriate. An alternative would be to do some
ontology mapping of political to geopolitical, but I think that would give
me greater indigestion. And of course, it is reasonable to use a CRS to
represent ontology mappings, by the way.
So the point is that the publisher of a CRS is making statements you may or
may not accept, with the attendant issues of trust, context, methodology of
process, etc, and the user has to decide if they are fit for their purpose.
In fact, this is no different from a normal publication of RDF.
So when we understand how to represent these things as metadata of the RDF,
we may have the solution to the problem.
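
The fruit example can be sketched as follows, purely as an illustration. It
assumes the application has already chosen which bundles it accepts, that the
bundles are disjoint sets of URI strings, and that records are simple
(place URI, fruit) pairs; the function names and example.org URIs are invented
stand-ins for geonames-style and dbpedia-style identifiers.

```python
from collections import defaultdict


def canonical_index(bundles):
    """Map every URI in each accepted bundle to one representative.

    The representative is the lexicographically smallest member.
    Bundles are assumed to be disjoint.
    """
    index = {}
    for bundle in bundles:
        representative = min(bundle)
        for uri in bundle:
            index[uri] = representative
    return index


def unify(records, bundles):
    """Group (place_uri, fruit) records under one key per accepted bundle."""
    index = canonical_index(bundles)
    view = defaultdict(set)
    for place, fruit in records:
        # URIs outside every accepted bundle keep their own identity.
        view[index.get(place, place)].add(fruit)
    return dict(view)


# Placeholder data: none of these URIs or records come from a real source.
accepted_bundles = [
    {"http://example.org/geo/spain", "http://example.org/wiki/Spain"},
]
records = [
    ("http://example.org/geo/spain", "oranges"),
    ("http://example.org/wiki/Spain", "lemons"),
    ("http://example.org/geo/chile", "grapes"),
]
print(unify(records, accepted_bundles))
```

A different application, one that cares about the political versus
geopolitical distinction, would simply leave that bundle out of
accepted_bundles and keep the two URIs apart.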
> 4. Do you have any idea of the scalability of this approach?
It has been designed to avoid scalability problems.
Thus in particular there is no central authority.
I have not graphed any data.
We used to have an application (http://resist.ecs.soton.ac.uk/explorer/ )
that used one KB (with about 50M triples) with an associated CRS; we then
moved to a new application (http://www.rkbexplorer.com/explorer/ ) using
this data exploded into more than 20 KBs with associated CRSes where
necessary.
We do seem to have an improvement in the performance of a system that is
making serious numbers of queries. And now, since it only looks at the KBs
and CRSes that it requires, there is no problem with having more data around
(indeed we have another few KBs of more than 70M triples that do not
interfere).

Sorry to be so long, but I hope that I have gone some way to answering your
questions.
It was very kind of you to ask.
Hugh

--
Hugh Glaser,  Reader
              Dependable Systems & Software Engineering
              School of Electronics and Computer Science,
              University of Southampton,
              Southampton SO17 1BJ
Work: +44 (0)23 8059 3670, Fax: +44 (0)23 8059 3045
Mobile: +44 (0)78 9422 3822, Home: +44 (0)23 8061 5652
http://www.ecs.soton.ac.uk/~hg/



> Michael
>
>
>
> On Wed, May 14, 2008 at 2:24 PM, Aldo Gangemi <aldo.gangemi@cnr.it> wrote:
>>> Problem 2) even if you can find the links, prolific use of
>>> owl:sameAs will create computational problems.
>>>
>>
>>
>> Michael,
>>
>> there is an item related to Problem 2), already discussed on LOD and
>> elsewhere last year, i.e. the use of
>> owl:sameAs, which is a formal relation of identity, to denote generic
>> "similarity", or even "relatedness"
>> between two entities.
>>
>> owl:sameAs is great to co-reference persons, places, etc. It is buggy when
>> used to relate e.g. foaf:Person
>> instances to persons' homepages, or a city as from Cyc to a wikipedia article
>> of that city (as done in DBpedia).
>>
>> In previous discussions, besides some weak good practices [1], I found no
>> attempt to discourage its use for similarity.
>> This use is not needed. We can use e.g. rdfs:seeAlso, skos:related, or any
>> other local relation instead.
>>
>> It is reasonable, as Richard Cyganiak wrote at the time, that we have to work
>> around the quirks [2],
>> nonetheless, if there is no real need, why should we work around the quirks
>> caused by a pointless identity
>> assumption?
>>
>> Notice that ignoring owl:sameAs is not a good solution. We need some
>> trade-off between simplicity
>> and formality. A basic similarity relation is perfect, and then those triples
>> can be worked out automatically,
>> by means of appropriate metamodels, e.g. as proposed in [3].
>>
>> Aldo
>>
>> [1] Bernard Vatant suggested some good practice of mutual linking:
>>
>> http://universimmedia.blogspot.com/2007/07/using-owlsameas-in-linked-data.html
>>
>> [2] Cyganiak quote:
>>> People who want to re-use your data will learn to work around its quirks and
>>> idiosyncrasies.
>>> Dealing with the quirks is a part of re-using data, it always was, and it
>>> always will be.
>>>
>>
>> [3] http://www.ibiblio.org/hhalpin/irw2006/vpresutti.pdf from IRW workshop:
>> http://www.ibiblio.org/hhalpin/irw2006/
>>
>>
>> _________________________________
>>
>> Aldo Gangemi
>>
>> Senior Researcher
>> Laboratory for Applied Ontology
>> Institute for Cognitive Sciences and Technology
>> National Research Council (ISTC-CNR)
>> Via Nomentana 56, 00161, Roma, Italy
>> Tel: +390644161535
>> Fax: +390644161513
>> aldo.gangemi@cnr.it
>>
>> http://www.loa-cnr.it/gangemi.html
>>
>> icq# 108370336
>>
>> skype aldogangemi
>>
>>
>>
>>

Received on Thursday, 15 May 2008 19:54:22 UTC