Re: Re: Managing Co-reference (Was: A Semantic Elephant?) from Hugh Glaser on 2008-05-15 (semantic-web@w3.org from May 2008)

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Fri, 16 May 2008 00:02:27 +0100
To: "semantic-web@w3.org" <semantic-web@w3.org>
Message-ID: <C4527F93.2495C%hg@ecs.soton.ac.uk>
On 15/05/2008 23:00, "Bernard Vatant" <bernard.vatant@mondeca.com> wrote:

>
>
> [again, crying along with Jim - no more cc's]
>
> Hugh
>
> I think the CRS approach hits a point, which I must confess I did not
> completely get when reviewing your paper a while ago for LDOW.
Such things are always the authors' fault.
> Co-reference is something outside the semantics of URIs, so it might be the
> case that it has to be managed at the application level.
> Considering this from a semiotic viewpoint, one could say that:
>
> 1. A URI is a signifier ("signifiant")
> 2. The assertions using this URI convey the signified ("signifié") -
> what this URI means
> 3. The URI referent belongs outside the language. RDF cannot make an
> exception to this general rule.
Appealing to what is already known of the nature of language is, I think, a
good thing. We are, after all, trying to define a new language, and how it
relates to existing languages. I dare say that Wittgenstein may be relevant
here, if I could understand him - we do use the word ontology quite a lot,
after all.
I too view this from a semiotic viewpoint, if I understand correctly what that
means.
I especially agree with your point 2: the assertions using the
URI convey the signified, and in fact in a world of SW agents, there is
nothing else that can be used. Of course different sources may make
different assertions (thereby potentially identifying different signifieds).
Often I will look at the assertions provided by the originator of the URI
(its resolver), and use those to try to gain a unified view of what the
signifier is.
[I am not certain I understand the "URI referent"; is it the URI as
signifier?]
>
> In this vocabulary, the semantics of owl:sameAs is that two URIs have the
> same signified (hence certainly the same referent).
And you have to be pretty confident, such as being responsible for
generating the URIs themselves, before you can comfortably make that
statement in a global context.
> But what we are looking for in this discussion is a mechanism to handle
> similarity of the referent, which does not imply sameness of the signified.
> This cannot be achieved by adding extra assertions, since those will
> only add to the signified.
Possibly. But I can't quite see the usefulness of stating that two
signifiers were the same if I did not also intend to mean that the two
signified things were the same.
>
> So maybe a co-reference mechanism has to be set outside formal semantics.
This may take us back to the question of the metadata of an RDF source (my
CRS data is "just" an RDF source making statements about coreference).
So the simple co-reference mechanism can stay within RDF, but we need to be
able to assert properties of the RDF itself (which might require reflection,
etc.).
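A minimal sketch of what that could look like, assuming each co-reference statement is kept together with the source that asserted it (in the spirit of a named graph), so that an application decides per source whether to use it. The predicate URI, the example URIs, and the accepted_coreference function are hypothetical placeholders, not the CRS's actual data model.

```python
from typing import NamedTuple

class Quad(NamedTuple):
    """A statement together with the source (e.g. a graph URI) that asserted it."""
    subject: str
    predicate: str
    obj: str
    source: str

# Hypothetical predicate and source URIs, for illustration only.
DUPLICATE = "http://example.org/coref#duplicate"

statements = [
    Quad("http://a.example/id/spain", DUPLICATE,
         "http://b.example/id/Spain", "http://crs1.example/graph"),
    Quad("http://a.example/id/spain", DUPLICATE,
         "http://c.example/id/Spain_region", "http://crs2.example/graph"),
]

def accepted_coreference(quads, trusted_sources):
    """Return (uri, uri) duplicate pairs asserted by a trusted source only."""
    return [(q.subject, q.obj) for q in quads
            if q.predicate == DUPLICATE and q.source in trusted_sources]

print(accepted_coreference(statements, {"http://crs1.example/graph"}))
# [('http://a.example/id/spain', 'http://b.example/id/Spain')]
```

Every consumer sees the same statements; changing the trusted set, not the data, changes which co-reference links an application acts on.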
>
> Bernard
>
>
> Hugh Glaser a écrit :
>> And then I hit the wrong button and sent early - sorry.
>>
>> Michael,
>> Many thanks for asking the question.
>> It is very exciting to see this discussion so active.
>> I have been trying to get to the front of the messages to say something, but
>> they just keep coming in!
>> To answer your email:
>> Yes, we have an infrastructure (the Consistent Reference Service, CRS) with
>> which we have been trying to manage co-reference between a bunch of
>> independent SW sites to allow applications to do what they need. It has gone
>> through quite a few revisions over the last few years.
>>
>> Essentially we consider coreference as more knowledge about things, which
>> can be represented in the SW, and can be used by applications if and when
>> they see fit. And as someone said, there is no truth, only opinions.
>> So we need an infrastructure for opinions, but that is the SW.
>> To answer your specific questions:
>>
>>
>> On 15/05/2008 00:25, "Michael F Uschold" <uschold@gmail.com> wrote:
>>
>>
>>> Aldo notes the problems with using owl:sameAs to mean similarity. Such uses
>>> are often incorrect, and Aldo suggests using something like rdfs:seeAlso or
>>> skos:related instead. These relations are too weak, unfortunately.
>>>
>>> There is an interesting proposal for managing URI synonyms that attempts to
>>> find a middle ground, weaker than owl:sameAs but much stronger than
>>> rdfs:seeAlso or skos:related. They suggest an infrastructural approach,
>>> [apparently] outside the logic, for managing URI synonyms. It is quite a
>>> clever approach, but it still has some challenges. Here are portions of a
>>> note I just sent to the authors of a paper, which relates to this question.
>>>
>>> Afraz, Hugh and Ian:
>>>
>>> I just read your workshop paper:
>>> Managing URI Synonymity to Enable Consistent Reference on the Semantic Web
>>> <http://eprints.ecs.soton.ac.uk/15614/1/camera-ready.pdf>
>>>
>>> 1. I wholeheartedly agree that owl:sameAs is too strong in many cases. A
>>> weaker relation is needed. However, you don't offer a weaker relation and
>>> give it semantics. Instead, you do a kind of sleight of hand and remove it
>>> from the logic. Without semantics, what is a system developer to do with
>>> the fact that two URIs are in the same bundle? What are the inferential
>>> implications?
>>>
>> The semantics of coref:duplicate, for example, is defined in the same way as
>> most other things on the SW: what does a system developer do with
>> foaf:knows?
>> So we have removed it from the logic, but kept it in RDF (because it is
>> knowledge).
>> One example of what a system developer might do is use the bundle to assert
>> owl:sameAs into their RDF cache, so that they can do the inference they
>> want. So we have separated out the statements that someone (anyone of
>> course) can make about URIs, from what a particular system might choose to
>> do with the knowledge.
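One way to picture this separation, as a sketch only: assume a bundle is just a set of URIs that some CRS considers co-referent. The hypothetical build_bundles groups pairwise duplicate statements into bundles with union-find, and same_as_triples is what an application might run, if it chooses to, to turn a bundle into owl:sameAs triples for its own cache. This is not the CRS's actual implementation, and the ex: strings are placeholders.

```python
from itertools import permutations

OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def build_bundles(pairs):
    """Group URIs into bundles (equivalence classes) from pairwise duplicate links."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for a, b in pairs:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two bundles

    bundles = {}
    for uri in parent:
        bundles.setdefault(find(uri), set()).add(uri)
    return list(bundles.values())

def same_as_triples(bundle):
    """Expand one bundle into owl:sameAs triples, only if an application wants them."""
    return [(a, OWL_SAME_AS, b) for a, b in permutations(sorted(bundle), 2)]

pairs = [("ex:a", "ex:b"), ("ex:b", "ex:c"), ("ex:x", "ex:y")]
print(sorted(sorted(b) for b in build_bundles(pairs)))
# [['ex:a', 'ex:b', 'ex:c'], ['ex:x', 'ex:y']]
```

Keeping the bundles as data and the expansion as a separate, optional step is what lets each application choose its own inference.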
>> For example, if you resolve
>> http://dblp.rkbexplorer.com/id/people-d7ea883648d513828ecea43556f1848a-15f88dc6d4eaf3b08d1de482b480c4c9
>> You get (303ed to) a browsing page, where you can see the
>> resist:coreferenceData (available in the resolved URI) for this URI by
>> clicking "View in CRS".
>> (We should be doing 303 for the CRS as well, but haven't had time yet.)
>> At the bottom you will see that an agent can import this as N-Triples
>> owl:sameAs statements; we have nothing against owl:sameAs if that is what
>> the agent wants to do, but the inference decision can be up to them.
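For illustration only, a hypothetical export of that kind might look like the following, assuming the agent already holds the URIs of one bundle. The URIs are placeholders, not real CRS output, and bundle_to_ntriples is not the CRS's actual serializer. Each owl:sameAs statement becomes one N-Triples line of the form <subject> <predicate> <object> .

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

def bundle_to_ntriples(canonical: str, bundle: set) -> str:
    """Serialize a bundle as N-Triples, linking each other URI to one canonical URI."""
    lines = [
        f"<{canonical}> <{OWL_SAME_AS}> <{uri}> ."
        for uri in sorted(bundle)
        if uri != canonical
    ]
    return "\n".join(lines)

bundle = {"http://a.example/id/p1", "http://b.example/id/p1",
          "http://c.example/person/7"}
print(bundle_to_ntriples("http://a.example/id/p1", bundle))
# <http://a.example/id/p1> <http://www.w3.org/2002/07/owl#sameAs> <http://b.example/id/p1> .
# <http://a.example/id/p1> <http://www.w3.org/2002/07/owl#sameAs> <http://c.example/person/7> .
```

Linking everything to one canonical URI keeps the export at n-1 triples for a bundle of n URIs; an OWL reasoner can recover the remaining pairs from the symmetry and transitivity of owl:sameAs. Which URI counts as canonical is an assumption of this sketch.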
>> When I put owl:sameAs in my KB, I mean something much stronger.
>>
>>> 2.
>>> 3. Example: IMHO it is a bad idea to say that Spain the political entity is
>>> the same as Spain the geopolitical region. This ontological distinction has
>>> been clearly documented in DOLCE, for example. They are different, and
>>> should have different URIs. Conflating them will cause problems. Of course,
>>> making this and many other ontologically 'sound' distinctions can cause its
>>> own problems by adding complexity: a tradeoff. Without any semantics of
>>> inCRS_Bundle, there is no way to tell if it is semantically correct.
>>>
>> I suspect we have a pretty strong agreement here.
>> I am trying to decide if it is a bad example, and how to elaborate.
>> There is no absolute truth about whether two things are coreferent. It will
>> usually be a Bad Thing to conflate such URIs. But if I was providing a SW
>> application that was giving a unified view of the source of fruit from a
>> large number of sources, and some sites were using geonames, and others
>> dbpedia, perhaps it would be appropriate. An alternative would be to do some
>> ontology mapping of political to geopolitical, but I think that would give
>> me greater indigestion. It is also, of course, reasonable to use a CRS to
>> represent ontology mappings.
>> So the point is that the publisher of a CRS is making statements you may or
>> may not accept, with the attendant issues of trust, context, methodology of
>> process, etc, and the user has to decide if they are fit for their purpose.
>> In fact, this is no different from a normal publication of RDF.
>> So when we understand how to represent these things as metadata of the RDF,
>> we may have the solution to the problem.
>>
>>> 4. Do you have any idea of the scalability of this approach?
>>>
>> It has been designed to avoid scalability problems.
>> Thus in particular there is no central authority.
>> I have not graphed any data.
>> We used to have an application (http://resist.ecs.soton.ac.uk/explorer/ )
>> that used one KB (with about 50M triples) with an associated CRS; we then
>> moved to a new application (http://www.rkbexplorer.com/explorer/ ) using
>> the same data split into more than 20 KBs, with associated CRSes where
>> necessary.
>> We do seem to see improved performance in a system that makes large
>> numbers of queries. And now, since it only looks at the KBs and CRSes that
>> it requires, having more data around causes no problem (indeed we have a
>> few more KBs of more than 70M triples that do not interfere).
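A rough sketch of that design, assuming each KB publishes its own CRS and an application lists only the KBs it actually uses. The in-memory dictionaries stand in for remote CRS services, and the KB names, URIs and coreferents function are hypothetical. The point is only that a KB the application does not list (kb3 below) is never consulted, so it costs nothing.

```python
from typing import Dict, Iterable, Set

# Hypothetical stand-ins for per-KB CRS services: URI -> co-referent URIs.
CRS_BY_KB: Dict[str, Dict[str, Set[str]]] = {
    "kb1": {"http://kb1.example/id/p1": {"http://kb2.example/id/a9"}},
    "kb2": {"http://kb2.example/id/a9": {"http://kb1.example/id/p1",
                                         "http://kb3.example/id/c3"}},
    "kb3": {"http://kb3.example/id/z": {"http://kb3.example/id/w"}},
}

def coreferents(uri: str, kbs: Iterable[str]) -> Set[str]:
    """Union the answers of only the CRSes for the KBs this application uses."""
    found: Set[str] = set()
    for kb in kbs:
        found |= CRS_BY_KB.get(kb, {}).get(uri, set())
    found.discard(uri)  # a URI is not reported as its own co-referent
    return found

print(sorted(coreferents("http://kb2.example/id/a9", ["kb1", "kb2"])))
# ['http://kb1.example/id/p1', 'http://kb3.example/id/c3']
```

With no central authority, adding another KB and CRS leaves existing applications untouched until they choose to list it.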
>>
>> Sorry to be so long, but I hope that I have gone some way to answering your
>> questions.
>> It was very kind of you to ask.
>> Hugh
>>
>> --
>> Hugh Glaser,  Reader
>>               Dependable Systems & Software Engineering
>>               School of Electronics and Computer Science,
>>               University of Southampton,
>>               Southampton SO17 1BJ
>> Work: +44 (0)23 8059 3670, Fax: +44 (0)23 8059 3045
>> Mobile: +44 (0)78 9422 3822, Home: +44 (0)23 8061 5652
>> http://www.ecs.soton.ac.uk/~hg/
>>
>>
>>
>>
>>> Michael
>>>
>>>
>>>
>>> On Wed, May 14, 2008 at 2:24 PM, Aldo Gangemi <aldo.gangemi@cnr.it> wrote:
>>>
>>>>> Problem 2) even if you can find the links, prolific use of
>>>>> owl:sameAs will create computational problems.
>>>>>
>>>>>
>>>> Michael,
>>>>
>>>> there is an item related to Problem 2), already discussed on LOD and
>>>> elsewhere last year, i.e. the use of owl:sameAs, which is a formal relation
>>>> of identity, to denote generic "similarity", or even "relatedness", between
>>>> two entities.
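A back-of-the-envelope sketch of why prolific owl:sameAs is costly (illustrative arithmetic, not a measurement from any system mentioned in this thread): if n URIs end up in one owl:sameAs equivalence class, materializing its symmetric and transitive closure yields n(n-1) non-reflexive owl:sameAs triples, and a reasoner that copies each alias's statements to every other alias multiplies those statements by n. The figure of 20 statements per URI below is an arbitrary assumption.

```python
from itertools import permutations

def closure_size(n: int) -> int:
    """Count non-reflexive owl:sameAs pairs by enumerating the closure of n aliases."""
    uris = [f"urn:example:{i}" for i in range(n)]
    return sum(1 for _ in permutations(uris, 2))

def smushed_size(n: int, per_uri: int) -> int:
    """Subject-position statements after copying every alias's statements to all n aliases."""
    original = n * per_uri  # statements before smushing
    return n * original     # each statement now appears once per alias

for n in (2, 10, 100):
    # columns: n, enumerated closure, closed form n(n-1), smushed statements
    print(n, closure_size(n), n * (n - 1), smushed_size(n, 20))
# 2 2 2 80
# 10 90 90 2000
# 100 9900 9900 200000
```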
>>>>
>>>> owl:sameAs is great for co-referencing persons, places, etc. It is buggy
>>>> when used to relate e.g. foaf:Person instances to persons' homepages, or a
>>>> city as described in Cyc to the Wikipedia article about that city (as done
>>>> in DBpedia).
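A small sketch of why this goes wrong, assuming an aggregator that naively "smushes" URIs joined by owl:sameAs onto a single node: the person and their homepage end up sharing one description. The ex: URIs, literal values, and the smush function are made up for illustration.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
RDF_TYPE = "rdf:type"

# Made-up data: a person, their homepage, and a questionable owl:sameAs link.
triples = [
    ("ex:alice", RDF_TYPE, "foaf:Person"),
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:alicePage", RDF_TYPE, "foaf:Document"),
    ("ex:alicePage", "dc:format", "text/html"),
    ("ex:alice", SAME_AS, "ex:alicePage"),
]

def smush(triples):
    """Naively merge subjects joined by a direct owl:sameAs link (no chains, objects untouched)."""
    canonical = {o: s for s, p, o in triples if p == SAME_AS}
    merged = defaultdict(set)
    for s, p, o in triples:
        if p != SAME_AS:
            merged[canonical.get(s, s)].add((p, o))
    return merged

merged = smush(triples)
print(sorted(o for p, o in merged["ex:alice"] if p == RDF_TYPE))
# ['foaf:Document', 'foaf:Person']
```

One node now claims to be both a person and an HTML document, which a weaker similarity link such as rdfs:seeAlso would not have implied.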
>>>>
>>>> In previous discussions, besides some weak good practices [1], I found no
>>>> attempt to discourage its use for similarity.
>>>> This use is not needed. We can use e.g. rdfs:seeAlso, skos:related, or any
>>>> other local relation instead.
>>>>
>>>> It is reasonable, as Richard Cyganiak wrote at the time, that we have to
>>>> work around the quirks [2]; nonetheless, if there is no real need, why
>>>> should we work around the quirks caused by a pointless identity
>>>> assumption?
>>>>
>>>> Notice that ignoring owl:sameAs is not a good solution. We need some
>>>> trade-off between simplicity and formality. A basic similarity relation is
>>>> perfect, and those triples can then be worked out automatically by means of
>>>> appropriate metamodels, e.g. as proposed in [3].
>>>>
>>>> Aldo
>>>>
>>>> [1] Bernard Vatant suggested some good practice of mutual linking:
>>>>
>>>>
>>>> http://universimmedia.blogspot.com/2007/07/using-owlsameas-in-linked-data.html
>>
>>>> [2] Cyganiak quote:
>>>>
>>>>> People who want to re-use your data will learn to work around its quirks
>>>>> and idiosyncrasies.
>>>>> Dealing with the quirks is a part of re-using data, it always was, and it
>>>>> always will be.
>>>>>
>>>>>
>>>> [3] http://www.ibiblio.org/hhalpin/irw2006/vpresutti.pdf from the IRW
>>>> workshop: http://www.ibiblio.org/hhalpin/irw2006/
>>>>
>>>>
>>>> _________________________________
>>>>
>>>> Aldo Gangemi
>>>>
>>>> Senior Researcher
>>>> Laboratory for Applied Ontology
>>>> Institute for Cognitive Sciences and Technology
>>>> National Research Council (ISTC-CNR)
>>>> Via Nomentana 56, 00161, Roma, Italy
>>>> Tel: +390644161535
>>>> Fax: +390644161513
>>>> aldo.gangemi@cnr.it
>>>>
>>>> http://www.loa-cnr.it/gangemi.html
>>>>
>>>> icq# 108370336
>>>>
>>>> skype aldogangemi
>>
>
> --
>
> Bernard Vatant
> Knowledge Engineering
> ----------------------------------------------------
> Mondeca
> 3, cité Nollez 75018 Paris France
> Web:    www.mondeca.com
> ----------------------------------------------------
> Tel:       +33 (0) 971 488 459
> Mail:     bernard.vatant@mondeca.com
> Blog:    Leçons de Choses <http://mondeca.wordpress.com/>
Received on Thursday, 15 May 2008 23:03:39 UTC