Re: Managing Co-reference (Was: A Semantic Elephant?) from Yves Raimond on 2008-05-15 (semantic-web@w3.org from May 2008)

From: Yves Raimond <yves.raimond@gmail.com>
Date: Thu, 15 May 2008 14:45:21 +0100
To: "Bernard Vatant" <bernard.vatant@mondeca.com>
Cc: semantic-web@w3.org
Message-ID: <82593ac00805150645r599d4edfl63be96125501c890@mail.gmail.com>

Hello!

>> Sorry to jump in the middle of this discussion, but I don't
>> particularly agree with that. There are plenty of cases where they
>> can't really be avoided, even in large LOD projects.
>> For example, http://dbtune.org/jamendo/artist/5 and
>> http://zitgist.com/music/artist/0781a3f3-645c-45d1-a84f-76b4e4decf6d
>> identify the same artist. One of them in the Jamendo database, and one
>> of them in Musicbrainz.
>>
>
> See my previous message. It all depends on what you mean by "the same
> artist" ...

I tend to avoid this sort of issue - the sameAs link is simply useful
to me as a data provider (I do want statements about both resources to
be merged). But it is true that, given only representations, it is hard
to state that two resources are the same (the same thing in real life -
how is an Elvis lookalike different from Elvis, if I just see him? :-) ).
owl:sameAs serves a practical need and of course will never be 100%
accurate.


>>
>> Both databases hold *really* different types of information about these
>> artists. Musicbrainz holds detailed editorial information (regardless
>> of their publication in the Jamendo Creative Commons platform),
>> information about the members of this band and their birth dates, etc.
>> Jamendo holds actual audio items, and also a set of tags for each of them.
>>
>
> In this case, maybe you have the chance to have completely orthogonal
> descriptions, IOW different "facets".
> Unclear
>>
>> As a URI is not only an identifier but also a way to access a
>> specific representation, how could I use a single URI in this case? In
>> other words, how would I avoid the owl:sameAs between the two?
>>
>
> This is the core issue. From a purely declarative viewpoint, owl:sameAs
> entails that representations are merged and you can't separate any more what
> is coming from where. It's a semantic equivalent of thermodynamics 2nd
> principle. I'm not sure that "a URI is not only an identifier but also a way
> to access a specific representation" is an affirmation which holds in a
> linked data universe.

I definitely think it is. http://dbtune.org/jamendo/artist/5 is not
only an identifier (like 0781a3f3-645c-45d1-a84f-76b4e4decf6d would be
in Musicbrainz). Using it, I can curl -L -H "..." it to access some
RDF, coming from Jamendo through DBTune in that particular case.
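
As a rough illustration of that curl call, here is a minimal Python
sketch of dereferencing the URI with content negotiation. The Accept
value application/rdf+xml is an assumption made for this example (the
header is elided above), and build_rdf_request / fetch_rdf are
hypothetical helper names, not part of DBTune.

```python
import urllib.request

ARTIST_URI = "http://dbtune.org/jamendo/artist/5"


def build_rdf_request(uri, accept="application/rdf+xml"):
    """Build a GET request that asks the server for RDF rather than HTML."""
    # The Accept value is an assumption for this sketch; the original
    # message does not say which header was sent.
    return urllib.request.Request(uri, headers={"Accept": accept})


def fetch_rdf(uri):
    """Dereference the URI and return (final URL, raw body).

    urlopen follows HTTP redirects by default, which plays the role of
    curl's -L flag here.
    """
    with urllib.request.urlopen(build_rdf_request(uri), timeout=10) as resp:
        return resp.geturl(), resp.read()
```

Calling fetch_rdf(ARTIST_URI) would return the URL finally served and
the bytes of whatever representation the server chose to send, assuming
the service is reachable.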

Moreover, the fact that "representations are merged" is precisely what
I am looking for as a data publisher. Most semweb agents interpret
owl:sameAs that way, and that's exactly how I want them to behave in
that case. Then, they can still keep track of the source of the data
and allow the user to dismiss one or the other.
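
To make the "merge, but keep track of the source" behaviour concrete,
here is a minimal sketch in plain Python. It is only one possible way an
agent could do it, not a description of any existing tool: statements
are kept as (subject, predicate, object, source) tuples, owl:sameAs
links are collapsed with a small union-find, and the ex: predicates,
placeholder values and source labels are all made up for the example.

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
J = "http://dbtune.org/jamendo/artist/5"
M = "http://zitgist.com/music/artist/0781a3f3-645c-45d1-a84f-76b4e4decf6d"

# Hypothetical statements; predicates and values are placeholders.
QUADS = [
    (J, SAME_AS, M, "dbtune.org"),
    (J, "ex:tag", "placeholder-tag", "dbtune.org"),
    (M, "ex:member", "placeholder-member", "zitgist.com"),
]


def find(parent, x):
    """Return the representative of x's equivalence class."""
    parent.setdefault(x, x)
    while parent[x] != x:
        parent[x] = parent[parent[x]]  # path halving
        x = parent[x]
    return x


def merge(quads, dismissed=()):
    """Group statements by sameAs class, keeping each statement's source.

    dismissed: sources the user chose to ignore. Dismissing a source
    also drops any sameAs links that came from it.
    """
    quads = [q for q in quads if q[3] not in dismissed]
    parent = {}
    for s, p, o, _ in quads:
        if p == SAME_AS:
            parent[find(parent, s)] = find(parent, o)
    merged = defaultdict(set)
    for s, p, o, src in quads:
        if p != SAME_AS:
            merged[find(parent, s)].add((p, o, src))
    return dict(merged)
```

With the sample data, merge(QUADS) yields a single resource carrying
statements from both sources, while merge(QUADS, dismissed={"zitgist.com"})
keeps only what came from dbtune.org.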


>>
>> Different data sources make different claims about similar thing, and
>> we need both a way to access these claims and to keep the cross-source
>> identity. I think owl:sameAs is quite a nice way of doing that.
>>
>
> You are no more native speaker than I am :)
> But you seem to use "same" and "similar" indifferently. My point is that
> "same" and "similar" are similar, but not the same.

Heh :-) But see my point above, I think it all depends on how you want
your data to be interpreted.


Cheers!
y

Received on Thursday, 15 May 2008 13:46:00 UTC