Re: Paper: URI Identity Management for Semantic Web Data Integration and Linkage from Andreas Harth on 2007-08-01 (public-semweb-lifesci@w3.org from August 2007)

From: Andreas Harth <andreas.harth@deri.org>
Date: Wed, 01 Aug 2007 19:28:04 +0100
To: Eric Neumann <eneumann@teranode.com>
CC: public-semweb-lifesci@w3.org
Message-ID: <46B0D0B4.8010008@deri.org>

Hi Eric,

You might find [1] interesting in relation to the one-thing/multiple-URI
problem you mention.

We distinguish four approaches to choosing one URI among multiple
alternatives in a Web data aggregation scenario:

- random or lexical ordering
- most agreed upon across data sources
- count of occurrence
- link analysis
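
To make the four options concrete, here is a rough Python sketch. It is
only an illustration under my own assumptions, not the method from [1]:
the function names, the example.org URIs and the toy data are all made
up, and plain in-degree stands in for a real link analysis.

```python
from collections import Counter

def pick_lexical(candidates):
    # Lexical ordering: take the smallest URI string.
    return min(candidates)

def pick_most_agreed(candidates, sources):
    # sources maps a (hypothetical) source name to the set of URIs it uses.
    def agreement(uri):
        return sum(1 for uris in sources.values() if uri in uris)
    # sorted() makes ties fall back to lexical order.
    return max(sorted(candidates), key=agreement)

def pick_most_frequent(candidates, statements):
    # Count every occurrence of a term in any position of a triple.
    counts = Counter(term for triple in statements for term in triple)
    return max(sorted(candidates), key=lambda uri: counts[uri])

def pick_most_linked(candidates, statements):
    # Crude stand-in for link analysis: in-degree, i.e. how often a
    # URI appears in the object position of a triple.
    indegree = Counter(obj for _, _, obj in statements)
    return max(sorted(candidates), key=lambda uri: indegree[uri])

# Toy data: three made-up URIs assumed to denote the same thing.
A = "http://example.org/a/protein1"
B = "http://example.org/b/protein1"
C = "http://example.org/c/protein1"
candidates = [B, A, C]

sources = {"s1": {A, B}, "s2": {B}, "s3": {B, C}}
statements = [
    (C, "p", "x"), (C, "p", "y"), (C, "p", "z"),
    ("d", "p", A), ("e", "p", A),
    (B, "p", "w"),
]

chosen = {
    "lexical": pick_lexical(candidates),
    "most_agreed": pick_most_agreed(candidates, sources),
    "most_frequent": pick_most_frequent(candidates, statements),
    "most_linked": pick_most_linked(candidates, statements),
}
```

On this toy data the strategies do not all agree on the same URI, so
which one you use is a real design choice.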

Regards,
Andreas.

[1] http://sw.deri.org/2007/02/objcon/paper.pdf

Eric Neumann wrote:
>
> A recent paper has been published that addresses some of the same URI
> issues we've been discussing:
>
> "URI Identity Management for Semantic Web Data Integration and Linkage"
> http://eprints.ecs.soton.ac.uk/14361/
>
> They focus on Coreference aka Record Linkages, which deals with the
> problem of having more than one URI for the "same thing". There's a
> probabilistic theory behind it, but it offers an approach different
> from using "owl:sameAs" between possibly similar things.
>
> My take-- coming from a life sciences perspective-- is that we still
> need to standardize URIs for specific data records (e.g., uniprot,
> entrez, ensembl, etc.) as well as concepts, but when we need to
> "cross-bundle" different records that are supposedly referring to the
> same bio-entity (uniprot/p12345 ~ entrez/g6422 ~ ensembl/s47721),
> then this approach may be worth considering.
>
> I leave it to the group to discuss the possible value of this paper to
> our ongoing URI activity...
>
> Eric
>

Received on Wednesday, 1 August 2007 18:29:09 UTC