
Re: Merging Databases

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Mon, 20 Jul 2009 20:35:53 +0100
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: Amrapali Zaveri <amrapali.zaveri@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, Susie Stephens <susie.stephens@gmail.com>
Message-ID: <EMEW3|feef9aef993fd294f9fa046885537f6dl6JKa602hg|ecs.soton.ac.uk|C553%hg@ecs.soton.ac.uk>
But people were beating me up for years for not using sameAs, and when I finally crack you say that to me...
:-D

Actually, a really serious issue here is that the sameAs assertions are included in the same KB that has the substantive data.
So if you resolve a URI and cache the RDF you get, you end up with the sameAs in your cache - it is very hard to just get the data without jeopardising any reasoning you may subsequently make.
(Well you could throw away all the owl:sameAs, skos:exactMatch, not to mention skos:closeMatch, owl:equivalentClass, etc., but then of course you might be throwing away some important internal stuff.)
So I have been saying keep the sameAs knowledge separate for quite a while, but don't seem to be winning that one either. :-)
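
To make the "keep it separate" idea concrete, here is a minimal sketch of splitting a cached description into substantive triples and equivalence triples, so the latter can be stored (and trusted or ignored) on their own. It assumes triples are plain (subject, predicate, object) tuples of strings. The function name `split_equivalences`, the example.org URIs, and the choice of which predicates count as equivalence links are all illustrative assumptions, not anything sameas.org or rkbexplorer actually does.

```python
# Namespace prefixes for the standard OWL and SKOS vocabularies.
OWL = "http://www.w3.org/2002/07/owl#"
SKOS = "http://www.w3.org/2004/02/skos/core#"

# Assumption: the predicates named in this message are the ones to set aside.
EQUIVALENCE_PREDICATES = frozenset({
    OWL + "sameAs",
    OWL + "equivalentClass",
    SKOS + "exactMatch",
    SKOS + "closeMatch",
})


def split_equivalences(triples):
    """Partition (s, p, o) triples into (data, equivalences) lists."""
    data, equivalences = [], []
    for triple in triples:
        predicate = triple[1]
        if predicate in EQUIVALENCE_PREDICATES:
            equivalences.append(triple)
        else:
            data.append(triple)
    return data, equivalences


# Hypothetical cached RDF for one resolved URI (example.org URIs are made up).
cached = [
    ("http://example.org/intervention/1",
     "http://www.w3.org/2000/01/rdf-schema#label",
     "Ramelteon or placebo"),
    ("http://example.org/intervention/1",
     OWL + "sameAs",
     "http://example.org/drug/ramelteon"),
]

data, equivalences = split_equivalences(cached)
```

This only filters by predicate, so it has exactly the weakness noted above: an owl:equivalentClass that a vocabulary relies on internally would be set aside along with the cross-dataset links.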

Best
Hugh

On 20/07/2009 20:26, "Alan Ruttenberg" <alanruttenberg@gmail.com> wrote:

On Mon, Jul 20, 2009 at 3:11 PM, Hugh Glaser<hg@ecs.soton.ac.uk> wrote:
> Excellent Alan, thank you for pointing this out, both as a general point and the specific case.
> I had puzzled over the equivalences, since they seemed implausible (just given the number), but not worked out why (I guess the lack of the backlink did not help).
> However they, along with good stuff, came from a source I chose to trust, and so I went with it.
> The caveat is useful in general - I know of other sameas that I might consider erroneous/dubious but are widely accepted.
> Another example would be the sameas links between the opencyc URIs and the dbpedia ones, which equate things of different types; this was a topic of discussion on this list a while ago.
> The sameas.org service is more liberal than the one we use for rkbexplorer, as it is to help people find things, although I do consider some sources too error-prone to use.
> Not sure how to describe such policies, and certainly not in RDF, as expected for a SW service.

I would say: Never assert sameAs. It's just too big a hammer. Instead
use a wider palette of relationships to connect entities to other
ones.

-Alan

> Best
> Hugh
>
> On 20/07/2009 13:40, "Alan Ruttenberg" <alanruttenberg@gmail.com> wrote:
>
> On Mon, Jul 20, 2009 at 7:14 AM, Hugh Glaser<hg@ecs.soton.ac.uk> wrote:
>> And just in case you haven't found it, a load of these hard-won equivalences are collected together at sameas.org, such as
>> http://sameas.org/?uri=http://data.linkedct.org/resource/intervention/51572
>
> Caveat emptor, some of these hard won equivalences will be hard
> losses. The sameAs assertions are incorrect. They equate a description
> of the values of an independent variable in a clinical study to one of
> the drugs administered in the intervention, the drug Ramelteon.
>
> "Subjects demonstrating low sleep efficiencies and prolonged sleep
> latencies, will be randomly assigned to continue to receive SHI
> accompanied by either placebo or Ramelteon (8 mg). Matching placebo
> will be obtained and the medication pre-packaged and ordered based on
> the randomization results"
>
> It is not straightforward to figure this out, either - there is no
> obvious backlink that leads you back from
> http://data.linkedct.org/resource/intervention/51572 to the source of
> the information http://clinicaltrials.gov/ct2/show/NCT00576927
> where the quote used as the value of linkedct:description is found.
>
> While a person browsing this will be able to disambiguate, if you
> depend on these equivalences for any sort of reasoning you will end
> up with dubious conclusions.
>
> -Alan
>
>
>
>> Hugh
>>
>> On 20/07/2009 05:03, "Oktie Hassanzadeh" <oktie@cs.toronto.edu> wrote:
>>
>> On Sun, Jul 19, 2009 at 9:00 PM, Amrapali Zaveri <amrapali.zaveri@gmail.com> wrote:
>> Hi all,
>>
>> I am attempting to merge 3 databases: (i) Clinicaltrials.gov <http://clinicaltrials.gov/>, (ii) Geonames <http://www.geonames.org/>, (iii) FDA <http://www.fda.gov/Drugs/InformationOnDrugs/ucm135162.htm>, based on ontologies.
>>
>> There are RDF Triples already defined for (i) http://linkedct.org/index.html and there is already an ontology present for (ii) http://www.geonames.org/ontology/ . However, there is no ontology present for the FDA database. The field "Zip Code" is common to all three databases.
>>
>>
>> If the FDA datasets are not published as RDF yet, we can certainly take the lead in publishing them as a part of the Linking Open Drug Data [1] project.
>>
>> [1] http://esw.w3.org/topic/HCLSIG/LODD
>>
>> LinkedCT already provides links to Geonames, but please let me know if you see any missing links.
>>
>>
>> Regards,
>> Oktie
>>
>>
>> Could anyone suggest possibilities of how to merge the three databases, based on ontologies?
>>
>> Thanks,
>> Regards,
>> Amrapali J Zaveri
>>
>>
>>
>>
>>
>
>
Received on Monday, 20 July 2009 19:36:54 UTC
