
Re: Merging Databases

From: Hugh Glaser <hg@ecs.soton.ac.uk>
Date: Mon, 20 Jul 2009 20:11:49 +0100
To: Alan Ruttenberg <alanruttenberg@gmail.com>
CC: Amrapali Zaveri <amrapali.zaveri@gmail.com>, "public-lod@w3.org" <public-lod@w3.org>, Anja Jentzsch <anja@anjeve.de>, Susie Stephens <susie.stephens@gmail.com>
Message-ID: <EMEW3|60c7416067fbffa04433c9296bf0f308l6JKC102hg|ecs.soton.ac.uk|C54E%hg@ecs.soton.ac.uk>
Excellent Alan, thank you for pointing this out, both as a general point and the specific case.
I had puzzled over the equivalences, since they seemed implausible (just given the number), but had not worked out why (I guess the lack of a backlink did not help).
However they, along with good stuff, came from a source I chose to trust, and so I went with it.
The caveat is useful in general: I know of other sameAs links that I would consider erroneous or dubious but that are widely accepted.
Another example is the sameAs links between the OpenCyc URIs and the DBpedia ones, which equate things of different types; this was a topic of discussion on this list a while ago.
The sameas.org service is more liberal than the one we use for rkbexplorer, since its purpose is to help people find things, although I do consider some sources too error-prone to use.
I am not sure how to describe such policies, and certainly not in RDF, as would be expected for a Semantic Web service.

On 20/07/2009 13:40, "Alan Ruttenberg" <alanruttenberg@gmail.com> wrote:

On Mon, Jul 20, 2009 at 7:14 AM, Hugh Glaser<hg@ecs.soton.ac.uk> wrote:
> And just in case you haven't found it, a load of these hard-won equivalences are collected together at sameas.org, such as
> http://sameas.org/?uri=http://data.linkedct.org/resource/intervention/51572

Caveat emptor, some of these hard won equivalences will be hard
losses. The sameAs assertions are incorrect. They equate a description
of the values of an independent variable in a clinical study to one of
the drugs administered in the intervention, the drug Ramelteon.

"Subjects demonstrating low sleep efficiencies and prolonged sleep
latencies, will be randomly assigned to continue to receive SHI
accompanied by either placebo or Ramelteon (8 mg). Matching placebo
will be obtained and the medication pre-packaged and ordered based on
the randomization results"

It is not straightforward to figure this out, either - there is no
obvious backlink that leads you back from
http://data.linkedct.org/resource/intervention/51572 to the source of
the information http://clinicaltrials.gov/ct2/show/NCT00576927
where the quote used as the value of linkedct:description is found.

While a person browsing this will be able to disambiguate, if you
depend on these equivalences for any sort of reasoning you will end
up with dubious conclusions.
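
To illustrate the point about reasoning, here is a minimal sketch (not how sameas.org or any particular reasoner is implemented) of what treating sameAs as strict identity does: every URI in an equivalence class inherits every statement made about any other member. The drug URI and the `ex:` type names are hypothetical placeholders; only the intervention URI comes from the message above.

```python
from collections import defaultdict


def same_as_closure(pairs):
    """Group URIs into equivalence classes with a simple union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    classes = defaultdict(set)
    for uri in list(parent):
        classes[find(uri)].add(uri)
    return list(classes.values())


def merged_properties(pairs, facts):
    """Give every member of a sameAs class the pooled facts of the whole class."""
    result = {}
    for cls in same_as_closure(pairs):
        pooled = set()
        for uri in cls:
            pooled |= facts.get(uri, set())
        for uri in cls:
            result[uri] = pooled
    return result


INTERVENTION = "http://data.linkedct.org/resource/intervention/51572"
DRUG = "http://example.org/drug/ramelteon"  # hypothetical URI for illustration

# Illustrative facts only; the type names are assumptions, not real vocabulary.
facts = {
    INTERVENTION: {("rdf:type", "ex:StudyArmDescription")},
    DRUG: {("rdf:type", "ex:Drug")},
}

merged = merged_properties([(INTERVENTION, DRUG)], facts)
for prop, value in sorted(merged[INTERVENTION]):
    print(prop, value)
```

With the single incorrect link in place, the intervention resource is now typed both as a study-arm description and as a drug, which is the kind of dubious conclusion described above.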


> Hugh
> On 20/07/2009 05:03, "Oktie Hassanzadeh" <oktie@cs.toronto.edu> wrote:
> On Sun, Jul 19, 2009 at 9:00 PM, Amrapali Zaveri <amrapali.zaveri@gmail.com> wrote:
> Hi all,
> I am attempting to merge 3 databases: (i) Clinicaltrials.gov <http://clinicaltrials.gov/>, (ii) Geonames <http://www.geonames.org/>, (iii) FDA <http://www.fda.gov/Drugs/InformationOnDrugs/ucm135162.htm>, based on ontologies.
> RDF triples are already defined for (i) (http://linkedct.org/index.html) and an ontology already exists for (ii) (http://www.geonames.org/ontology/). However, there is no ontology for the FDA database. The field "Zip Code" is common to all three databases.
> If the FDA datasets are not published as RDF yet, we can certainly take the lead in publishing them as a part of the Linking Open Drug Data [1] project.
> [1] http://esw.w3.org/topic/HCLSIG/LODD
> LinkedCT already provides links to Geonames, but please let me know if you see any missing links.
> Regards,
> Oktie
> Could anyone suggest possibilities of how to merge the three databases, based on ontologies?
> Thanks,
> Regards,
> Amrapali J Zaveri
Received on Monday, 20 July 2009 19:12:48 UTC
