RE: How much does data integration cost ?

From: Hau, Dave (NIH/NCI) [E] <haudt@mail.nih.gov>
Date: Fri, 16 Sep 2011 12:36:21 -0400
To: Michael Miller <Michael.Miller@systemsbiology.org>, "Mork, Peter D.S." <pmork@mitre.org>, HCLS IG <public-semweb-lifesci@w3.org>
Message-ID: <68706CA218A5B541819F4467C48537EA0F3F00DA09@NIHMLBXBB02.nih.gov>
Agreed.  If each data source owner could provide the mapping from their source to the standard target, e.g. using BioPortal's mapping functionality and the SKOS mapping relations (e.g. skos:closeMatch, skos:exactMatch), that would be great, since the owner understands their data source best.
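
As a rough illustration of what owner-provided mappings could look like once loaded, here is a minimal Python sketch. The term identifiers (src:..., std:...) and the to_standard helper are made up for the example; only the SKOS namespace and the exactMatch/closeMatch property names come from the SKOS vocabulary.

```python
# Minimal sketch: owner-published SKOS mappings held as plain triples.
SKOS = "http://www.w3.org/2004/02/skos/core#"
EXACT_MATCH = SKOS + "exactMatch"
CLOSE_MATCH = SKOS + "closeMatch"

# Hypothetical mapping triples (source term, SKOS relation, standard term).
MAPPINGS = [
    ("src:MI", EXACT_MATCH, "std:MyocardialInfarction"),
    ("src:HeartAttack", CLOSE_MATCH, "std:MyocardialInfarction"),
    ("src:HTN", EXACT_MATCH, "std:Hypertension"),
]


def to_standard(term, mappings=MAPPINGS, accept=(EXACT_MATCH,)):
    """Return the standard terms that `term` maps to via the accepted relations."""
    return sorted({o for s, p, o in mappings if s == term and p in accept})


# By default only exactMatch is trusted; closeMatch must be opted into.
strict = to_standard("src:HeartAttack")
relaxed = to_standard("src:HeartAttack", accept=(EXACT_MATCH, CLOSE_MATCH))
```

Keeping the accepted relations as a parameter lets each consumer decide whether closeMatch is good enough for their use.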

Owner-provided mappings like these would largely take care of steps 1-4 in the article (Gather knowledge about sources, Gather knowledge about desired consumer (target) view(s), Identify semantic correspondences among sources and from sources to the consumer views, Create needed attribute transformations).

Steps 5, 6 and 8 (Specify data combination rules, Create logical mappings from sources to consumer, Create and optimize an executable connection for the specific run-time environment) could perhaps be done with standard Semantic Web technologies such as SPARQL?
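
To make the SPARQL idea a bit more concrete, here is a small Python sketch that only builds a CONSTRUCT query rewriting source records onto target concepts through skos:exactMatch. The ex:hasConcept property, the example.org namespace and the graph IRIs are assumptions for illustration; running the query would need a SPARQL engine or triple store, which is not shown.

```python
# Sketch: build a SPARQL CONSTRUCT query that maps source records to the
# standard target via skos:exactMatch. Nothing is executed here.
PREFIXES = """PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
PREFIX ex: <http://example.org/target#>
"""


def build_construct_query(source_graph: str, mapping_graph: str) -> str:
    """Return the query text; ex:hasConcept is a hypothetical property."""
    return PREFIXES + f"""
CONSTRUCT {{ ?record ex:hasConcept ?target }}
WHERE {{
  GRAPH <{source_graph}> {{ ?record ex:hasConcept ?local }}
  GRAPH <{mapping_graph}> {{ ?local skos:exactMatch ?target }}
}}
"""


query = build_construct_query(
    "http://example.org/source", "http://example.org/mappings"
)
```

Keeping the mappings in their own named graph means the same query can be reused when a source owner updates their mappings.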

Step 7 (Data cleaning) could be aided by reasoning-based logical consistency checking.
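
A toy Python sketch of the kind of consistency check meant here: infer each record's classes through a subclass hierarchy and flag records that end up in two classes declared disjoint. The class names, the hierarchy and the records are invented for the example, and a real setup would use an OWL reasoner rather than this hand-rolled closure.

```python
# Toy consistency check: subclass closure plus disjointness axioms.
# All class names and records below are hypothetical.
SUBCLASS_OF = {
    "MalePatient": "Patient",
    "FemalePatient": "Patient",
    "PregnantPatient": "FemalePatient",
}
DISJOINT = [("MalePatient", "FemalePatient")]


def superclasses(cls):
    """Return `cls` together with all of its ancestors in SUBCLASS_OF."""
    seen = {cls}
    while cls in SUBCLASS_OF and SUBCLASS_OF[cls] not in seen:
        cls = SUBCLASS_OF[cls]
        seen.add(cls)
    return seen


def inconsistent_records(record_types):
    """Return ids of records whose inferred classes violate a disjointness axiom."""
    bad = []
    for rec, types in record_types.items():
        inferred = set()
        for t in types:
            inferred |= superclasses(t)
        if any(a in inferred and b in inferred for a, b in DISJOINT):
            bad.append(rec)
    return sorted(bad)


records = {
    "r1": {"MalePatient", "PregnantPatient"},  # conflicting after inference
    "r2": {"FemalePatient"},
}
flagged = inconsistent_records(records)
```

The conflict in r1 only shows up after inference, since PregnantPatient is not itself declared disjoint with MalePatient.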

It would be good to hear from the BioPortal and Health Ontology Mapper teams on this topic.

Cheers,
Dave



-----Original Message-----
From: Michael Miller [mailto:Michael.Miller@systemsbiology.org] 
Sent: Friday, September 16, 2011 11:17 AM
To: Mork, Peter D.S.; HCLS IG
Subject: RE: How much does data integration cost ?

hi all,

peter, nice article, it matches my experience well.

one thing to note is that, in this context, the mapping is
(hopefully) a one-shot deal that can then be used into the future without
much change, e.g. the bio* efforts that map to sequence database records.
it also helps to have a standard target that everything is mapped to.
my experience was mapping third party gene expression experiments
(data and annotation) to MAGE-ML.  there was then a standard mapping from
MAGE-ML to our Rosetta Resolver application, which provided the UI, and
that mapping didn't have to change.

cheers,
michael

> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Mork, Peter D.S.
> Sent: Wednesday, September 14, 2011 9:29 AM
> To: HCLS IG
> Subject: RE: How much does data integration cost ?
>
> This article
> (http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.5.6098&rep=rep1&type=pdf)
> doesn't give absolute numbers, but it does describe which
> portions of a data integration task eat up the most time.
>
> Peter Mork
>
>
> -----Original Message-----
> From: public-semweb-lifesci-request@w3.org [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of Andrea Splendiani
> Sent: Wednesday, September 14, 2011 12:25 PM
> To: HCLS IG
> Subject: How much does data integration cost ?
>
> Hi,
>
> I was wondering if anybody on this list has some figures on how much
> time/resources are spent on data integration, as a percentage of the
> overall 'task' performed.
> I often get the impression that 'data integration' is an obscure entity
> for many end users. For instance, people concerned with getting results
> out of data usually refer to the overall process only as 'analysis', and
> data integration is often an ill-defined entity overshadowed by a
> better-defined statistical analysis.
> I know this varies across organizations/tasks and that the distinction
> between 'data integration' and the rest is a bit fuzzy. However, as a
> first approximation, what is the size of the problem that the Semantic
> Web is trying to tackle?
> Obviously, I would be interested in the Life Sciences and Health Care
> context.
>
> best,
> Andrea Splendiani
>
>
> Andrea Splendiani
> Senior Bioinformatics Scientist
> Centre for Mathematical and Computational Biology
> +44(0)1582 763133 ext 2004
> andrea.splendiani@bbsrc.ac.uk
Received on Friday, 16 September 2011 16:36:55 UTC
