RE: [BIORDF] edit of Top Level Task - scalability from Seaborne, Andy on 2006-03-27 (public-semweb-lifesci@w3.org from March 2006)

From: Seaborne, Andy <andy.seaborne@hp.com>
Date: Mon, 27 Mar 2006 18:14:06 +0100
To: "Kashyap, Vipul" <VKASHYAP1@PARTNERS.ORG>, "M. Scott Marshall" <marshall@science.uva.nl>, <public-semweb-lifesci@w3.org>
Message-ID: <DF5E364A470421429AE6DC96979A4F6F8F8C0D@sdcexc04.emea.cpqcorp.net>
-------- Original Message --------
> From: Kashyap, Vipul <>
> Date: 27 March 2006 15:57
> 
> +1
> 
> RDF wrappers and interfaces are the best way to go.
> 
> As we deploy this technology into the real world, IT folks will be
> against moving and re-designing data repositories, but will be more
> amenable to any approach that doesn't require them to relocate,
> redesign or disrupt current applications to the data.
> 
> Also, creating an RDF data warehouse essentially destroys the
> "incremental low cost" value proposition that Eric Miller talks about.
> 
> Would propose that the BIORDF group explore best practices for creating
> RDF wrappers from a variety of data sources....

It would be interesting to hear what issues arise when wrappers are
applied in real situations.  Such real-use experience from BioRDF could
help guide the choices for other communities.

An example: how much effort does it take to wrap the database?  What
trade-offs are there in the return on investment?

D2RQ creates 'data objects' via PropertyBridges to RDF/OWL classes from
the SQL data; that can reflect the design structure of the relational
database. A person, skilled in both data modelling on the semantic web
and in the SQL domain, creates an explicit mapping between the RDF data
model and the SQL schema.  The RDF provided can be high quality and most
useful.  There is a cost: the mapping needs to be written and
maintained.  That may or may not be a problem, depending on the people
available and the number of databases to be wrapped.
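
To make the explicit-mapping idea concrete, here is a minimal sketch in
Python.  It is not D2RQ's mapping language or API; the namespace, the
"gene" table, the MAPPING dictionary and the wrap() helper are all
hypothetical, chosen only to show a hand-written mapping from a SQL
schema to RDF-style triples.

```python
import sqlite3

EX = "http://example.org/bio#"  # hypothetical namespace, for illustration only

# Hand-written mapping: which table becomes which class, and which
# column becomes which property. Every name here is an assumption.
MAPPING = {
    "table": "gene",
    "class": EX + "Gene",
    "uri_pattern": EX + "gene/{id}",
    "properties": {"symbol": EX + "symbol", "organism": EX + "foundIn"},
}

def wrap(conn, mapping):
    """Yield (subject, predicate, object) triples for one mapped table."""
    cols = ["id"] + list(mapping["properties"])
    query = "SELECT {} FROM {}".format(", ".join(cols), mapping["table"])
    for row in conn.execute(query):
        values = dict(zip(cols, row))
        subject = mapping["uri_pattern"].format(id=values["id"])
        # One type triple per row, then one triple per mapped column.
        yield (subject, "rdf:type", mapping["class"])
        for col, prop in mapping["properties"].items():
            if values[col] is not None:
                yield (subject, prop, values[col])

# Tiny in-memory database standing in for the legacy data source.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT, organism TEXT)")
conn.execute("INSERT INTO gene VALUES (1, 'BRCA1', 'Homo sapiens')")
triples = list(wrap(conn, MAPPING))
```

The cost mentioned above shows up as MAPPING: someone has to write it,
and it has to change whenever the SQL schema does.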

An alternative approach is to map tables fairly directly into RDF.  This
process can be semi-automated so it's lower cost but the data modelling
is at a lower level (maybe just properties for columns) and does not
explicitly capture the full set of relationships in the SQL data.
Anything higher level (anything that spans tables, for example) isn't
explicitly there.  It should be possible to put back that level with a
layer of RDF->RDF mapping (rules? OWL?), and that can be created and
applied by different people and different tools for different needs.
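
A rough sketch of this second approach, again in Python and under
stated assumptions: the BASE namespace, the two tables, the
"encodedBy" property and both helper functions are hypothetical.
direct_map() turns every row into a subject and every column into a
property without any hand-written mapping; link_rule() stands in for
the separate RDF->RDF layer that restores one cross-table
relationship.

```python
import sqlite3

BASE = "http://example.org/db/"  # hypothetical namespace, for illustration only

def direct_map(conn):
    """Map each row to a subject and each column to a property."""
    triples = []
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
    for table in tables:
        # Column names come from the schema itself, not from a mapping file.
        cols = [r[1] for r in conn.execute(
            'PRAGMA table_info("{}")'.format(table))]
        for row in conn.execute('SELECT rowid, * FROM "{}"'.format(table)):
            subject = "{}{}/{}".format(BASE, table, row[0])
            for col, value in zip(cols, row[1:]):
                if value is not None:
                    prop = "{}{}#{}".format(BASE, table, col)
                    triples.append((subject, prop, value))
    return triples

def link_rule(triples):
    """RDF->RDF rule: turn protein.gene_id values into encodedBy links.

    Assumes gene rows are identified by their integer primary key.
    """
    fk = BASE + "protein#gene_id"
    return [(s, BASE + "vocab#encodedBy", "{}gene/{}".format(BASE, o))
            for s, p, o in triples if p == fk]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gene (id INTEGER PRIMARY KEY, symbol TEXT)")
conn.execute(
    "CREATE TABLE protein (id INTEGER PRIMARY KEY, name TEXT, gene_id INTEGER)")
conn.execute("INSERT INTO gene VALUES (7, 'TP53')")
conn.execute("INSERT INTO protein VALUES (1, 'p53', 7)")

flat = direct_map(conn)           # low-level, column-per-property triples
lifted = flat + link_rule(flat)   # plus the relationship added by the rule
```

In the flat output the foreign key is just a number on the protein
row; only the rule layer makes the protein-to-gene relationship
explicit, and that layer could equally be written by someone else with
other tools.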

Having been involved in different small-scale experiments that take each
of these approaches, I can say it isn't the case that one approach is
always better than the other.

	Andy 

> 
> Cheers,
> 
> ---Vipul
> 
> =======================================
> Vipul Kashyap, Ph.D.
> Senior Medical Informatician
> Clinical Informatics R&D, Partners HealthCare System
> Phone: (781)416-9254
> Cell: (617)943-7120
> http://www.partners.org/cird/AboutUs.asp?cBox=Staff&stAb=vik
> 
> To keep up you need the right answers; to get ahead you need the right
> questions 
> ---John Browning and Spencer Reiss, Wired 6.04.95
> > -----Original Message-----
> > From: public-semweb-lifesci-request@w3.org
> > [mailto:public-semweb-lifesci-request@w3.org] On Behalf Of M. Scott
> > Marshall 
> > Sent: Monday, March 27, 2006 9:50 AM
> > To: public-semweb-lifesci@w3.org
> > Subject: [BIORDF] edit of Top Level Task - scalability
> > 
> > 
> > After e-mail with Susie, I have edited the BioRDF Top Level Task [1]
> > to reflect some of the scalability issues.
> > 
> > Some of my comments to Susie were:
> > > I can imagine 'collecting' data into an RDF repository for a demo
> > > but we should keep in mind that this approach won't scale. Example:
> > > One of the data files that we imported was 53Mb. Once transformed
> > > into RDF, it has become ~800Mb. Obviously, this is survivable for
> > > reasonably small datasets, but.. 
> > > 
> > > That's why HCLSIG should hope to eventually have RDF export
> > > functionality "on demand" at the data source (instigate widespread
> > > adoption of SW values by omics database managers?). But, lately, I
> > > think that in the long run, rather than convert legacy databases
> > > into RDF repositories or export from them, that query
> > > mapping/rewriting approaches such as D2RQ[2] could be more
> > > effective. Also, federation/p2p/broker approaches could help to
> > > consolidate biobase interfaces for the user. Does this say anything
> > > to you?
> > 
> > -scott
> > 
> > [1] http://esw.w3.org/topic/BioRDF_Top_Level_Task
> > [2] http://www.wiwiss.fu-berlin.de/suhl/bizer/d2rq/
> > --
> > M. Scott Marshall
> > tel. +31 (0) 20 525 7765
> > http://staff.science.uva.nl/~marshall
> > http://integrativebioinformatics.nl/
> > Integrative Bioinformatics Unit, University of Amsterdam
Received on Monday, 27 March 2006 17:14:10 UTC