
Re: Tools to integrate (hundreds) of spreadsheets as RDF

From: rafael richards <rmr.jhu@gmail.com>
Date: Mon, 21 Jan 2013 12:37:37 -0500
Message-ID: <50FD7CE1.1030601@gmail.com>
To: "Peter.Hendler@kp.org" <Peter.Hendler@kp.org>, eric@w3.org, public-semweb-lifesci@w3.org
Very welcome.  Just be sure to install the RDF extension to Google Refine:

http://refine.deri.ie/

This will give you the capability to reconcile and interlink your
spreadsheet data against external SPARQL endpoints or RDF dumps, to
search the web for related RDF datasets, and to export your data as RDF.
On export you can define the shape of the RDF graph using your own
vocabulary or imported existing ones.
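
To make "export your data as RDF" concrete, here is a minimal sketch of
the simplest possible mapping: one subject per spreadsheet row, one
predicate per column header, every cell a plain literal.  This is not
how the Refine RDF extension works internally, and it does no
reconciliation.  The function names, the example.org IRIs, and the
sample values are all made up for illustration, and it uses only the
Python standard library.

```python
import csv
import io
from urllib.parse import quote


def escape_literal(value):
    """Escape a string so it is safe inside an N-Triples literal."""
    return (value.replace("\\", "\\\\")
                 .replace('"', '\\"')
                 .replace("\n", "\\n")
                 .replace("\r", "\\r"))


def csv_to_ntriples(csv_text, base_iri, vocab_iri):
    """Return one N-Triples line per non-empty cell.

    Row N becomes the subject <base_iri + N>; each column header becomes
    a predicate under vocab_iri (an assumed, hypothetical vocabulary).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    lines = []
    for row_number, row in enumerate(reader, start=1):
        subject = f"<{base_iri}{row_number}>"
        for column, cell in row.items():
            # Skip overflow columns (no header), missing cells, and blanks.
            if column is None or cell is None or cell.strip() == "":
                continue
            local_name = quote(column.strip().replace(" ", "_"))
            predicate = f"<{vocab_iri}{local_name}>"
            lines.append(f'{subject} {predicate} "{escape_literal(cell.strip())}" .')
    return lines


# Made-up sample data, not taken from any real spreadsheet.
sample = "Year,Life expectancy\n2009,78.5\n2010,\n"
for line in csv_to_ntriples(sample,
                            "http://example.org/row/",
                            "http://example.org/vocab#"):
    print(line)
```

The resulting N-Triples lines can be loaded into most triple stores.
Typed literals, links between rows, and reuse of an existing vocabulary
are exactly the parts where a tool like Refine earns its keep.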

/Rafael/


On 1/21/13 12:18 PM, Peter.Hendler@kp.org wrote:
> Thanks, I'll check out the Google one.  For my current needs, one-off 
> is good enough.
>
>
>
>
>
> *NOTICE TO RECIPIENT:* If you are not the intended recipient of this 
> e-mail, you are prohibited from sharing, copying, or otherwise using 
> or disclosing its contents.  If you have received this e-mail in 
> error, please notify the sender immediately by reply e-mail and 
> permanently delete this e-mail and any attachments without reading, 
> forwarding or saving them.  Thank you.
>
>
>
>
>
>
> From: Rafael Richards <rafaelrichards@jhu.edu>
> To: Peter Hendler/CA/KAIPERM@KAIPERM
> Cc: "<eric@w3.org>" <eric@w3.org>, "<public-semweb-lifesci@w3.org>"   
>  <public-semweb-lifesci@w3.org>
> Date: 01/20/2013 04:38 PM
> Subject: Tools to integrate (hundreds) of spreadsheets as RDF
> ------------------------------------------------------------------------
>
>
>
> I am also interested in integrating healthcare data published by the 
> CDC.  Unfortunately, it comes as nearly 200 separate spreadsheets:
>
> http://www.cdc.gov/nchs/hus/contents2011.htm#chartbookfigures
>
> The only tool I am aware of that is designed to keep large numbers 
> (potentially hundreds) of independently curated spreadsheets 
> continuously integrated and in sync across an enterprise is Anzo by 
> Cambridge Semantics.  Most of the other tools I know of perform 
> one-off conversions and do not update the RDF model in real time from 
> the CSV model, so if you have more than one spreadsheet to update, 
> keeping them current will be time consuming.
>
> For one-off conversion, Google Refine is quite easy to get started 
> with.  It has many data-cleaning facilities for noisy or illogical 
> data.  With its RDF extension you get *automated* data reconciliation 
> against outside linked data sources of your choice, such as DBpedia.  
> This is a feature I have not seen in any other conversion tool.  It 
> does not do visualization, but there are plenty of desktop 
> applications that do this very well.
>
> Any suggestions for other 'pipeline' tools that keep CSV and RDF in 
> sync and that (1) are currently maintained and (2) have sufficient 
> documentation and examples of importing and converting CSV to RDF?
>
> Rafael
>
>
>
> On Jan 20, 2013, at 12:57 PM, Peter.Hendler@kp.org wrote:
>
> What are some recommended simple utilities (probably stand-alone, or 
> able to work on one machine) for converting spreadsheet data to RDF 
> and then, once that file is on disk, visualizing it as a graph?
> This would be for HL7 and CIMI, where we'd be entering "clinical 
> models" directly into a spreadsheet and then want to compare models 
> made by different people.
>
> <Mail Attachment.jpeg>
>
Received on Monday, 21 January 2013 22:16:18 GMT
