W3C home > Mailing lists > Public > public-semweb-lifesci@w3.org > January 2013

RE: Tools to integrate (hundreds) of spreadsheets as RDF

From: Erich Gombocz <egombocz@io-informatics.com>
Date: Mon, 21 Jan 2013 09:54:15 -0800
Message-ID: <08AE3015BD5DF149951910A5E851F010F57970@MAIL-02.io-informatics.com>
To: "Rafael Richards" <rafaelrichards@jhu.edu>, <Peter.Hendler@kp.org>
Cc: <eric@w3.org>, <public-semweb-lifesci@w3.org>
Dear Rafael and All,

IO Informatics' Knowledge Explorer Professional Edition also provides
an automated way to import spreadsheets into, and keep them updated in,
a triplestore backend of your choice via monitored folders, which map
incoming spreadsheets to RDF and import them. You can set up multiple
monitored folders with different data mappings; these run as background
processes that continuously update one or more connected triplestores
(or different graphs in a single triplestore).

The Knowledge Explorer also provides scripting within the import
mapping, application of thesauri, and other data-transformation
mechanisms to clean, consolidate and harmonize data during the import.
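
To make the monitored-folder idea described above concrete, here is a
minimal Python sketch (standard library only): one polling pass over a
folder that converts each CSV file it has not seen before into
N-Triples, using one column as the row identifier and turning every
other column into a predicate. This is not how Knowledge Explorer works
internally; the `BASE` namespace, the function names and the
one-subject-per-row mapping are hypothetical choices for illustration.

```python
import csv
import io
from pathlib import Path
from urllib.parse import quote

# Hypothetical namespace and mapping, for illustration only;
# not Knowledge Explorer's actual behavior.
BASE = "http://example.org/"


def literal(value: str) -> str:
    """Escape a cell value as an N-Triples plain string literal."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def csv_to_ntriples(text: str, key_column: str) -> list:
    """Map each row to a subject IRI and every other non-empty cell to a triple."""
    triples = []
    for row in csv.DictReader(io.StringIO(text)):
        key = row.get(key_column)
        if not key:
            continue  # skip rows without an identifier
        subject = f"<{BASE}row/{quote(key, safe='')}>"
        for column, value in row.items():
            # Skip the key itself, empty cells, and overflow cells (column None).
            if column is None or column == key_column or not value:
                continue
            predicate = f"<{BASE}prop/{quote(column, safe='')}>"
            triples.append(f"{subject} {predicate} {literal(value)} .")
    return triples


def scan_folder(folder: Path, seen: set, key_column: str) -> dict:
    """One polling pass over a 'monitored folder': convert CSV files not seen before."""
    converted = {}
    for path in sorted(folder.glob("*.csv")):
        if path.name in seen:
            continue
        seen.add(path.name)
        converted[path.name] = csv_to_ntriples(
            path.read_text(encoding="utf-8"), key_column
        )
    return converted
```

Calling `scan_folder` on a timer would roughly approximate the
background process described above. A fuller version would also
re-convert files whose contents change and hand the resulting N-Triples
to the triplestore's own bulk loader; neither step is shown here.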

 

You can find out more about this tool here:
http://www.io-informatics.com/products/sentient-KE.html

Cordially,

Erich Gombocz

From: Rafael Richards [mailto:rafaelrichards@jhu.edu] 
Sent: Sunday, January 20, 2013 4:38 PM
To: Peter.Hendler@kp.org
Cc: <eric@w3.org>; <public-semweb-lifesci@w3.org>
Subject: Tools to integrate (hundreds) of spreadsheets as RDF

I am also interested in integrating healthcare data published by the
CDC. Unfortunately, it comes as nearly 200 separate spreadsheets:

http://www.cdc.gov/nchs/hus/contents2011.htm#chartbookfigures

The only tool I am aware of that is designed to keep large numbers
(potentially hundreds) of independently curated spreadsheets
continuously integrated and in sync across an enterprise is Anzo by
Cambridge Semantics. Most of the other tools I know of do not update
the RDF model in real time as the CSV data changes; they perform
one-off conversions, so keeping more than one spreadsheet up to date is
time-consuming.

For one-off conversion, Google Refine is quite easy to get started
with. It has extensive data-cleaning facilities for noisy or illogical
data. With its RDF extension you get *automated* data reconciliation
against outside linked data sources of your choice, such as DBpedia.
This is a feature I have not seen in any other conversion tool. It does
not do visualization, but there are plenty of desktop applications that
do this very well.

Does anyone have suggestions for other 'pipeline' tools that keep CSV
and RDF in sync and that are (1) currently maintained and (2)
sufficiently documented, with examples of importing and converting CSV
to RDF?

Rafael

On Jan 20, 2013, at 12:57 PM, Peter.Hendler@kp.org wrote:

What are some recommended simple utilities (probably stand-alone, or
running on one machine) for converting spreadsheet data to RDF, and
then, once that file is on disk, for visualizing it as a graph?
This would be for HL7 and CIMI, where we'd be entering "clinical
models" directly into a spreadsheet and then want to compare models
made by different people.

Received on Monday, 21 January 2013 17:54:41 GMT

This archive was generated by hypermail 2.3.1 : Tuesday, 26 March 2013 18:01:17 GMT