
Re: Tools to integrate (hundreds) of spreadsheets as RDF

From: Jim McCusker <mccusj@rpi.edu>
Date: Mon, 21 Jan 2013 10:06:43 -0500
Message-ID: <CAAtgn=SpLhkyvydua5J9-Gt_ErHms5q6BEiqd3=HB2jmdqhAWw@mail.gmail.com>
To: Rafael Richards <rafaelrichards@jhu.edu>
Cc: "Peter.Hendler@kp.org" <Peter.Hendler@kp.org>, "<eric@w3.org>" <eric@w3.org>, "<public-semweb-lifesci@w3.org>" <public-semweb-lifesci@w3.org>
Tim Lebo's csv2rdf4lod is designed for repeatability and scalability. It
was developed to handle transforming the data from data.gov into RDF, and
has been set up to automatically convert thousands of datasets. We even
have one project that regularly pulls updated conversion configurations from
GitHub, then converts the data and loads it into a triple store automatically
via cron. Further, it supports dataset versioning, where you can keep multiple
versions of data around without URI collisions. It also supports re-using
conversion configurations for multiple files that share a common format.

https://github.com/timrdf/csv2rdf4lod-automation/wiki
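
To make the versioning and configuration re-use points concrete, here is a
minimal sketch in plain Python. It is not csv2rdf4lod's actual implementation
or configuration format; the function name, the URI layout, and the
example.org base are all assumptions made for illustration. It shows one way
a single configuration can be applied to several files of the same format,
with the dataset version placed in the subject URI so that two versions never
collide.

```python
import csv
import io
from urllib.parse import quote


def csv_to_ntriples(csv_text, base, dataset, version, subject_column):
    """Turn CSV rows into N-Triples lines under a versioned URI namespace.

    Hypothetical helper: the URI layout below is an assumption, not the
    layout csv2rdf4lod itself produces.
    """
    dataset_uri = f"{base}/{quote(dataset, safe='')}"
    version_uri = f"{dataset_uri}/version/{quote(version, safe='')}"
    triples = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        # The version is part of the subject URI, so versions stay distinct.
        subject = f"<{version_uri}/{quote(row[subject_column], safe='')}>"
        for column, value in row.items():
            # Skip the key column, overflow cells, and empty or missing cells.
            if column is None or column == subject_column or not value:
                continue
            predicate = f"<{dataset_uri}/vocab/{quote(column, safe='')}>"
            # Escape characters that are not allowed raw in an N-Triples literal.
            literal = (
                value.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r")
            )
            triples.append(f'{subject} {predicate} "{literal}" .')
    return triples


# One shared configuration, re-used for two files with the same columns.
config = {
    "base": "http://example.org/source/cdc",  # placeholder base URI
    "dataset": "hus",
    "subject_column": "figure",
}
v2011 = csv_to_ntriples("figure,title\n1,Life expectancy\n", version="2011", **config)
v2012 = csv_to_ntriples("figure,title\n1,Life expectancy\n", version="2012", **config)
print("\n".join(v2011 + v2012))
```

Each call yields triples whose subjects differ only in the version segment,
so both versions can sit in the same triple store side by side.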

Jim


On Sun, Jan 20, 2013 at 7:38 PM, Rafael Richards <rafaelrichards@jhu.edu> wrote:

>  I am also interested in integrating healthcare data published by the
> CDC.  Unfortunately, it comes as nearly 200 separate spreadsheets:
>
>  http://www.cdc.gov/nchs/hus/contents2011.htm#chartbookfigures
>
>  The only thing I am aware of that is designed to keep large numbers
> (potentially hundreds) of spreadsheets continuously integrated and in sync
> across an enterprise, each independently curated,  is Anzo by Cambridge
> Semantics.   Most of the other tools I am aware of do not do real-time
> updating of the RDF model from the CSV model, and are one-off conversions,
> so if you have more than one spreadsheet to update, it will be time
> consuming.
>
>  For one-off conversions, Google Refine is quite easy to get started with.
> It has many data-cleaning facilities for noisy or illogical data.
> With its RDF extension you get *automated* data reconciliation with
> outside linked data sources of your choice, such as DBpedia. This is a
> feature I have not seen in any other conversion tool. It does not do
> visualization, but there are plenty of desktop applications that do this
> very well.
>
>  Any other suggestions for any other 'pipeline' tools to keep CSV and RDF
> in sync which are (1) currently maintained and (2) have sufficient
> documentation and examples of importing and converting CSV to RDF?
>
>  Rafael
>
>
>
>  On Jan 20, 2013, at 12:57 PM, Peter.Hendler@kp.org wrote:
>
> What are some recommended simple "probably stand alone or work on one
> machine" utilities for converting spreadsheet data to RDF and then, once
> that file is on disk, visualizing it as a graph?
> This would be for HL7 and CIMI where we'd be entering "clinical models"
> directly into a spreadsheet, and then want to compare models made by
> different people.
>


-- 
Jim McCusker
Programmer Analyst
Krauthammer Lab, Pathology Informatics
Yale School of Medicine
james.mccusker@yale.edu | (203) 785-4436
http://krauthammerlab.med.yale.edu

PhD Student
Tetherless World Constellation
Rensselaer Polytechnic Institute
mccusj@cs.rpi.edu
http://tw.rpi.edu
Received on Monday, 21 January 2013 15:07:36 GMT
