Direct mapping for spreadsheets

Hey all,

I've been looking into RDFizing spreadsheets (more specifically,
applying GRDDL to Microsoft's SpreadsheetML).

I know there are multiple tools, products, and mappings (such as
XLWrap, TopBraid Composer, Google Refine, etc.).
However, I need a generic mapping, and none of them seems to do the job.
I just need to lift the spreadsheet data to the RDF level, and from
there I will be able to map it to a higher-level vocabulary using
SPARQL CONSTRUCT queries (with or without user assistance).

The closest thing to what I'm thinking about is the RDB to RDF direct
mapping [1].
Obviously, spreadsheets do not have primary keys, column names, or
datatypes. However, they seem to be a more general case than
relational tables, and they still have table (worksheet) names as well
as row/column indices.

Trying to follow the direct mapping [1], I came up with this basic example:

  @base <http://foo.example/spreadsheet.xlsx> .

  <#sheet1/1> <#sheet1/A> "content of A1" .
  <#sheet1/1> <#sheet1/B> "content of B1" .
  <#sheet1/2> <#sheet1/A> "content of A2" .
  <#sheet1/2> <#sheet1/B> "content of B2" .

This has an issue with addressing resources within packages, which has
been widely discussed [2] but not solved, AFAIK.
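
To make the example above concrete, here is a rough Python sketch of
the same mapping. It is only an illustration under some assumptions of
mine: the worksheet is already loaded as a list of rows, every cell
becomes a plain string literal (there are no datatypes to go on), empty
cells produce no triple, and the sheet name is percent-encoded before
it goes into the fragment. The names direct_map, column_letters and
escape_literal are made up for this sketch and do not come from any
existing tool.

```python
from urllib.parse import quote


def column_letters(index):
    """Convert a 1-based column index to spreadsheet letters (1 -> A, 27 -> AA)."""
    letters = ""
    while index > 0:
        index, remainder = divmod(index - 1, 26)
        letters = chr(ord("A") + remainder) + letters
    return letters


def escape_literal(value):
    """Escape a cell value for a double-quoted Turtle string literal."""
    return (
        str(value)
        .replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def direct_map(base, sheet, rows):
    """Return Turtle lines, one triple per non-empty cell.

    Subject is the row (<#sheet/rowIndex>), predicate is the column
    (<#sheet/columnLetters>), object is the cell content as a literal.
    """
    name = quote(sheet, safe="")  # assumption: percent-encode the sheet name
    lines = [f"@base <{base}> .", ""]
    for r, row in enumerate(rows, start=1):
        for c, cell in enumerate(row, start=1):
            if cell is None or cell == "":
                continue  # assumption: empty cells produce no triple
            lines.append(
                f'<#{name}/{r}> <#{name}/{column_letters(c)}> '
                f'"{escape_literal(cell)}" .'
            )
    return lines


# Usage: the 2x2 grid from the example above.
rows = [
    ["content of A1", "content of B1"],
    ["content of A2", "content of B2"],
]
print("\n".join(direct_map("http://foo.example/spreadsheet.xlsx", "sheet1", rows)))
```

Run on the 2x2 grid, this prints the @base line followed by the same
four triples as in the example. It does nothing about the package
addressing issue; the fragment identifiers are simply appended to the
.xlsx URL.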

Has something like this already been attempted? I don't want to
reinvent the wheel.

[1] http://www.w3.org/TR/rdb-direct-mapping/
[2] http://lists.w3.org/Archives/Public/www-tag/2008Oct/0126.html

Martynas
graphity.org

Received on Friday, 27 July 2012 15:25:40 UTC