Scraping RDF data from spreadsheets, etc

I've been tinkering with some ideas for scraping RDF data from spreadsheets,
or more precisely, from CSV data exported from a spreadsheet program.
Similar techniques might also work for information exported from a
database.

For certain kinds of regular data, I think a spreadsheet or other tabular
form may be a more convenient way to gather the information than any of the
"native" RDF formats.  Nothing here is especially exciting; it's just another
way for RDF processors to get at data.  My test data is included below, and
I have added a reader for this format to my Swish program [1].

Roughly, I see each row of a table corresponding to a resource, and each
column corresponding to a property.  One column may be designated as
providing a resource identifier for each row; otherwise, blank nodes are
allocated as needed.
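To make the row/column mapping concrete, here is a minimal Python sketch. It is a hypothetical illustration, not the Swish reader: the function name `row_to_triples`, the representation of terms as plain N3 strings, the `_:bN` blank-node labels, and the use of the `@about` marker (borrowed from the example data below) to flag the identifier column are all assumptions.

```python
from itertools import count
from typing import Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]


def row_to_triples(row: List[str],
                   properties: List[Optional[str]],
                   row_type: Optional[str],
                   bnodes: Iterator[int]) -> List[Triple]:
    """Map one table row to (subject, predicate, object) triples.

    Hypothetical sketch: properties[i] is the property for column i,
    "@about" marks the identifier column, and None marks an unused
    column. Cells are assumed to be N3 terms already; "" means no value.
    Cells beyond len(properties) (e.g. trailing comments) are ignored.
    """
    subject: Optional[str] = None
    pairs: List[Tuple[str, str]] = []
    for prop, cell in zip(properties, row):
        if prop is None or cell == "":
            continue
        if prop == "@about":
            subject = cell
        else:
            pairs.append((prop, cell))
    # A row with no property values describes no resource.
    if not pairs:
        return []
    # No identifier given: allocate a fresh blank node for this row.
    if subject is None:
        subject = f"_:b{next(bnodes)}"
    triples: List[Triple] = []
    if row_type is not None:
        triples.append((subject, "a", row_type))
    triples.extend((subject, p, o) for p, o in pairs)
    return triples


if __name__ == "__main__":
    props = ["ex:col1", "ex:col2", "@about", "ex:col3", "ex:col4"]
    bnodes = count(1)
    row2 = ['"row2,col1"', '"22"^^xsd:integer', "",
            "ex:row2col3", '"row2col4"']
    for s, p, o in row_to_triples(row2, props, "ex:RowType", bnodes):
        print(s, p, o, ".")
```

Sharing one `bnodes` counter across all rows keeps blank-node labels distinct within a single table.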

The property URIs and type information can be provided as annotations in
the original spreadsheet, or as a separate text file that is prepended to
the exported CSV data.  For now, I assume a single file.

#g
--

[1] http://www.ninebynine.org/RDFNotes/Swish/Intro.html
(The new functionality is not yet exposed in the published version.  But it 
will be.  Watch this space!)

...

CSV data, as exported from a Microsoft Excel spreadsheet:
[[
@prefix,ex:,<http://id.ninebynine.org/wip/2003/swishtest/>,,,,

@rowtype,ex:RowType,,,,,

@columns,,,,,,
ex:col1,ex:col2,@about,ex:col3,ex:col4,,
@coltypes,,,,,,
,xsd:integer,@resource,@resource,@string,,

@data,,,,,,
"row1,col1",12,ex:row1,ex:row1col3,row1col4,,named subject
"row2,col1",22,,ex:row2col3,row2col4,,blank node subject
"row3,col1",,ex:row3,<http://id.ninebynine.org/wip/2003/swishtest/row3col3>,,,full URI for resource
,,ex:row4,,,,no properties -- no resource
"row5,col1",,ex:row5,,,,just 1 property
"row6,col1",,ex:row6,,,,
"row7,col1",,ex:***,,,,syntax error in subject name
"row8,col1",,ex:row8,ex:***,,,syntax error in object qname
"row9,col1",,ex:row9,<http:a%z>,,,syntax error in object uri
,,,,,,blank row
@end,,,,,,end of data
foo,,,,,,ignore this
]]

Corresponding graph in Notation3:
[[
# TestCSVtoRDF.n3
#
# Should be isomorphic to TestCSVtoRDF.CSV, when read by Swish

@prefix ex: <http://id.ninebynine.org/wip/2003/swishtest/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# "row1,col1",12,ex:row1,ex:row1col3,row1col4,,named subject
ex:row1 a ex:RowType ;
   ex:col1 "row1,col1" ;
   ex:col2 "12"^^xsd:integer ;
   ex:col3 ex:row1col3 ;
   ex:col4 "row1col4" .

# "row2,col1",22,,ex:row2col3,row2col4,,blank node subject
[ a ex:RowType ;
   ex:col1 "row2,col1" ;
   ex:col2 "22"^^xsd:integer ;
   ex:col3 ex:row2col3 ;
   ex:col4 "row2col4" ] .

# "row3,col1",,ex:row3,<http://id.ninebynine.org/wip/2003/swishtest/row3col3>,,,full URI for resource
ex:row3 a ex:RowType ;
   ex:col1 "row3,col1" ;
   ex:col3 <http://id.ninebynine.org/wip/2003/swishtest/row3col3> .

# ,,ex:row4,,,,no properties -- no resource

# "row5,col1",,ex:row5,,,,just 1 property
ex:row5 a ex:RowType ;
   ex:col1 "row5,col1" .

# "row6,col1",,ex:row6,,,,
ex:row6 a ex:RowType ;
   ex:col1 "row6,col1" .

# "row7,col1",,ex:***,,,,syntax error in subject name
[ a ex:RowType ;
   ex:col1 "row7,col1" ] .

# "row8,col1",,ex:row8,ex:***,,,syntax error in object qname
ex:row8 a ex:RowType ;
   ex:col1 "row8,col1" .

# "row9,col1",,ex:row9,<http:a%z>,,,syntax error in object uri
ex:row9 a ex:RowType ;
   ex:col1 "row9,col1" .
]]
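The expected graph implies some per-cell conversion rules. Typed columns become typed literals, `@resource` cells must be a QName or a bracketed URI, and a cell that fails that check is dropped: its property is omitted (rows 8 and 9) or, in the identifier column, the row falls back to a blank node (row 7). Below is a rough Python sketch of those rules as inferred from the test data, not taken from Swish. The function names are hypothetical, and the regular expressions are deliberate simplifications of the real N3 QName and URI grammars.

```python
import re
from typing import Optional

# Simplified QName: prefix, colon, local name starting with a letter or
# underscore. An assumption; the real N3/XML Name grammar is richer.
QNAME = re.compile(r"[A-Za-z_][\w.-]*:[A-Za-z_][\w.-]*")
# Simplified bracketed URI: a scheme, then no spaces or angle brackets,
# and every "%" followed by two hex digits. Also an assumption.
URI = re.compile(r"<[A-Za-z][A-Za-z0-9+.-]*:(?:[^\s<>%]|%[0-9A-Fa-f]{2})*>")


def resource_term(cell: str) -> Optional[str]:
    """Return the cell as an N3 resource term, or None if it is invalid."""
    if QNAME.fullmatch(cell) or URI.fullmatch(cell):
        return cell
    return None


def n3_string(value: str) -> str:
    """Quote a plain string as an N3 literal, escaping \\ and \"."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def cell_term(cell: str, coltype: Optional[str]) -> Optional[str]:
    """Convert one cell to an N3 term according to its column type.

    Returns None for an empty cell or one that fails validation, so
    the caller can omit the corresponding property.
    """
    if cell == "":
        return None
    if coltype == "@resource":
        return resource_term(cell)
    if coltype in (None, "@string"):
        return n3_string(cell)
    # Any other column type is taken as a datatype QName, e.g. xsd:integer.
    return f"{n3_string(cell)}^^{coltype}"
```

For example, `cell_term("12", "xsd:integer")` gives `"12"^^xsd:integer`, while `cell_term("ex:***", "@resource")` and `cell_term("<http:a%z>", "@resource")` both give `None`.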


------------
Graham Klyne
For email:
http://www.ninebynine.org/#Contact

Received on Wednesday, 10 March 2004 14:40:31 UTC