Re: [BIORDF] Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

> YeastHub (http://yeasthub.gersteinlab.org) provides an interactive web 
> interface to allow the user to register and map a web-accessible tabular 
> dataset into RDF/XML. If needed, we can turn this interactive interface 
> into a programmatic one to allow broader use.


As part of an experiment in using the semantic web for data integration 
(submitted for publication), I adapted/hacked the Mapper program [1] to 
convert tabular data from a web source (ENCODE at UCSC, in CSV format) 
to (our) RDF format, à la YeastHub. Mapper can read from several common 
file formats and database connections and write to the same, optionally 
applying transformations along the way. We chose to start with the 
Mapper code, aside from the fact that it already worked on small 
examples, because the mappings are declared in an XML file rather than 
coded directly into the program. Ideally, the mapping commands would be 
expressed in a standardized "mapping language" (which one? 
suggestions?), preferably expressible in RDF for the sake of uniformity 
and provenance.
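
To make the "mapping as data, not code" idea concrete, here is a minimal 
sketch in Python. It is not Mapper's actual XML format or our pipeline: 
the MAPPING structure, the example.org URIs, and the column names are all 
hypothetical, and the row identifier is assumed to be safe to paste into 
a URI without encoding. The point is only that the converter stays 
generic while the mapping lives in a separate, inspectable structure.

```python
import csv
import io

# Hypothetical declarative mapping: which column names the subject,
# and which predicate URI each remaining column maps to.
MAPPING = {
    "base": "http://example.org/row/",
    "subject_column": "id",
    "predicates": {
        "chrom": "http://example.org/vocab#chrom",
        "start": "http://example.org/vocab#start",
    },
}


def escape_literal(value):
    """Escape backslashes, quotes, and newlines for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def rows_to_ntriples(csv_text, mapping):
    """Turn each CSV row into N-Triples lines according to the mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    triples = []
    for row in reader:
        # Assumption: the id value needs no URI encoding.
        subject = "<%s%s>" % (mapping["base"], row[mapping["subject_column"]])
        for column, predicate in mapping["predicates"].items():
            value = row.get(column)
            if value is None or value == "":
                continue  # skip empty cells rather than emit empty literals
            triples.append(
                '%s <%s> "%s" .' % (subject, predicate, escape_literal(value))
            )
    return triples


SAMPLE = "id,chrom,start\nr1,chr7,1000\nr2,chr7,\n"

if __name__ == "__main__":
    for line in rows_to_ntriples(SAMPLE, MAPPING):
        print(line)
```

Changing the target vocabulary then means editing MAPPING only, which is 
also what would make it possible to publish the mapping itself (ideally 
as RDF) alongside the converted data.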

It is probably worth noting that what we generally find to be missing is 
information about the _semantic type_ of the data columns (yes, no 
surprise here!). Even in the best case, when you've found information 
about the syntactic datatypes, you must still guess the semantic type 
and its relations to other types from the name in the header. Maybe 
someday data export tools will offer options to export not only the data 
and its datatypes to RDF but also the relevant semantic types.
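
As a rough illustration of the difference, the sketch below (Python) 
keeps two separate pieces of information per column: a syntactic 
datatype, here an XML Schema datatype URI, and a semantic type. The 
COLUMNS table, the example.org vocabulary, and the semanticType 
predicate are made up for the example and do not come from any existing 
export tool.

```python
XSD = "http://www.w3.org/2001/XMLSchema#"

# Hypothetical per-column description: the syntactic datatype says how to
# parse a value, the semantic type says what the value is about.
COLUMNS = {
    "start": {
        "datatype": XSD + "integer",
        "semantic_type": "http://example.org/vocab#ChromosomeStartPosition",
    },
    "chrom": {
        "datatype": XSD + "string",
        "semantic_type": "http://example.org/vocab#Chromosome",
    },
}


def typed_literal(column, value, columns):
    """Render a value as an N-Triples literal with its syntactic datatype."""
    return '"%s"^^<%s>' % (value, columns[column]["datatype"])


def describe_columns(dataset_uri, columns):
    """Emit one triple per column stating its (hypothetical) semantic type."""
    lines = []
    for name in sorted(columns):
        column_uri = "<%s/column/%s>" % (dataset_uri, name)
        lines.append(
            "%s <http://example.org/vocab#semanticType> <%s> ."
            % (column_uri, columns[name]["semantic_type"])
        )
    return lines


if __name__ == "__main__":
    print(typed_literal("start", "1000", COLUMNS))
    for line in describe_columns("http://example.org/dataset/encode", COLUMNS):
        print(line)
```

With only the datatype, a consumer knows that "1000" is an integer; it 
takes the second kind of statement to know that it is a start position 
on a chromosome.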

best,
scott

[1] http://www.javaworld.com/javaworld/jw-04-2002/jw-0426-mapper.html


-- 
M. Scott Marshall
tel. +31 (0) 20 525 7765
http://staff.science.uva.nl/~marshall
http://ibu.micro-array.nl/
Integrative Bioinformatics Unit, University of Amsterdam

Received on Monday, 20 February 2006 15:56:18 UTC