RE: [BIORDF] Re: Unstructured vs. Structured (was: HL7 and patient records in RDF/OWL?)

>As part of a semantic web for data integration experiment (submitted for
>publishing), I adapted/hacked the Mapper program[1] to convert tabular
>data from a web source (ENCODE at UCSC, in CSV format) to (our) RDF
>format à la YeastHub. Mapper can read from several common formats and
>database connections and write to the same, with possible
>transformations. The reason that we chose to start with the Mapper code,
>aside from the fact that it already worked on small examples, is that
>the mappings are disclosed in an XML file - as opposed to being coded
>directly into the program. Ideally, the mapping commands would be
>expressed in a standardized "mapping language" (which one?
>suggestions?), preferably expressible in RDF for the sake of uniformity
>and provenance.
>

I'm a fan of simplicity.  Towards that end, I advocate separating
syntactic and semantic transformations.  Do whatever is easiest to get
the data into the common representation (RDF in this case, but in
BioMediator we originally chose XML).  Then, write the mapping in a
language appropriate to the representation.

Once you have your RDB/spreadsheet/XML/etc. in RDF, write a semantic
mapping using, for example, SPARQL.  Once you have your
RDB/spreadsheet/ASN.1/etc. in XML, write a semantic mapping using
XQuery.
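
A minimal sketch of that two-step split, in plain Python rather than a
real RDF toolkit.  Everything named here is an assumption for
illustration: the example.org namespaces, the column names, and the
helper functions are hypothetical, and the predicate table merely stands
in for what would be a SPARQL CONSTRUCT query over the raw graph.

```python
import csv
import io

# Hypothetical namespaces, for illustration only.
RAW = "http://example.org/raw#"   # source-specific terms (one per CSV column)
BIO = "http://example.org/bio#"   # shared vocabulary the sources are mapped to


def csv_to_triples(text, base="http://example.org/row/"):
    """Syntactic step: one subject per row, one predicate per column header.

    No interpretation happens here; column names are copied through as-is.
    """
    triples = []
    for i, row in enumerate(csv.DictReader(io.StringIO(text))):
        subject = f"{base}{i}"
        for column, value in row.items():
            triples.append((subject, RAW + column, value))
    return triples


# Semantic step: a table from source predicates to shared predicates.
# With a real triple store this would be expressed as a SPARQL CONSTRUCT.
PREDICATE_MAP = {
    RAW + "chrom": BIO + "chromosome",
    RAW + "chromStart": BIO + "start",
    RAW + "chromEnd": BIO + "end",
}


def map_semantics(triples, predicate_map):
    """Rewrite predicates that have a mapping; drop triples that do not."""
    return [(s, predicate_map[p], o) for s, p, o in triples if p in predicate_map]


# Made-up sample row with hypothetical column names.
sample = "chrom,chromStart,chromEnd,name\nchr7,115404471,115404771,regionA\n"
raw = csv_to_triples(sample)                 # format-specific, meaning-free
mapped = map_semantics(raw, PREDICATE_MAP)   # format-independent mapping
```

Only csv_to_triples knows about the source format; swapping in a
spreadsheet or RDB reader would leave the mapping step untouched.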

As a result, all of the semantics are captured in the same mapping
language.  You don't need to have a different 'standard' for each
possible source format.

Peter

Received on Monday, 20 February 2006 16:06:22 UTC