Re: generic XML to RDF triple mapping from Graham Klyne on 2000-09-11 (www-rdf-interest@w3.org from September 2000)

From: Graham Klyne <GK@Dial.pipex.com>
Date: Mon, 11 Sep 2000 09:42:28 +0100
To: James Tauber <JTauber@bowstreet.com>
Cc: "'www-rdf-interest@w3.org'" <www-rdf-interest@w3.org>
Message-Id: <4.3.2.7.2.20000911092620.00b4b790@pop.dial.pipex.com>

At 05:57 AM 9/8/00 -0400, James Tauber wrote:

>I believe that it should be possible to map arbitrary XML into RDF triples.

Accepting that your idea has merit, I'd like to raise an issue of possible 
concern:

XML has a richer lexical structure than RDF, which reflects XML's 
heritage as a descendant of document markup languages.  To name two 
examples:  the distinction between elements and attributes, and the 
significance of element order.

IMO, one of RDF's strengths as a _semantic markup_ language is that it 
omits most of that lexical complexity to focus on semantic issues in graph 
form.

My concern is that a mechanism for translating arbitrary XML to RDF would 
have to import the lexical XML structure into the RDF model, even though in 
many cases this would not be semantically significant.  A generic mapping 
could not possibly know what is and is not significant.

Example:

     <invoice number='1234'>
       <customer>...</customer>
       <item number='1'>...<amount>...</amount></item>
       <item number='2'>...<amount>...</amount></item>
       <total>...</total>
     </invoice>

Would probably map to RDF something like:

     [Invoice] --number--> "1234"
     [       ] --customer--> "..."
     [       ] --item--> [ ] --number--> "1"
     [       ]           [ ] --description--> "..."
     [       ]           [ ] --amount--> "..."
     [       ] --item--> [ ] --number--> "2"
     [       ]           [ ] --description--> "..."
     [       ]           [ ] --amount--> "..."
     [       ] --total--> "..."

In the RDF, in this case, both elements and attributes have become ordinary 
properties.  The ordering of properties is likewise not significant in the 
RDF, and is not represented in the abstract model.
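
To make the point concrete, here is a minimal sketch of such a naive 
generic mapping, assuming Python's standard xml.etree.ElementTree module.  
The function name xml_to_triples and the "_:n1"-style node labels are my 
own illustrative choices, not part of any proposal in this thread.  It 
treats attributes and child elements alike and collects triples in an 
unordered set, so both lexical distinctions disappear:

```python
import xml.etree.ElementTree as ET


def xml_to_triples(xml_text):
    """Naive generic mapping (illustrative only): every attribute and
    every child element of a node becomes a property of that node."""
    root = ET.fromstring(xml_text)
    triples = set()          # a set: property order is not recorded
    counter = [0]

    def new_node():
        # Hypothetical blank-node labels, assigned in visiting order.
        counter[0] += 1
        return "_:n%d" % counter[0]

    def visit(elem, subject):
        # Attributes become ordinary properties with literal values.
        for name, value in elem.attrib.items():
            triples.add((subject, name, value))
        for child in elem:
            if len(child) == 0 and not child.attrib:
                # Leaf element with no attributes: literal-valued property.
                triples.add((subject, child.tag, (child.text or "").strip()))
            else:
                # Structured element: property pointing at a new node.
                obj = new_node()
                triples.add((subject, child.tag, obj))
                visit(child, obj)

    visit(root, new_node())
    return triples
```

Under this sketch, an invoice that carries its number as an attribute and 
one that carries it as a child element, with the children in a different 
order, yield the same set of triples.  Whether that loss is harmless 
depends on the vocabulary, which is exactly what a generic mapping cannot 
know.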

I think part of the value of RDF is its potential to normalize the 
information that really matters, and leave out the rest.  In particular, to 
provide a common structure for essential information that may be 
represented differently by different XML structures.  Rules to map 
arbitrary XML into RDF seem to defeat this benefit.  (And yes, I recognize 
that RDF is not the last word here.)

>As part of Redfoot, I would like to define a mapping language for
>describing, in a declarative way and for a particular XML schema, how to map
>instances of that schema into RDF triples.

A schema-dependent XML->RDF mapping makes a lot of sense to me.
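
As a rough sketch of what I have in mind (not Redfoot's mapping language, 
which has yet to be defined), a per-schema mapping could be a small table 
saying where each property's value lives in the XML.  Everything here is 
hypothetical: the table layout, the "@name" convention for attributes, the 
second schema with its inv/id/cust/sum names, and the apply_mapping 
helper.  It assumes Python's standard xml.etree.ElementTree module:

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping for the invoice schema in the example above:
# (RDF property, source), where "@x" means attribute x of the root
# element and anything else is the tag of a child element.
INVOICE_MAPPING = {
    "type": "Invoice",
    "properties": [
        ("number", "@number"),
        ("customer", "customer"),
        ("total", "total"),
    ],
}

# Hypothetical mapping for a different, invented schema carrying the
# same information under other names and with no attributes.
ALT_MAPPING = {
    "type": "Invoice",
    "properties": [
        ("number", "id"),
        ("customer", "cust"),
        ("total", "sum"),
    ],
}


def apply_mapping(xml_text, mapping, subject="_:invoice"):
    """Apply a schema-specific mapping table to one XML instance."""
    root = ET.fromstring(xml_text)
    triples = {(subject, "rdf:type", mapping["type"])}
    for prop, source in mapping["properties"]:
        if source.startswith("@"):
            value = root.get(source[1:])      # attribute on the root
        else:
            value = root.findtext(source)     # text of a direct child
        if value is not None:
            triples.add((subject, prop, value.strip()))
    return triples
```

With one table per schema, two differently structured invoices can be 
normalized to the same triples, because each table records which parts of 
its schema actually matter.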

>Furthermore, I believe that all descriptions of serialization of RDF should
>be separated out of the RDF Syntax specification and could be described
>merely in terms of the mapping language.

I have considerable sympathy for this view.

[...]

#g

------------
Graham Klyne
(GK@ACM.ORG)

Received on Monday, 11 September 2000 09:25:32 UTC