RE: xml DOM as rdf from Danny Ayers on 2002-08-16 (www-rdf-interest@w3.org from August 2002)

From: Danny Ayers <danny666@virgilio.it>
Date: Fri, 16 Aug 2002 20:44:31 +0200
To: "RDF-Interest" <www-rdf-interest@w3.org>, "Seth Russell" <seth@robustai.net>
Message-ID: <EBEPLGMHCDOJJJPCFHEFGENMHHAA.danny666@virgilio.it>

>Yes, I was going to suggest this schema too.  But to convert any kind of
>XML into RDF, don't you still need a mapping from each element and attribute
>name in the XML document to the set of things defined by the RDF model?
>I've used translatesTo in the following example:
>
>foo
>    type infoset:Element;
>    translatesTo sru:Node.
>bar
>    type infoset:Attribute;
>    sru:translatesTo sru:Arrow;
>    sru:content rdfs:Resource.
>
>So that the following XML:
>
><foo>
>     <bar>http://robustai.net/sailor/</bar>
></foo>
>
>Would end up being the following RDF:
>
><rdf:Description>
>    <bar rdf:resource="http://robustai.net/sailor/" />
></rdf:Description>
>
>This needs a lot more work, but perhaps you get the drift.  The point being
>that we can extract RDF from any given XML by just describing the XML
>schema in this manner.  Alternatively you could just assume that Elements
>alternate between nodes and arrows as they nest and that all attributes are
>arrows ... but making this explicit should give us better results.

I've been attacking this general problem on a few levels, in different
ways - one of which is very like your translatesTo (I have a mapSource and a
mapTarget instead). All of them share an internal representation that in
itself has virtually no constraints: it's just a decorated digraph. For
straight XML with arbitrary/unknown semantics, so far I've been reading it
in with vertices corresponding to elements, each vertex carrying the
element's attributes as a hashtable. The nesting of the tree is interpreted
as edges/arcs. It would be straightforward to hang the attributes off the
element as arc+node pairs instead, but with the formats I've played with so
far this hasn't been needed. The options are fairly wide open for taking the
internal graph and making RDF from it (probably with dangling nodes), and
you can see what I wanted the vocabulary for - something like:

source xml -
<a>
	<b x="4"/>
</a>

internal graph -
[a] -[parent]-> [b {x=4}]

output rdf -

(I wasted half an hour trying to manually work this out, then realised it'd
be quicker to hack the code - there are undoubtedly many mistakes. Anyone
happen to know how to prettyprint away the RDFNsIds in Jena, btw?)

<rdf:RDF
     xmlns:RDFNsId2='http://www.w3.org/2001/04/infoset#'
     xmlns:RDFNsId1='http://purl.org/puninj/2001/05/rgml-schema#'
     xmlns:rdfs='http://www.w3.org/2000/01/rdf-schema#'
     xmlns:rdf='http://www.w3.org/1999/02/22-rdf-syntax-ns#'>

     <RDFNsId1:Node rdf:about='vertex1'
          rdfs:label='a'
          rdf:type='http://www.w3.org/2001/04/infoset#Element'/>

     <RDFNsId1:Node rdf:about='vertex2'
          rdfs:label='b'
          rdf:type='http://www.w3.org/2001/04/infoset#Element'>
         <RDFNsId2:attributes>
             <RDFNsId2:AttributeSet>
                 <rdf:_1
                      rdf:type='http://www.w3.org/2001/04/infoset#Attribute'
                      RDFNsId2:localName='x'
                      RDFNsId2:normalizedValue='4'/>
             </RDFNsId2:AttributeSet>
         </RDFNsId2:attributes>
     </RDFNsId1:Node>

     <RDFNsId1:Edge rdf:ID='e1'>
         <RDFNsId1:source rdf:resource='#vertex1'/>
         <RDFNsId1:target rdf:resource='#vertex2'/>
     </RDFNsId1:Edge>

</rdf:RDF>

The IDs are just local for now - imagine a http://behind
RGML is a minimal graph vocabulary.

Hmm - that is a little more verbose than
<a><b x="4"/></a>
Maybe it isn't such a good idea after all...
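
For anyone wanting to try the read-in step described above (elements as
vertices carrying an attribute table, nesting as edges), here is a minimal
sketch in Python. It is not the code behind the output in this message; the
names Vertex, Graph, xml_to_graph and render are made up for illustration,
and the vertexN ids and the "parent" edge label are simply borrowed from the
example above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class Vertex:
    """One XML element; its attributes ride along as a plain dict."""
    vid: str
    label: str
    attrs: dict = field(default_factory=dict)


@dataclass
class Graph:
    """A decorated digraph with no further constraints."""
    vertices: list = field(default_factory=list)
    edges: list = field(default_factory=list)  # (source id, edge label, target id)


def xml_to_graph(xml_text, edge_label="parent"):
    """Elements become vertices, element nesting becomes edges."""
    graph = Graph()

    def visit(elem, parent_id):
        vid = f"vertex{len(graph.vertices) + 1}"
        graph.vertices.append(Vertex(vid, elem.tag, dict(elem.attrib)))
        if parent_id is not None:
            graph.edges.append((parent_id, edge_label, vid))
        for child in elem:
            visit(child, vid)

    visit(ET.fromstring(xml_text), None)
    return graph


def render(graph):
    """Render each edge in the bracket notation used in this message."""
    by_id = {v.vid: v for v in graph.vertices}

    def show(v):
        if not v.attrs:
            return f"[{v.label}]"
        pairs = ", ".join(f"{k}={val}" for k, val in v.attrs.items())
        return f"[{v.label} {{{pairs}}}]"

    return [f"{show(by_id[s])} -[{label}]-> {show(by_id[t])}"
            for s, label, t in graph.edges]


g = xml_to_graph('<a><b x="4"/></a>')
print(render(g))  # prints ['[a] -[parent]-> [b {x=4}]']
```

Keeping the attributes in a dict on the vertex, rather than as extra
arc+node pairs, keeps the graph the same shape as the element tree; the
other reading would only need a different visit function.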

At the smartest (and least implemented) level, I've been putting together a
subsystem for taking a graph (or tree) with any semantics and mapping it to
the generalised internal graph using transformations specified in an RDF
file. This is rather like the RDFPath idea, but I think that by allowing
processing instructions as well, the system should be able to handle a wider
variety of input/output. Once the data is in the internal graph, it can be
mapped out again in the same declarative fashion.
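
To make the declarative idea a bit more concrete, here is a rough sketch in
Python of the translatesTo-style mapping from the quoted message: a table
says which element names are nodes and which are arrows, and a generic
walker pulls triples out of the XML accordingly. Everything here is an
assumption for illustration: the MAPPING dict stands in for the RDF mapping
file (whose format isn't shown in this thread), the role names "node" and
"arrow" follow the quoted example, and the _:nN subject ids are invented.

```python
import xml.etree.ElementTree as ET
from itertools import count

# Hypothetical mapping table: element name -> role in the RDF model.
# In the scheme discussed here this would be read from an RDF file.
MAPPING = {"foo": "node", "bar": "arrow"}


def extract_triples(xml_text, mapping):
    """Emit (subject, predicate, object) triples as directed by the mapping."""
    triples = []
    ids = count(1)

    def visit_node(elem):
        subject = f"_:n{next(ids)}"
        for child in elem:
            if mapping.get(child.tag) != "arrow":
                continue  # unmapped children are ignored in this sketch
            nested = [g for g in child if mapping.get(g.tag) == "node"]
            if nested:
                # The arrow points at nested node elements.
                for g in nested:
                    triples.append((subject, child.tag, visit_node(g)))
            else:
                # The arrow points at the element's text content.
                triples.append((subject, child.tag, (child.text or "").strip()))
        return subject

    root = ET.fromstring(xml_text)
    if mapping.get(root.tag) == "node":
        visit_node(root)
    return triples


print(extract_triples(
    "<foo><bar>http://robustai.net/sailor/</bar></foo>", MAPPING))
# prints [('_:n1', 'bar', 'http://robustai.net/sailor/')]
```

The walker itself knows nothing about foo or bar; changing the vocabulary
means changing only the table, which is the point of keeping the mapping
declarative.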

This stuff very quickly goes off the deep end, so I've spent quite a lot of
time hard-coding the transformations for various languages and trying to get
the application's core model into a form that is a good compromise between
versatility and ease of use.

Incidentally, Manos mentioned redundancy, and my app is becoming a big
friend of that - below is the SVG generated at the same time from the input
xml.

Cheers,
Danny.

<?xml version="1.0"?>
<?xml-stylesheet href="null" type="text/css"?>
<svg contentScriptType="text/ecmascript" zoomAndPan="magnify"
     contentStyleType="text/css" viewBox="0 0 800 600"
     preserveAspectRatio="xMidYMid meet" xmlns="http://www.w3.org/2000/svg"
     version="1.0">
    <defs>
        <marker refX="0" markerUnits="strokeWidth" refY="5" orient="auto"
                class="triangle" markerHeight="9" id="triangle"
                viewBox="0 0 10 10" preserveAspectRatio="xMidYMid meet"
                markerWidth="12">
            <path d="M 0 0 L 10 5 L 0 10 z"/>
        </marker>
    </defs>
    <g uri="edges">
        <g>
            <line x1="120" x2="120" y1="121" y2="186" class=""
                  marker-end="url(#triangle)"/>
        </g>
        <g>
            <line x1="120" x2="120" y1="121" y2="186" class=""
                  marker-end="url(#triangle)"/>
            <text class="" transform="translate(120,151)">
                subclass
            </text>
        </g>
    </g>
    <g uri="vertices">
        <g transform="translate(120,91)            scale(2,2)">
            <rect x="-50" width="100" y="-15" height="30" class="vertex"/>
            <text class="H2">
                a
            </text>
        </g>
        <g transform="translate(120,216)              scale(2,2)">
            <ellipse rx="50" ry="15" class="vertex"/>
            <text class="H2">
                b
            </text>
        </g>
    </g>
    <g uri="adjuncts"/>
</svg>
Received on Friday, 16 August 2002 14:53:16 UTC