Re: comparing XML and RDF data models

Maciej, here is another way to look at this.  It is not any simpler, but 
it does illustrate a point of isomorphism between XML and RDF.

Take each of your XML samples and convert it to Infoset RDF.  The first 
sample would look like this:

@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xis: <http://www.w3.org/2001/04/infoset#> .

_:jA29510 a xis:Document ;
	xis:children _:jA29512 ;
	xis:documentElement _:jUd0e1 .
_:jA29512 a xis:InfoSetSeq ;
	rdf:_1 _:jUd0e1 .
_:jA29514 a xis:InfoSetSeq ;
	rdf:_1 _:jUd0e2 ;
	rdf:_2 _:jUd0e4 .
_:jA29516 a xis:InfoSetSeq ;
	rdf:_1 "Sensor220" .
_:jA29518 a xis:InfoSetSeq ;
	rdf:_1 _:jUd0e5 .
_:jA29520 a xis:InfoSetSeq ;
	rdf:_1 "E330" .
_:jUd0e1 a xis:Element ;
	xis:children _:jA29514 ;
	xis:localName "Sensor" .
_:jUd0e2 a xis:Element ;
	xis:children _:jA29516 ;
	xis:localName "name" .
_:jUd0e4 a xis:Element ;
	xis:children _:jA29518 ;
	xis:localName "isLocatedNearBy" .
_:jUd0e5 a xis:Element ;
	xis:children _:jA29520 ;
	xis:localName "Road" .
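To make the isomorphism concrete, here is a small Python sketch that transcribes the graph above into plain tuples (no RDF library; the rdf:type triples are left out) and walks it.  Each XPath child step becomes a hop through xis:children, an rdf:_n sequence member, and an xis:localName test.  The helper names objects, children, select and text are my own, for illustration only.

```python
# Triples transcribed from the Turtle sample above. The "a" (rdf:type)
# triples are left out because the walk below does not need them.
# Strings starting with "_:" are blank nodes; anything else is a literal.
TRIPLES = [
    ("_:jA29510", "xis:children", "_:jA29512"),
    ("_:jA29510", "xis:documentElement", "_:jUd0e1"),
    ("_:jA29512", "rdf:_1", "_:jUd0e1"),
    ("_:jA29514", "rdf:_1", "_:jUd0e2"),
    ("_:jA29514", "rdf:_2", "_:jUd0e4"),
    ("_:jA29516", "rdf:_1", "Sensor220"),
    ("_:jA29518", "rdf:_1", "_:jUd0e5"),
    ("_:jA29520", "rdf:_1", "E330"),
    ("_:jUd0e1", "xis:children", "_:jA29514"),
    ("_:jUd0e1", "xis:localName", "Sensor"),
    ("_:jUd0e2", "xis:children", "_:jA29516"),
    ("_:jUd0e2", "xis:localName", "name"),
    ("_:jUd0e4", "xis:children", "_:jA29518"),
    ("_:jUd0e4", "xis:localName", "isLocatedNearBy"),
    ("_:jUd0e5", "xis:children", "_:jA29520"),
    ("_:jUd0e5", "xis:localName", "Road"),
]


def objects(subject, predicate):
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def children(node):
    """Members of the node's xis:children sequence, in rdf:_1, rdf:_2, ... order."""
    members = []
    for seq in objects(node, "xis:children"):
        numbered = [(int(p[len("rdf:_"):]), o) for s, p, o in TRIPLES
                    if s == seq and p.startswith("rdf:_")]
        members.extend(o for _, o in sorted(numbered))
    return members


def select(start, path):
    """Follow child-element steps by local name, like the XPath /Sensor/isLocatedNearBy/Road."""
    nodes = [start]
    for step in path:
        nodes = [c for n in nodes for c in children(n)
                 if c.startswith("_:") and objects(c, "xis:localName") == [step]]
    return nodes


def text(node):
    """Concatenate the literal (non-node) members of an element's children."""
    return "".join(c for c in children(node) if not c.startswith("_:"))


print([text(r) for r in select("_:jA29510", ["Sensor", "isLocatedNearBy", "Road"])])  # ['E330']
```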

Then you could write a SPARQL query to get the information you wanted 
from any of the three formats, by using a UNION of patterns.  If later 
you introduced a new XML structure you would add another UNION pattern 
to your query.  Or you could CONSTRUCT a new graph in the desired schema 
from any of the various input schemas, again by using a UNION of 
patterns in the WHERE clause.
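Here is a rough Python sketch of the UNION idea.  To keep it short it is applied to the three RDF shapes from your message rather than to their infoset forms, and a tiny hand-rolled matcher stands in for a SPARQL engine (the names GRAPH_1..3, BRANCHES, unify, match and sensors_near_e330 are hypothetical).  In SPARQL itself the three branches would be three { ... } groups joined by UNION in the WHERE clause.

```python
# The three RDF shapes from the quoted message, as plain string triples.
GRAPH_1 = [(":Sensor220", ":isLocatedNearBy", ":Road_E330")]
GRAPH_2 = [(":Sensor220", ":isLocatedNearBy", ":RoadXXX"),
           (":RoadXXX", ":hasName", '"E330"')]
GRAPH_3 = [(":RoadXXX", ":hasName", '"E330"'),
           (":RoadXXX", ":hasSensor", ":Sensor220")]

# One branch per known shape, like { ... } UNION { ... } UNION { ... } in SPARQL.
BRANCHES = [
    [("?s", ":isLocatedNearBy", ":Road_E330")],
    [("?s", ":isLocatedNearBy", "?r"), ("?r", ":hasName", '"E330"')],
    [("?r", ":hasName", '"E330"'), ("?r", ":hasSensor", "?s")],
]


def unify(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def match(graph, patterns, binding):
    """Yield every binding that satisfies all patterns (a basic graph pattern)."""
    if not patterns:
        yield binding
        return
    for triple in graph:
        extended = unify(patterns[0], triple, binding)
        if extended is not None:
            yield from match(graph, patterns[1:], extended)


def sensors_near_e330(graph):
    """UNION of the branches, projected onto ?s with duplicates removed."""
    return sorted({b["?s"] for branch in BRANCHES for b in match(graph, branch, {})})


for graph in (GRAPH_1, GRAPH_2, GRAPH_3):
    print(sensors_near_e330(graph))  # [':Sensor220'] each time
```

The matcher is deliberately naive (nested loops over the whole graph); it only shows that the same query gives the same answer whichever shape the data happens to be in.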

You could of course do the same thing by writing an XSLT stylesheet to 
convert any of your input formats to a single output format.
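For comparison, here is a rough analogue of such a stylesheet written with Python's xml.etree.ElementTree instead of XSLT.  It is hard-wired to the three layouts in your message and rewrites each of them into the first one; the function name normalize is mine, and the sample strings are your documents with the closing </isLocatedNearBy> tags fixed and the sensor name spelled Sensor220 throughout.

```python
import xml.etree.ElementTree as ET

# The three documents from the quoted message (closing tags fixed).
LAYOUT_1 = """<Sensor><name>Sensor220</name>
  <isLocatedNearBy><Road> E330 </Road></isLocatedNearBy></Sensor>"""
LAYOUT_2 = """<Sensor><name>Sensor220</name>
  <isLocatedNearBy><Road><name>E330</name></Road></isLocatedNearBy></Sensor>"""
LAYOUT_3 = """<Road><name>E330</name>
  <hasSensor><Sensor><name>Sensor220</name></Sensor></hasSensor></Road>"""


def normalize(xml_text):
    """Rewrite any of the three known layouts into layout 1:
    <Sensor><name/><isLocatedNearBy><Road/></isLocatedNearBy></Sensor>."""
    root = ET.fromstring(xml_text)
    if root.tag == "Sensor":
        sensor = root.findtext("name")
        road_el = root.find("isLocatedNearBy/Road")
        # Layout 2 wraps the road's name in <name>; layout 1 uses plain text.
        if road_el.find("name") is not None:
            road = road_el.findtext("name")
        else:
            road = road_el.text
    elif root.tag == "Road":
        road = root.findtext("name")
        sensor = root.findtext("hasSensor/Sensor/name")
    else:
        raise ValueError(f"unknown layout: <{root.tag}>")

    # Build the single output format.
    out = ET.Element("Sensor")
    ET.SubElement(out, "name").text = sensor.strip()
    near = ET.SubElement(out, "isLocatedNearBy")
    ET.SubElement(near, "Road").text = road.strip()
    return ET.tostring(out, encoding="unicode")


for doc in (LAYOUT_1, LAYOUT_2, LAYOUT_3):
    print(normalize(doc))
```

Like the UNION query, it needs one branch per input layout it knows about.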

Any XML instance can be considered a compact, early-bound serialization 
of an infoset RDF graph.

A simple, generic XSLT stylesheet can be used to convert an arbitrary XML 
instance to Infoset RDF.  Here's a sample that does most of it (it records 
only local names, for example, not namespace names).

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
   xmlns:xis="http://www.w3.org/2001/04/infoset#"
   xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
   version="2.0">

   <xsl:output method="xml" indent="yes"/>

   <xsl:strip-space elements="*"/>

   <xsl:template match="/">
     <rdf:RDF>
       <xis:Document>
         <xis:documentElement rdf:nodeID="{generate-id(child::*[1])}"/>
         <xis:children>
           <xis:InfoSetSeq>
             <xsl:apply-templates/>
           </xis:InfoSetSeq>
         </xis:children>
       </xis:Document>
     </rdf:RDF>
   </xsl:template>

   <xsl:template match="*">
     <rdf:li>
       <xis:Element rdf:nodeID="{generate-id()}">
         <xis:localName><xsl:value-of select="local-name()"/></xis:localName>
         <xsl:if test="@*">
           <xis:attributes>
             <xis:AttributeSet>
               <xsl:apply-templates select="@*"/>
             </xis:AttributeSet>
           </xis:attributes>
         </xsl:if>
         <xsl:if test="*|text()|comment()|processing-instruction()">
           <xis:children>
             <xis:InfoSetSeq>
               <xsl:apply-templates/>
             </xis:InfoSetSeq>
           </xis:children>
         </xsl:if>
       </xis:Element>
     </rdf:li>
   </xsl:template><!-- match="*" -->

   <xsl:template match="@*">
     <rdf:li>
       <xis:Attribute>
         <xis:localName><xsl:value-of select="local-name()"/></xis:localName>
         <xis:normalizedValue><xsl:value-of select="."/></xis:normalizedValue>
       </xis:Attribute>
     </rdf:li>
   </xsl:template><!-- match="@*" -->

   <xsl:template match="text()">
     <rdf:li>
       <xsl:value-of select="normalize-space(.)"/>
     </rdf:li>
   </xsl:template>

   <xsl:template match="comment()">
     <rdf:li>
       <xis:Comment>
         <xis:content>
           <xsl:value-of select="."/>
         </xis:content>
       </xis:Comment>
     </rdf:li>
   </xsl:template><!-- match="comment()" -->

   <xsl:template match="processing-instruction()">
     <rdf:li>
       <xis:ProcessingInstruction>
         <xis:target>
           <xsl:value-of select="local-name()"/>
         </xis:target>
         <xis:content>
           <xsl:value-of select="."/>
         </xis:content>
       </xis:ProcessingInstruction>
     </rdf:li>
   </xsl:template>

</xsl:stylesheet>
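If XSLT is not at hand, the same generic walk can be sketched in Python with xml.etree.ElementTree.  It is an approximation, not a port: blank-node labels are invented (_:n1, _:n2, ...) rather than coming from generate-id(), the rdf:li members are written out directly as rdf:_1, rdf:_2, ... (which is what an RDF/XML parser makes of them), and ElementTree's default parser drops comments and processing instructions, so those two templates have no counterpart here.  The function name infoset_triples is my own.

```python
import itertools
import xml.etree.ElementTree as ET


def infoset_triples(xml_text):
    """Convert an XML string to (subject, predicate, object) triples in the
    xis: vocabulary, following the structure of the stylesheet above."""
    triples = []
    counter = itertools.count(1)

    def new_node():
        # Hypothetical blank-node labels; the stylesheet uses generate-id().
        return f"_:n{next(counter)}"

    def local(name):
        # ElementTree writes namespaced names as "{uri}local"; keep only the
        # local part, as local-name() does in the stylesheet.
        return name.rsplit("}", 1)[-1]

    def normalize(text):
        # Equivalent of XPath normalize-space().
        return " ".join(text.split())

    def add_seq(owner, predicate, seq_type, members):
        # A list of rdf:li in RDF/XML becomes rdf:_1, rdf:_2, ... properties.
        seq = new_node()
        triples.append((owner, predicate, seq))
        triples.append((seq, "rdf:type", seq_type))
        for i, member in enumerate(members, start=1):
            triples.append((seq, f"rdf:_{i}", member))

    def element(el):
        node = new_node()
        triples.append((node, "rdf:type", "xis:Element"))
        triples.append((node, "xis:localName", local(el.tag)))
        attrs = []
        for name, value in el.attrib.items():
            attr = new_node()
            triples.append((attr, "rdf:type", "xis:Attribute"))
            triples.append((attr, "xis:localName", local(name)))
            triples.append((attr, "xis:normalizedValue", value))
            attrs.append(attr)
        if attrs:
            add_seq(node, "xis:attributes", "xis:AttributeSet", attrs)
        # ElementTree keeps text before the first child in el.text and text
        # after each child in child.tail; whitespace-only runs are dropped,
        # like xsl:strip-space.
        members = []
        if el.text and el.text.strip():
            members.append(normalize(el.text))
        for child in el:
            members.append(element(child))
            if child.tail and child.tail.strip():
                members.append(normalize(child.tail))
        if members:
            add_seq(node, "xis:children", "xis:InfoSetSeq", members)
        return node

    doc = new_node()
    triples.append((doc, "rdf:type", "xis:Document"))
    root = element(ET.fromstring(xml_text))
    triples.append((doc, "xis:documentElement", root))
    add_seq(doc, "xis:children", "xis:InfoSetSeq", [root])
    return triples


for triple in infoset_triples('<Sensor id="s1"><name>Sensor220</name></Sensor>'):
    print(triple)
```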


Maciej Gawinecki wrote:
> 
> In an article comparing two data models, XML and RDF, I found a 
> statement along these lines (I'm quoting loosely from memory):
> 
>   Searching XML with an XPath expression is easy if you know the
>   schema of the document being queried. However, the same query will not
>   work on a document that is structured differently but contains
>   equivalent information. This can be solved by using the RDF model,
>   which can then be queried with RDQL or SPARQL.
> 
> Is it really true that XPath-based XML search is limited by the 
> document's structure? Yes, and that is why there is so much research on 
> keyword-based querying of XML documents (without knowing the schema in 
> advance). But is RDF really better in this respect?
> 
> Let me give a few examples of exactly what I mean. [Of course, I'm 
> leaving aside the problem of knowing the name of a tag/property/resource; 
> only the structure differs.] Consider two XML documents:
> 
>   <Sensor>
>     <name>Sensor220</name>
>     <isLocatedNearBy>
>       <Road>
>         E330
>       </Road>
>     </isLocatedNearBy>
>   </Sensor>
> 
> Here the road value can be checked with the XPath expression: 
> //Sensor/isLocatedNearBy/Road
> 
> And here is a differently structured document (the road is identified by 
> a name element):
> 
>   <Sensor>
>     <name>Sensor220</name>
>     <isLocatedNearBy>
>       <Road>
>         <name>E330</name>
>       </Road>
>     </isLocatedNearBy>
>   </Sensor>
> 
> With the XPath expression: //Sensor/isLocatedNearBy/Road/name
> 
> Or yet another one (the road tag is an ancestor of the sensor tag, not 
> the opposite):
> 
>   <Road>
>     <name>E330</name>
>     <hasSensor>
>       <Sensor>
>         <name>Sensor220</name>
>       </Sensor>
>     </hasSensor>
>   </Road>
> 
> XPath: //Road/name
> 
> The same problem arises with RDF. Consider the first model:
> 
>   :Sensor220 :isLocatedNearBy :Road_E330 .
> 
> The WHERE clause of a SPARQL query would then look like:
> 
>   ?s :isLocatedNearBy :Road_E330 .
> 
> For the second version, we define a road with a specific value of the 
> hasName property:
> 
>   :Sensor220 :isLocatedNearBy :RoadXXX .
>   :RoadXXX :hasName "E330" .
> 
> the SPARQL query part:
> 
>   ?s :isLocatedNearBy ?r .
>   ?r :hasName "E330" .
> 
> or by analogy to the third XML representation (road "has" a sensor, not 
> the opposite):
> 
>   :RoadXXX :hasName "E330" .
>   :RoadXXX :hasSensor :Sensor220 .
> 
> the SPARQL query part:
> 
>   ?r :hasName "E330" .
>   ?r :hasSensor ?s .
> 
> Can someone comment on this?
> 
> Thanks,
> Maciej

Received on Wednesday, 2 July 2008 04:36:40 UTC