Making RDF Data Available for XML Processing

Author
Andy Seaborne – Hewlett-Packard Laboratories, Bristol
Version:
$Revision: 1.10 $

Abstract

This document describes a way to connect RDF data to XML tools.  The approach is to use RDF query to extract information from RDF knowledge bases, then present the results in XML for further manipulation by XML tools.

Status of This Document

Discussion note for the RDF Data Access Working Group and the wider community.


1. Context

The RDF Data Access Working Group (DAWG) is charged with providing access to RDF knowledge bases (repositories or data stores; we will use the term repository) by selecting instances of subgraphs from an RDF graph. This will involve a query language, and the use of RDF, in some serialization, for the returned results. As part of the requirements process, the Working Group has refined this to include Variable Binding Results and local access to RDF repositories.

The DAWG charter also says:

There is a requirement for RDF data to be accessible within an XML Query context. The working group should specify at least one mechanism for exposing RDF query facilities in an XQuery environment; that is, a way to take a piece of RDF Query abstract syntax and map it into a piece of XML Query using some form of extension to XQuery.

Whereas the XQuery charter says:

The mission of the XML Query working group is to provide flexible query facilities to extract data from real and virtual documents on the Web. Real documents are documents authored in XML. Virtual documents are the contents of databases or other persistent storage that are viewed as XML via a mapping mechanism.

This document describes one possible way of exposing RDF data in an XQuery environment. Its approach is to use an RDF query language to perform the mapping from RDF data to an XML document, which may be virtual or real. Such an XML document may be the result of a local operation or a remote, web operation.

2. Description

DAWG has identified three classes of query results:

The first form, a result set, is intended to get information out of an RDF layer and into applications for direct (non-RDF) processing. The approach described in this document is to encode such result sets in XML so that XML tools can provide further processing. An XML encoding would also allow results to be streamed.

The DAWG charter emphasizes the remote access case: query results are representations of web resources. This is clearest with bookmarkable queries, but it also applies to interactions over SOAP. The separation of the XML data model from the RDF data model occurs across this connection, with an XML document crossing the boundary. The data values in the results are those of the RDF data model, but they are XML data values as well: the graph labels are URIrefs (XML Schema datatype anyURI) and RDF literals, which are either plain strings or typed according to XML Schema datatypes, following the work of the RDF Core Working Group (see the datatypes section of the RDF Primer).
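Because typed literals carry XML Schema datatypes, a client receiving them can map them onto native values without any RDF-specific machinery. The Python sketch below assumes each literal arrives as a lexical form plus an optional datatype URI (how a datatype is encoded in the result set is not specified here); `literal_value` and its small table of datatypes are illustrative, not part of any specification.

```python
from decimal import Decimal

XSD = "http://www.w3.org/2001/XMLSchema#"

# Illustrative subset of XML Schema datatypes and how to read their lexical forms.
_CONVERTERS = {
    XSD + "string": str,
    XSD + "integer": int,
    XSD + "decimal": Decimal,
    XSD + "double": float,
    XSD + "boolean": lambda s: s.strip() in ("true", "1"),
}


def literal_value(lexical, datatype=None):
    """Return a native value for an RDF literal.

    Plain literals (datatype None) stay as strings. Typed literals whose
    datatype is in the table above are converted; any other datatype is
    kept as a (lexical, datatype) pair rather than guessed at.
    """
    if datatype is None:
        return lexical
    convert = _CONVERTERS.get(datatype)
    if convert is None:
        return (lexical, datatype)
    return convert(lexical)
```

Keeping unknown datatypes as pairs, rather than falling back to strings, avoids silently losing the datatype information the RDF data carried.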

On the web, a request (a query) is sent from the client to the remote machine using a protocol such as HTTP or SOAP. The return is an XML document presenting the answers to the query.

       Client machine                          Server machine

    +------------------+     HTTP request    +----------------+    
    | Application inc. |    ------------>    |                |   
    | XQuery processor |                     | RDF repository |
    |                  |    <------------    |                |
    +------------------+     XML Document    +----------------+
    

Only one fixed schema is needed, not one per RDF repository, vocabulary or ontology. It is an XML schema for result sets that expresses the various bindings of variables that satisfy the query, with the data items being the data items of RDF. The RDF graph abstract syntax plays no part in the result set.

3. Benefits

A common theme in RDF query languages is graph pattern matching: a graph pattern is a set of RDF triples with some slots replaced by named variables. Doing RDF query in an RDF query language means that common paradigms, such as graph pattern matching, are retained. There is now a separation of responsibilities, so that experts in extracting information from RDF repositories can work on the queries that do that, while experts in XML processing concentrate on transforming the XML-encoded result set into other XML, including XHTML. In other words, the RDF repository provides the data, with RDF query as its data access language; the business logic is left to the calling application and the XQuery processor.

For example, where a query request is encoded into an HTTP URL, the XQuery doc() function can be used for RDF data access.
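A minimal Python sketch of the encoding step, assuming a hypothetical endpoint URL and hypothetical parameter names `query` and `source` (a real service defines its own). The client builds a URL whose query string carries the query text; an XQuery processor could then pass that URL to doc(), and the server recovers the query from it.

```python
from urllib.parse import urlencode, urlsplit, parse_qs


def query_url(endpoint, query, source=None):
    """Client side: encode an RDF query (and optionally a data source) as a GET URL.

    The parameter names 'query' and 'source' are assumptions for illustration.
    """
    params = {"query": query}
    if source is not None:
        params["source"] = source
    return endpoint + "?" + urlencode(params)


def query_of(url):
    """Server side: recover the query text from such a URL."""
    return parse_qs(urlsplit(url).query)["query"][0]
```

If such a URL is written as an XQuery string literal, any `&` separating parameters must be escaped as `&amp;`, because XQuery 1.0 string literals do not allow a bare ampersand.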

The RDF repository can also be accessed by systems other than XML-based toolsets, such as EJB systems.

The RDF graph data model and the XQuery data model are kept separate; it is the responsibility of the RDF query system to create the XML document. There need only be one schema for that document, and further manipulation of the XML can be carried out with standard XML tools. Solutions that query RDF directly from XQuery (whether over the RDF abstract syntax or over some serialization of it) require the query writer to follow the RDF syntax rules, with little language support for doing so.
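The single-schema point can be seen from the writing side: code that produces a result set never needs to know which vocabulary was queried. A minimal Python sketch, assuming only the element names that appear in the example result set in Section 8 (`resultSet`, `vars`, `var`, `solution`, `binding`, `uri`, `bNode`, `value`); `result_set_xml` is a hypothetical helper, not BRQL's implementation.

```python
import xml.etree.ElementTree as ET

KINDS = ("uri", "bNode", "value")  # element names used for bound values


def result_set_xml(variables, solutions):
    """Serialize solutions into the result-set layout.

    `solutions` is a list of dicts mapping a variable name to a
    (kind, text) pair, where kind is one of KINDS. A variable that is
    unbound in a solution is simply absent from that dict, and no
    <binding> element is written for it.
    """
    root = ET.Element("resultSet")
    vars_el = ET.SubElement(root, "vars")
    for name in variables:
        ET.SubElement(vars_el, "var").text = name
    for soln in solutions:
        soln_el = ET.SubElement(root, "solution")
        for name in variables:
            if name not in soln:
                continue  # unbound in this solution
            kind, text = soln[name]
            if kind not in KINDS:
                raise ValueError("unknown binding kind: %r" % kind)
            binding = ET.SubElement(soln_el, "binding")
            ET.SubElement(binding, "var").text = name
            ET.SubElement(binding, kind).text = text
    return ET.tostring(root, encoding="unicode")
```

Nothing in the function depends on FOAF or any other vocabulary; the same code serves every query.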

4. Integrating with XQuery

XQuery can be extended through function libraries. We show how this can be done using the result set XML document; details are given in the example in Section 8.2.

A single new function is provided that issues a BRQL SELECT query and encodes the results as an XML document according to the result set schema. The result of this function can be accessed with XPath in the usual way. Other function libraries, with a richer set of operations or customized to the application context, could be written. The example here shows just a general, but low-level, mechanism.

One issue that can arise with a Functional Accessor style of approach is that RDF graph access is only at a fine-grained level, such as a single triple pattern. Allowing more complex graph patterns lets existing XQuery processors access data in RDF repositories efficiently, without alteration.
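To make the granularity point concrete: with only a single-triple accessor, the client has to evaluate the graph pattern itself, making one call per triple pattern per intermediate binding and doing the joins locally. The Python sketch below uses a hypothetical `match(s, p, o)` accessor over a toy in-memory copy of the example data from Section 8 (omitting the foaf:knows triples, which the query does not use); it is not modeled on any particular function library or store API.

```python
FOAF = "http://xmlns.com/foaf/0.1/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PERSON = FOAF + "Person"

# Example data as (subject, predicate, object) tuples; blank nodes are "_:..."
# strings, and URIs and literals are both plain strings in this toy version.
EXAMPLE_TRIPLES = [
    ("_:alice", RDF_TYPE, PERSON),
    ("_:alice", FOAF + "name", "Alice"),
    ("_:alice", FOAF + "mbox", "mailto:alice@work"),
    ("_:bob", RDF_TYPE, PERSON),
    ("_:bob", FOAF + "name", "Bob"),
    ("_:bob", FOAF + "mbox", "mailto:bob@work"),
    ("_:bob", FOAF + "mbox", "mailto:bob@home"),
    ("_:eve", RDF_TYPE, PERSON),
    ("_:eve", FOAF + "name", "Eve"),
    ("_:fred", RDF_TYPE, PERSON),
    ("_:fred", FOAF + "mbox", "fred@edu"),
]


class TripleStore:
    """Toy store exposing only a fine-grained, single-pattern accessor."""

    def __init__(self, triples):
        self.triples = list(triples)
        self.calls = 0  # accessor calls made (round trips, if the store were remote)

    def match(self, s=None, p=None, o=None):
        """Hypothetical triple accessor; None acts as a wildcard."""
        self.calls += 1
        return [(ts, tp, to) for (ts, tp, to) in self.triples
                if (s is None or ts == s)
                and (p is None or tp == p)
                and (o is None or to == o)]


def names_and_mboxes(store):
    """Evaluate the example query client-side using only match():
    one call for the rdf:type pattern, then two per person for the
    OPTIONAL name and mbox patterns, joined here in the client."""
    rows = []
    for person, _, _ in store.match(p=RDF_TYPE, o=PERSON):
        names = [o for _, _, o in store.match(person, FOAF + "name")] or [None]
        mboxes = [o for _, _, o in store.match(person, FOAF + "mbox")] or [None]
        for name in names:
            for mbox in mboxes:
                rows.append((name, mbox))  # None marks an unbound variable
    return rows
```

For the four people in the example data this takes 1 + 2 * 4 = 9 accessor calls, and the count grows with the data; a repository that accepts the whole graph pattern can answer it in a single request and plan the joins itself.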

5. Disadvantages

There are two query languages to learn, not one. To mitigate this, common patterns of RDF query could be incorporated into an XQuery library.  This would also enable the separation of responsibilities in designing the business logic and the data access.

6. Alternatives

Jonathan Robie has described connecting XQuery to RDF through a function library for XQuery based on a function for accessing individual triples in the RDF data. This can give a close coupling of the XQuery processor to an RDF store, without the additional overhead of an RDF query processor. It does not allow remote access.

The approach taken by Treehugger is to create a virtual XML document, derived from the RDF graph, as the application navigates the XML presentation of the underlying RDF data. Again, this requires a close association between the XQuery processor and the RDF storage, because the virtual XML document is created as the query executes.

6.1 Other linkages

The XPath functions and operators provide a valuable and rich set of operations on RDF literals, since RDF typed literals use XML Schema datatypes.

7. Experimentation

BRQL 0.3 (and above) contains a simple XML output format for SELECT queries together with an example XSLT stylesheet to transform a result set in XML into an HTML table.

8. Examples

We provide two examples.

Firstly, we show manipulation of an XML result set document (in the fixed schema) to create an HTML table.  The code to do this is query-independent.

Secondly, we show an XQuery that calls an external function to execute a query and get back the result set as a DOM tree, and then styles the results.

The same data and query are used in each example. Many thanks to Damian Steer and Howard Katz for help constructing these examples.

The example query is:

SELECT ?name, ?mbox
WHERE
    (?person rdf:type foaf:Person)
OPTIONAL
    (?person foaf:name ?name)
OPTIONAL
    (?person foaf:mbox ?mbox)
USING foaf FOR <http://xmlns.com/foaf/0.1/>

Run over the example data given after it, this query produces the following table, created using XQuery or XSLT as described in Sections 8.1 and 8.2:

name      mbox
- - - -   <fred@edu>
Eve       - - - -
Alice     <mailto:alice@work>
Bob       <mailto:bob@work>
Bob       <mailto:bob@home>

Example data:

@prefix foaf:       <http://xmlns.com/foaf/0.1/> .
@prefix rdf:        <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs:	    <http://www.w3.org/2000/01/rdf-schema#> .

_:alice
    rdf:type        foaf:Person ;
    foaf:name       "Alice" ;
    foaf:mbox       <mailto:alice@work> ;
    foaf:knows      _:bob ;
    .

_:bob
    rdf:type        foaf:Person ;
    foaf:name       "Bob" ; 
    foaf:knows      _:alice ;
    foaf:mbox       <mailto:bob@work> ;
    foaf:mbox       <mailto:bob@home> ;
    .


_:eve
    rdf:type      foaf:Person ;
    foaf:name     "Eve" ; 
    foaf:knows    _:fred ;
    .

_:fred
    rdf:type      foaf:Person ;
    foaf:mbox     <fred@edu> .

The table is generated from the following XML result set:

<resultSet
    xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <vars>
        <var>name</var>
        <var>mbox</var>
    </vars>
    <solution>
        <binding>
            <var>mbox</var>
            <uri>fred@edu</uri>
        </binding>
    </solution>
    <solution>
        <binding>
            <var>name</var>
            <value>Eve</value>
        </binding>
    </solution>
    <solution>
        <binding>
            <var>name</var>
            <value>Alice</value>
        </binding>
        <binding>
            <var>mbox</var>
            <uri>mailto:alice@work</uri>
        </binding>
    </solution>
    <solution>
        <binding>
            <var>name</var>
            <value>Bob</value>
        </binding>
        <binding>
            <var>mbox</var>
            <uri>mailto:bob@work</uri>
        </binding>
    </solution>
    <solution>
        <binding>
            <var>name</var>
            <value>Bob</value>
        </binding>
        <binding>
            <var>mbox</var>
            <uri>mailto:bob@home</uri>
        </binding>
    </solution>
</resultSet>
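A client outside the XML toolchain can read this document with any XML parser. As a sketch of the streaming use mentioned in Section 2, the Python code below uses the standard library's `ElementTree.iterparse` to yield one solution at a time; `iter_solutions` and the dictionary shape it returns are illustrative choices, and only the element names are taken from the example above.

```python
import io
import xml.etree.ElementTree as ET

# A two-solution excerpt of the result set above, for trying the parser.
SAMPLE = b"""<resultSet><vars><var>name</var><var>mbox</var></vars>
<solution><binding><var>mbox</var><uri>fred@edu</uri></binding></solution>
<solution><binding><var>name</var><value>Eve</value></binding></solution>
</resultSet>"""


def iter_solutions(source):
    """Yield one dict per <solution>, mapping variable name to (kind, text).

    `source` is a filename or binary file object. Each solution is yielded
    as soon as its end tag has been parsed, and its children are then
    cleared so its bindings are not kept in memory.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag != "solution":
            continue
        row = {}
        for binding in elem.findall("binding"):
            name = binding.findtext("var")
            for child in binding:
                if child.tag != "var":  # the uri, bNode or value element
                    row[name] = (child.tag, child.text)
        yield row
        elem.clear()


if __name__ == "__main__":
    for row in iter_solutions(io.BytesIO(SAMPLE)):
        print(row)
```

Variables that are unbound in a solution are simply missing from its dict, mirroring the missing `<binding>` element.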

8.1 Creating an HTML table from a Result Set

In XQuery:

(: Thanks to Howard Katz for making the XQuery clearer :)

declare variable $doc as xs:string external ;
declare variable $resultSet := doc($doc)/resultSet ;
declare variable $selectVars := $resultSet/vars/var/text() ;
<html>
<head>
   <title>BRQL ResultSet to HTML table (XQuery)</title>
<style>
<![CDATA[
td, th { padding-left:0.5em; padding-right: 0.5em; 
         padding-top:0.2ex ; padding-bottom:0.2ex }
]]>
</style>

</head>

<body>

<table border="1" style="border-collapse: collapse" 
	   bordercolor="black">
  <tr>
{
 for $x in $selectVars
   return <th>{$x}</th>
}
  </tr>


{
  for $soln in $resultSet/solution
  return
    <tr>
    {
        for $v in $selectVars
        let $binding := $soln/binding[ var = $v]
        return
            if ( exists($binding/uri) )
            then   <td>&lt;{$binding/uri/text()}&gt;</td>
            else if ( exists($binding/bNode) ) 
            then   <td>_:{$binding/bNode/text()}</td>
            else if ( exists($binding/value) )
            then   <td>{$binding/value/text()}</td>
            else   <td>- - - -</td>
    }
    </tr>
}

</table>

</body>
</html>

or XSLT transform:

<?xml version="1.0"?>

<!-- Much help from Damian / Thanks! -->

<xsl:stylesheet 
 xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
 version="1.0">

  <!-- Find variables -->
  <xsl:variable name="vars" select="//vars/var/text()"/>

  <xsl:template match="/">
    <html>
      <head>
	<title>BRQL ResultSet to HTML table (XSLT)</title>
	<style>
	  <![CDATA[
td, th { padding-left:0.5em; padding-right: 0.5em; 
         padding-top:0.2ex ; padding-bottom:0.2ex }
]]>
	</style>
      </head>
      <body>
	<xsl:apply-templates/>
      </body>
    </html>
  </xsl:template>

  <xsl:template match="resultSet">
    
    <table border="1" style="border-collapse: collapse" 
	   bordercolor="black">
      <tr>
	<xsl:for-each select="$vars">
	  <th><xsl:value-of select="."/></th>
	</xsl:for-each>
      </tr>
      
      <xsl:apply-templates select="solution"/>
    </table>
    
  </xsl:template>

  <xsl:template match="solution">
    <xsl:variable name="n" select="."/>
    <tr>
      <!-- For each variable name -->
      <xsl:for-each select="$vars">
	<xsl:variable name="x" select="."/>
	<td>
	  <xsl:variable name="here" select="$n/binding[var=$x]"/>
	  <xsl:if test="not($here)">- - - -</xsl:if>
	  <xsl:apply-templates select="$here" mode="cell" />
	</td>
      </xsl:for-each>
    </tr>
  </xsl:template>

  <!-- Templates for outputting table cells -->

  <xsl:template match="value" mode="cell" >
    <xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="uri" mode="cell">
    &lt;<xsl:value-of select="."/>&gt;
  </xsl:template>

  <xsl:template match="bNode" mode="cell" >
    _:<xsl:value-of select="."/>
  </xsl:template>

  <xsl:template match="var" mode="cell" >
  </xsl:template>

</xsl:stylesheet>
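The cell rules in both versions are independent of the query: a URI is shown in angle brackets, a blank node with a `_:` prefix, a literal as its text, and an unbound variable as `- - - -`. For comparison, the same rules in Python, producing a plain-text table rather than HTML; the input shape (a dict per solution mapping a variable to a `(kind, text)` pair) is an assumption of this sketch.

```python
UNBOUND = "- - - -"


def cell(binding):
    """Format one binding, following the same cases as the XQuery and XSLT."""
    if binding is None:
        return UNBOUND  # variable not bound in this solution
    kind, text = binding
    if kind == "uri":
        return "<" + text + ">"
    if kind == "bNode":
        return "_:" + text
    if kind == "value":
        return text  # a literal, shown as-is
    return UNBOUND


def text_table(variables, solutions):
    """Render solutions as left-aligned text columns, header row first."""
    rows = [list(variables)]
    rows += [[cell(s.get(v)) for v in variables] for s in solutions]
    widths = [max(len(r[i]) for r in rows) for i in range(len(variables))]
    return "\n".join(
        "  ".join(c.ljust(w) for c, w in zip(r, widths)).rstrip()
        for r in rows)
```

As in the stylesheet, the column order comes from the declared variables, not from the order of bindings inside each solution.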

 

8.2 Extending XQuery

We can build on the styling above and provide a single, common extension function for XQuery that takes a BRQL query string and the name of an RDF data source. The function executes the query and returns the results as a DOM tree that the XQuery processor can then manipulate further.

This example has been run using Saxon v8.0 and BRQL v0.6.

declare namespace brql = "java:com.hp.hpl.jena.brql.extensions.RDFQuery" ;

(: Query to execute :)
declare variable $query as xs:string external ;
(: Graph to query :)
declare variable $source as xs:string external ;

(: Format a binding value :)
declare function local:binding($binding)
{
  if ( exists($binding/uri) )
  then   <td>&lt;{$binding/uri/text()}&gt;</td>
  else if ( exists($binding/bNode) ) 
  then   <td>_:{$binding/bNode/text()}</td>
  else if ( exists($binding/value) )
  then   <td>{$binding/value/text()}</td>
  else   <td>- - - -</td>
} ;


<html>
<head>
  <title>BRQL Query Issued from an XQuery processor</title>
<style>
<![CDATA[
td, th { padding-left:0.5em; padding-right: 0.5em; 
         padding-top:0.2ex ; padding-bottom:0.2ex }
]]>
</style>

</head>

<body>
{
  let $rs :=
        brql:query($query, $source)/resultSet 
  let $selectVars := $rs/vars/var/text()
  return 
    <table border="1" style="border-collapse: collapse" 
           bordercolor="black">
    <tr>
    {
        for $x in $selectVars
        return <th>{$x}</th>
    }
    </tr>
    {
        for $soln in $rs/solution
        return
            <tr>
            {
                for $v in $selectVars
                let $binding := $soln/binding[ var = $v]
                return
                    local:binding($binding)
            }
            </tr>
    }
    </table>
}
</body>
</html>