RDF/A implementation in XSLT2

Discussion

This is a first version of an implementation of RDF/A using XSLT2.

The basic idea is to express sections 4 and 5 of RDF/A as a set of simple rules in XML. These rules match subjects, predicates, and objects. A first XSL transform combines them into a set of more complex triple rules. A second transform then converts the complex rules into an XSLT program that implements RDF/A.

The goal of this somewhat convoluted approach is to ensure a close correspondence between what this program does and what is written in RDF/A.

Design Overview

The text of sections 4 and 5 is expressed as simple rules using XPath expressions, e.g.

  <subject
    x:para='4.2.4.1'
    a:match='[not(@about)][not(@id)][not(@nodeID)][../@nodeID]'
    rdf:nodeID="concat('u.',../@nodeID)"
   />

This says that, per paragraph 4.2.4.1, if an element matches the expression (no @about, @id or @nodeID attribute, and a parent with a @nodeID attribute), then the subject is given (in RDF/XML) by an rdf:nodeID attribute whose value is calculated as shown.

The predicate rules are similar:

  <predicate
    a:reversed='true'
    a:object='resource'
    x:para='4.3.3'
    a:match='[@rev]'
    name='@rev'
   />

This rule requires a resource object, is defined in para 4.3.3, matches when there is an @rev attribute, and the name of the predicate is the value of the @rev attribute.

Objects may be resource-valued or literal-valued. Here is a resource-valued one:

  <object
    a:object='resource'
    x:para='4.4.2'
    rdf:resource='resolve-uri(@href)'
    a:match='[@href]'
   />

The @a:object value of resource shows that this rule can be used with the previous predicate rule. Any subject rule can be used. The @rdf:resource value is to be applied to the property element in the generated RDF/XML.

The combineRules transform takes all combinations of S, P and O rules, subject to the single constraint that the P and O rules have the same @a:object value, and makes a new set of longer, triple-oriented rules.

These three are combined to:

   <match select="xhtml2:*[not(@about)][not(@id)][not(@nodeID)][../@nodeID][@rev][@href]">
      <subject x:para="4.4.2" rdf:about="resolve-uri(@href)"/>
      <predicate x:para="4.3.3" name="@rev"/>
      <object x:para="4.2.4.1" rdf:nodeID="concat('u.',../@nodeID)"/>
   </match>

Because of the @a:reversed='true' value the subject and object have been swapped, and the swapping knows to change the @rdf:resource on the object into an @rdf:about on the subject. The @a:* attributes have been stripped.
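
The combination and reversal steps just described might be sketched as below. This is only an illustration, not the actual combineRules transform: the namespace URI bound to a:, the <rules> wrapper element around the simple rules, and the emit-as helper template are all assumptions made for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:a="urn:example:rdfa-rules"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    exclude-result-prefixes="xs a">

  <xsl:output indent="yes"/>

  <!-- Assumed namespace of the @a:* control attributes (hypothetical URI). -->
  <xsl:variable name="a-ns" select="'urn:example:rdfa-rules'"/>

  <!-- Assumed input: a <rules> root holding <subject>, <predicate>, <object> children. -->
  <xsl:template match="/rules">
    <rules>
      <xsl:for-each select="subject">
        <xsl:variable name="s" select="."/>
        <xsl:for-each select="../predicate">
          <xsl:variable name="p" select="."/>
          <!-- The single constraint: P and O must agree on @a:object. -->
          <xsl:for-each select="../object[@a:object = $p/@a:object]">
            <xsl:variable name="o" select="."/>
            <xsl:variable name="rev" select="$p/@a:reversed = 'true'"/>
            <!-- The three @a:match predicates are concatenated onto xhtml2:* -->
            <match select="xhtml2:*{$s/@a:match}{$p/@a:match}{$o/@a:match}">
              <xsl:call-template name="emit-as">
                <xsl:with-param name="rule" select="if ($rev) then $o else $s"/>
                <xsl:with-param name="role" select="'subject'"/>
              </xsl:call-template>
              <predicate>
                <!-- Strip the @a:* attributes, keep everything else. -->
                <xsl:copy-of select="$p/@*[namespace-uri() != $a-ns]"/>
              </predicate>
              <xsl:call-template name="emit-as">
                <xsl:with-param name="rule" select="if ($rev) then $s else $o"/>
                <xsl:with-param name="role" select="'object'"/>
              </xsl:call-template>
            </match>
          </xsl:for-each>
        </xsl:for-each>
      </xsl:for-each>
    </rules>
  </xsl:template>

  <!-- Hypothetical helper: write $rule out in the given role, renaming
       rdf:resource to rdf:about (or back) when the role has changed. -->
  <xsl:template name="emit-as">
    <xsl:param name="rule" as="element()"/>
    <xsl:param name="role" as="xs:string"/>
    <xsl:element name="{$role}">
      <xsl:for-each select="$rule/@*[namespace-uri() != $a-ns]">
        <xsl:choose>
          <xsl:when test="$role = 'subject' and . is ../@rdf:resource">
            <xsl:attribute name="rdf:about" select="."/>
          </xsl:when>
          <xsl:when test="$role = 'object' and . is ../@rdf:about">
            <xsl:attribute name="rdf:resource" select="."/>
          </xsl:when>
          <xsl:otherwise>
            <xsl:copy/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </xsl:element>
  </xsl:template>
</xsl:stylesheet>
```

Applied to a <rules> document holding the three example rules, a sketch like this should yield a <match> element of the same shape as the one shown above. Attributes such as rdf:nodeID need no renaming, since they are valid on both subject and object.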

This rule is then transformed by the rules2xslt transform to give the following XSLT2 fragment, which matches relevant XHTML2 fragments and gives an RDF/XML fragment, implementing the combination of the three paras of the RDF/A document.

<xsl:for-each select="//xhtml2:*[not(@about)][not(@id)][not(@nodeID)][../@nodeID][@rev][@href]">
  <rdf:Description>
    <xsl:attribute name="rdf:about" select="resolve-uri(@href)"/>
    <xsl:element name="{@rev}" namespace="{namespace-uri-for-prefix(substring-before(@rev,':'),.)}">
       <xsl:attribute name="rdf:nodeID" select="concat('u.',../@nodeID)"/>
    </xsl:element>
  </rdf:Description>
</xsl:for-each>
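
To make the fragment concrete, here is a small hypothetical input it would match. The element names, the dc prefix, the URIs and the XHTML2 namespace URI are assumptions chosen for illustration only.

```xml
<!-- Hypothetical XHTML2 input: the parent carries @nodeID, and the child
     has @rev and @href but no @about, @id or @nodeID of its own. -->
<section xmlns="http://www.w3.org/2002/06/xhtml2"
         xmlns:dc="http://purl.org/dc/elements/1.1/"
         nodeID="n1">
  <link rev="dc:creator" href="http://example.org/doc"/>
</section>
```

For the link element, the fragment would produce an rdf:Description with rdf:about="http://example.org/doc", containing a single dc:creator property element with rdf:nodeID="u.n1". The property's namespace comes from looking up the dc prefix of the @rev value in the scope of the link element.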

Para 4.4.3 is particularly difficult, and I use an x:foreach attribute for it. The approach slightly inverts the wording of the paragraph: I search for a context statement without an @href, and then apply the rule for each matching child. The simple rule looks like:

  <object
    a:object='resource'
    x:para='4.4.3'
    a:match='[not(@href)]'
    rdf:nodeID="concat('u.',@nodeID)"
    x:foreach='*[@nodeID]'
   />

The a:object attribute is used to match with appropriate predicate rules. This rule applies when there is no @href attribute and para 4.4.3 applies (within that para, the [not(@href)] applies to the context statement). This particular rule considers the case where the current statement in that paragraph has a nodeID; for each such case we can generate a triple with an rdf:nodeID attribute on the object, constructed as shown.

The combined rule (with the previous S and P rules) is:

   <match select="xhtml2:*[not(@about)][not(@id)][not(@nodeID)][../@nodeID][@rev][not(@href)]">
      <subject x:para="4.4.3" rdf:nodeID="concat('u.',@nodeID)" x:foreach="*[@nodeID]"/>
      <predicate x:para="4.3.3" name="@rev"/>
      <object x:para="4.2.4.1" rdf:nodeID="concat('u.',../@nodeID)"/>
   </match>

The corresponding XSLT is horrendous, partially because it is autogenerated:

<xsl:for-each select="//xhtml2:*[not(@about)][not(@id)][not(@nodeID)][../@nodeID][@rev][not(@href)]">
  <xsl:variable name="c" select="."/>
  <xsl:for-each select="*[@nodeID]">
    <rdf:Description>
      <xsl:attribute name="rdf:nodeID" select="concat('u.',@nodeID)"/>
      <xsl:for-each select="$c">
        <xsl:element name="{@rev}" namespace="{namespace-uri-for-prefix(substring-before(@rev,':'),.)}">
          <xsl:attribute name="rdf:nodeID" select="concat('u.',../@nodeID)"/>
        </xsl:element>
      </xsl:for-each>
    </rdf:Description>
  </xsl:for-each>
</xsl:for-each>
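
For orientation, a generator in the spirit of rules2xslt could look roughly like the sketch below. It is not the actual rules2xslt transform: it handles only combined rules without x:foreach, and the out: alias prefix, its URI, and the XHTML2 namespace URI are assumptions made for the example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:out="urn:example:xslt-alias"
    xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">

  <!-- Elements written with the out: prefix are emitted in the XSLT namespace. -->
  <xsl:namespace-alias stylesheet-prefix="out" result-prefix="xsl"/>
  <xsl:output indent="yes"/>

  <xsl:template match="/">
    <!-- The xhtml2 URI below is an assumption; use whatever the input uses. -->
    <out:stylesheet version="2.0"
                    xmlns:xhtml2="http://www.w3.org/2002/06/xhtml2">
      <out:template match="/">
        <rdf:RDF>
          <!-- Only the simple case: skip rules carrying a foreach attribute. -->
          <xsl:apply-templates
              select="//match[not(*/@*[local-name() = 'foreach'])]"/>
        </rdf:RDF>
      </out:template>
    </out:stylesheet>
  </xsl:template>

  <!-- One combined rule becomes one generated xsl:for-each. -->
  <xsl:template match="match">
    <out:for-each select="//{@select}">
      <rdf:Description>
        <xsl:apply-templates select="subject/@rdf:*"/>
        <!-- Doubled braces emit literal braces, so the generated stylesheet
             receives an attribute value template such as name="{@rev}". -->
        <out:element name="{{{predicate/@name}}}"
            namespace="{{namespace-uri-for-prefix(substring-before({predicate/@name},':'),.)}}">
          <xsl:apply-templates select="object/@rdf:*"/>
        </out:element>
      </rdf:Description>
    </out:for-each>
  </xsl:template>

  <!-- rdf:about="expr" in a rule becomes
       <xsl:attribute name="rdf:about" select="expr"/> in the output. -->
  <xsl:template match="@rdf:*">
    <out:attribute name="rdf:{local-name()}" select="{.}"/>
  </xsl:template>
</xsl:stylesheet>
```

The xsl:namespace-alias declaration is what lets a stylesheet write XSLT elements as output rather than execute them. Supporting x:foreach would mean wrapping the rdf:Description in a second generated xsl:for-each and binding the outer context node to a variable, as in the fragment above.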

Minor Technical Points

Note the following features of XSLT2 that are being used:

resolve-uri: knows about xml:base. Unfortunately, resolve-uri("") seems to be buggy in my XSLT2 implementation.
to do

The constructs concat('g.',generate-id(.)) and concat('u.',@nodeID) prevent name collisions between user-defined nodeIDs and system gensyms.
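
A minimal sketch of that labelling scheme follows, assuming a helper function is wanted. The function name f:blank-label, its namespace URI, and the <labels> output format are made up for this example.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="urn:example:functions"
    exclude-result-prefixes="xs f">

  <xsl:output indent="yes"/>

  <!-- Hypothetical helper: user-supplied labels get the prefix 'u.',
       generated ones get 'g.', so the two sets can never overlap. -->
  <xsl:function name="f:blank-label" as="xs:string">
    <xsl:param name="e" as="element()"/>
    <xsl:sequence select="if ($e/@nodeID)
                          then concat('u.', $e/@nodeID)
                          else concat('g.', generate-id($e))"/>
  </xsl:function>

  <!-- Demonstration only: list the label each element would receive. -->
  <xsl:template match="/">
    <labels>
      <xsl:for-each select="//*">
        <label element="{name()}" value="{f:blank-label(.)}"/>
      </xsl:for-each>
    </labels>
  </xsl:template>
</xsl:stylesheet>
```

Even if a user writes a nodeID that happens to look like a generated identifier, the result begins with u. while every gensym begins with g., so the two cannot clash in the RDF/XML output.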

Next Steps

Next steps include:

Providing triples in document order (useful for testing and debugging)
Providing HTML version of rules2 file
