Triple Production Description

An informal explanation of triple production:

RDF/XML Striped Syntax

We consider the subset of RDF/XML documents that conform to the following RelaxNG schema.

namespace local = ""
namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

start = RDF
RDF = element rdf:RDF { description* }
description = element rdf:Description { aboutAttr?, propertyElt* }
propertyElt = element * - local:* {
                             description
                            | string?
                          }
aboutAttr = attribute rdf:about { URI-reference }
URI-reference = string

In such documents the mapping into graph syntax maps:

each description into a vertex in the graph possibly labelled with a URI
each propertyElt into an edge in the graph.

We describe this mapping by describing the edges as triples, in which vertices that are not labelled with a URI are referenced using a locally scoped name (bNodeLabel).
Descriptions that have an aboutAttr map to vertices labelled with the URI-reference value of the aboutAttr. Other descriptions map to unlabelled vertices, and are assigned a locally unique bNodeLabel for the purpose of describing the triples.
Each propertyElt with a description content maps to a triple given by:

the mapping of its parent element (either a URI or bNodeLabel)
the URI corresponding to the tag of the propertyElt itself
the mapping of its child description element (either a URI or bNodeLabel)

Each propertyElt with string or empty content maps to a triple given by:

the mapping of its parent element (either a URI or bNodeLabel)
the URI corresponding to the tag of the propertyElt itself
the string-value (or the empty string)
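
The mapping for the striped subset can be sketched in Python as follows. This is a minimal illustration, not a conforming parser: the function name striped_triples, the `_:bN` label scheme, and the convention of wrapping literal objects in double quotes are all assumptions made for the example, and the input is assumed to already conform to the schema above.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def striped_triples(xml_text):
    """Return (subject, predicate, object) triples for a striped document.

    Hypothetical helper: literals are returned wrapped in double quotes
    and unlabelled vertices as "_:bN"; both conventions are assumptions.
    """
    counter = itertools.count(1)
    triples = []

    def node_for(description):
        # A description with rdf:about maps to that URI-reference;
        # any other description gets a locally unique bNodeLabel.
        about = description.get("{%s}about" % RDF_NS)
        return about if about is not None else "_:b%d" % next(counter)

    def walk(description):
        subject = node_for(description)
        for prop in description:
            # ElementTree spells a qualified tag as "{namespace}local";
            # dropping the braces concatenates the two into a URI.
            predicate = prop.tag.replace("{", "", 1).replace("}", "", 1)
            children = list(prop)
            if children:
                # propertyElt with description content
                obj = walk(children[0])
            else:
                # propertyElt with string or empty content
                obj = '"%s"' % (prop.text or "")
            triples.append((subject, predicate, obj))
        return subject

    for description in ET.fromstring(xml_text):
        walk(description)
    return triples
```

Calling striped_triples on a document whose only rdf:Description has an rdf:about attribute and one string-valued property yields a single triple whose subject is that URI-reference and whose object is the quoted string.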

RDF/XML Advanced Syntax Overview

There are the following aspects to the rest of RDF/XML syntax:

abbreviations

typed nodes, property attributes, omitted description elements,
rdf:ID as an alternative to rdf:about
collection membership counting

reification
bagID

distributed subjects (rdf:aboutEach)

The abbreviations and collection membership counting can be seen as occurring prior to reification and bagID processing. aboutEach resolution can be seen as coming after all other processing.
The abbreviations have no significance other than saving typing.
The collection membership provides an incremental counter.
Reification uses four triples to model a triple added by a propertyElt or an abbreviation.
A bagID constructs a collection of reifications of all the triples added by propertyElts or abbreviations that are child nodes of a description.
Distributed subjects can be used to avoid certain repetitious parts in the RDF/XML. Distributed subjects are processed by separately collecting all the triples generated with subject corresponding to an element with an rdf:aboutEach attribute. Only after all other processing is completed are these distributed subject triples joined with the other triples in the graph.

Striping

In the advanced syntax it is necessary to distinguish the different stripes (typedNodes from propertyElts) in the RDF/XML document. This is made harder by the rdf:parseType="Resource" propertyElt production.
It is easier to detect mistaken use of rdf:li as a typedNode if striping is resolved before collection membership counting.

Abbreviations

FIXME: blow-by-blow account of each abbreviation:

typedNode
rdf:ID
propAttr
rdf:type propAttr
rdf:parseType="Resource"
propertyElt with propAttrs, rdf:resource or rdf:bagID

FIXME: define primary triple for a propertyElt, to distinguish it from other ones arising from propAttrs of a propertyElt.

These abbreviations can be done in any order; in particular, triples arising from typedNodes and propAttrs are not ordered.

Collection membership counting

rdf:li may be used as the tag on a propertyElt.
Each such propertyElt is equivalent to one with a tag of rdf:_{1+count(preceding-sibling::rdf:li)}; that is, the n-th rdf:li child of an element is treated as rdf:_n, counting only rdf:li siblings.
This step must be done before any other that needs to:

make use of the name of the predicate of a triple (e.g. reification)
or, treat this propertyElt independently from its preceding-siblings.
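
The counting rule can be sketched as a small Python function. The name number_li and the representation of a parent's propertyElt children as a list of tag URIs are assumptions made for this example.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def number_li(property_tags):
    """Rewrite rdf:li tags to rdf:_1, rdf:_2, ... in document order.

    property_tags holds the tag URIs of one parent's propertyElt
    children (hypothetical representation). Only rdf:li siblings are
    counted, mirroring 1 + count(preceding-sibling::rdf:li).
    """
    result = []
    li_count = 0
    for tag in property_tags:
        if tag == RDF_NS + "li":
            li_count += 1
            result.append("%s_%d" % (RDF_NS, li_count))
        else:
            # Other tags, including explicit rdf:_n, pass through
            # unchanged and do not advance the counter.
            result.append(tag)
    return result
```

Because the counter depends on preceding siblings, the rewrite has to happen while the sibling order is still available, which is the reason for the ordering constraint stated above.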

Reification and bagID

The analysis concerning reification is very difficult to separate from that for bagID.

BagID

A bagID attribute on a typedNode or a description element signals the reification of all triples arising from:

the typedNode construction
property attributes
and the primary triple from each property element child.

The primary triples of the propertyElt children may already be explicitly reified with an rdf:ID attribute. In such cases, the bagID does not cause a second reification, but refers to the labelled reification.
The bagID='bID' attribute signals the creation of the following triple:

<#bID> <rdf:type> <rdf:Bag> .

and a triple

<#bID> <rdf:_NNN> <#reifyID> .

for each of the primary triples from property element children that have an explicit reification (with rdf:ID='reifyID'); and a triple

<#bID> <rdf:_NNN> _:bNodeLabel .

for each of the other triples identified above, where bNodeLabel is the local label for the node of the graph being the reification of the triple. The rdf:_NNN are the properties rdf:_1, rdf:_2 etc, sequentially starting from 1. No correspondence is specified between the order rdf:_1, rdf:_2 etc and any other order. In particular, it is not the case that the two generated triples

<#bID> rdf:_1 _:Statement1 .
<#bID> rdf:_2 _:Statement2 .

imply that the statement reified as Statement1 occurs earlier in the XML document than that reified as Statement2.
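
The triples signalled by a bagID can be sketched as follows. The function name bag_triples, the `_:rN` label scheme, and the input format (one entry per triple to be reified, holding the explicit rdf:ID value or None) are assumptions for illustration. The sketch numbers members in input order only because some order must be chosen; as stated above, that order carries no meaning.

```python
import itertools

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def bag_triples(bag_id, reify_ids):
    """Return the triples signalled by bagID=bag_id.

    reify_ids has one entry per triple covered by the bagID: the
    explicit rdf:ID value if that triple is already reified, else None
    (hypothetical input format).
    """
    fresh = itertools.count(1)
    bag = "#" + bag_id
    out = [(bag, RDF + "type", RDF + "Bag")]
    for n, explicit_id in enumerate(reify_ids, start=1):
        if explicit_id is not None:
            # Refer to the existing labelled reification.
            node = "#" + explicit_id
        else:
            # Implicit reification: a locally labelled node.
            node = "_:r%d" % next(fresh)
        out.append((bag, "%s_%d" % (RDF, n), node))
    return out
```

The four triples describing each reified statement are not produced here; only the bag and its membership triples are.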

Reification

Reification results in four triples as given in rule reification string or rule reification resource.
This applies equally whether reification is explicitly triggered through an rdf:ID attribute or implicitly triggered through an rdf:bagID attribute.
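
As a sketch, and assuming the four triples are the usual RDF reification quad (rdf:type rdf:Statement, rdf:subject, rdf:predicate, rdf:object), which this section refers to but does not spell out, the step looks like this in Python. The function name reify and the tuple representation of triples are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def reify(triple, statement_node):
    """Return four triples modelling `triple` as a statement.

    statement_node is "#reifyID" for an explicit rdf:ID, or a local
    bNodeLabel when reification is triggered by rdf:bagID. The choice
    of vocabulary is an assumption (standard RDF reification terms).
    """
    subject, predicate, obj = triple
    return [
        (statement_node, RDF + "type", RDF + "Statement"),
        (statement_node, RDF + "subject", subject),
        (statement_node, RDF + "predicate", predicate),
        (statement_node, RDF + "object", obj),
    ]
```

The same function serves both triggers; only the choice of statement_node differs.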

Distributed Subjects

rdf:aboutEach='AboutEachURI' attributes are allowed on top-level elements in place of rdf:about or rdf:ID.
rdf:bagID is not permitted on such an element.
rdf:ID is not permitted on any of its propertyElt children.
The child nodes of such top-level elements are processed like the children of other top-level elements.
However, no triples are generated directly for the property attributes of such a top-level element or for the primary triples of its property elements.
Instead, triples

<AboutEachURI> <predicate> _:x .
<AboutEachURI> <predicate> <object> .

and

<AboutEachURI> <predicate> "object" .

are added to a separate bag of distributed subject triples.
After all other processing, the following join is performed between the distributed subject triples and the output triples.
Whenever

<AboutEachURI> <predicate> Object .

is in the bag of distributed subject triples
and

<AboutEachURI> <rdf:_NNN> member .

is in the bag of output triples, and member is not a literal,
then add

member <predicate> Object .

to the output triples.
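
The join can be sketched in Python as a single pass over the two bags. The function name join_distributed, the tuple representation of triples, and the convention that a literal is a term starting with a double quote are assumptions made for the example.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Matches the membership properties rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")


def is_literal(term):
    # Assumed convention: literals are written with a leading quote.
    return term.startswith('"')


def join_distributed(distributed, output):
    """Join distributed subject triples with the output triples.

    Makes one pass over the original output triples; triples added by
    the join are not fed back in (no closure is computed).
    """
    added = []
    for about_each, predicate, obj in distributed:
        for subject, membership, member in output:
            if (
                subject == about_each
                and MEMBERSHIP.fullmatch(membership)
                and not is_literal(member)
            ):
                added.append((member, predicate, obj))
    return list(output) + added
```

A single pass is only adequate if the added triples can never enable further joins, which is the purpose of the restriction that follows.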
The intent is that this join can be performed without careful attention to ordering and closure issues.
Hence, an RDF/XML document must not be such that processing it would generate any instance of the following:

two distributed subject triples

<ABOUTEACH> <rdf:_NNN> Object .

<ABOUTEACH2> <predicate> Object2 .


and an output triple

<ABOUTEACH> <rdf:_MMM> <ABOUTEACH2> .

In particular the following document is illegal:

<rdf:RDF xmlns:rdf="...">
  <rdf:Description rdf:aboutEach="#foo">
    <rdf:li rdf:resource="#bar" />
  </rdf:Description>
  <rdf:Description rdf:about="#foo">
    <rdf:li rdf:resource="#foo"/>
  </rdf:Description>
</rdf:RDF>

The issue is the difficulty of deriving (or not deriving)
<#bar> <rdf:_1> <#bar> .
given the more obvious triples:
<#foo> <rdf:_1> <#foo> .
<#foo> <rdf:_1> <#bar> .
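
A check for the forbidden pattern might be sketched as below. It follows one reading of the restriction above: a distributed subject triple whose predicate is a membership property, together with an output membership triple on the same subject whose object is itself the subject of a distributed subject triple. The function name violates_join_restriction and the tuple representation are hypothetical.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# Matches the membership properties rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(re.escape(RDF) + r"_[1-9][0-9]*")


def violates_join_restriction(distributed, output):
    """Return True if the triples contain the forbidden pattern.

    One interpretation (an assumption) of the restriction:
      distributed: (A,  rdf:_NNN,  Object)
      distributed: (A2, predicate, Object2)
      output:      (A,  rdf:_MMM,  A2)
    """
    about_each_subjects = {s for s, _, _ in distributed}
    for subject, predicate, _ in distributed:
        if not MEMBERSHIP.fullmatch(predicate):
            continue
        for out_s, out_p, out_o in output:
            if (
                out_s == subject
                and MEMBERSHIP.fullmatch(out_p)
                and out_o in about_each_subjects
            ):
                return True
    return False


# The illegal document above, as triples (relative URIs kept as-is).
distributed = [("#foo", RDF + "_1", "#bar")]
output = [("#foo", RDF + "_1", "#foo")]
print(violates_join_restriction(distributed, output))
```

In the example, the two distributed subject triples of the pattern coincide, since ABOUTEACH and ABOUTEACH2 are both #foo.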