An informal explanation of triple production:
RDF/XML Striped Syntax
We consider the subset of RDF/XML documents that conform to the following
RelaxNG schema.
namespace local = ""
namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
start = RDF
RDF = element rdf:RDF { description* }
description = element rdf:Description { aboutAttr?, propertyElt* }
propertyElt = element * - local:* {
description
| string?
}
aboutAttr = attribute rdf:about { URI-reference }
URI-reference = string
In such documents the mapping into graph syntax maps:
-
each description into a vertex in the graph possibly labelled
with a URI
-
each propertyElt into an edge in the graph.
We describe this mapping by describing the edges as triples, in which the
vertices in the graph which are not labelled with a URI are referenced
using a locally scoped name (bNodeLabel).
descriptions which have an aboutAttr map to vertices
labelled with the URI-reference value of the aboutAttr. Other
descriptions map to unlabelled vertices, and are assigned a locally
unique bNodeLabel for the purpose of describing the triples.
Each propertyElt with a description content maps
to a triple given by:
-
the mapping of its parent element (either a URI or bNodeLabel)
-
the URI corresponding to the tag of the propertyElt itself
-
the mapping of its child description element (either a URI or bNodeLabel)
Each propertyElt with string or empty content maps to a triple
given by:
-
the mapping of its parent element (either a URI or bNodeLabel)
-
the URI corresponding to the tag of the propertyElt itself
-
the string-value (or the empty string)
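As a rough illustration of the mapping above, the following Python sketch walks a document in the striped subset and emits one triple per propertyElt. The function name striped_triples, the (kind, value) representation of terms, and the generated labels b1, b2, ... are assumptions made for this example. It does not validate the input against the schema.

```python
import itertools
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def striped_triples(xml_text):
    """Return the triples for a document in the basic striped subset."""
    root = ET.fromstring(xml_text)
    counter = itertools.count(1)
    triples = []

    def node_for(description):
        # A description with rdf:about maps to a URI-labelled vertex;
        # any other description gets a locally unique bNodeLabel.
        about = description.get("{%s}about" % RDF_NS)
        if about is not None:
            return ("uri", about)
        return ("bnode", "b%d" % next(counter))

    def walk(description):
        subject = node_for(description)
        for prop in description:
            # ElementTree spells a qualified tag as "{namespace}local";
            # concatenating the two parts gives the predicate URI.
            namespace, local = prop.tag[1:].split("}", 1)
            predicate = ("uri", namespace + local)
            children = list(prop)
            if children:
                # Description content: the object is the child's vertex.
                obj = walk(children[0])
            else:
                # String or empty content: the object is the string-value.
                obj = ("literal", prop.text or "")
            triples.append((subject, predicate, obj))
        return subject

    for description in root:
        walk(description)
    return triples
```

Calling striped_triples on a document with one labelled description that contains a nested unlabelled description yields one triple whose object is a bNodeLabel and one triple whose subject is that same bNodeLabel.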
RDF/XML Advanced Syntax Overview
There are the following aspects to the rest of RDF/XML syntax:
-
abbreviations
-
typed nodes, property attributes, omitted description elements,
-
rdf:ID as an alternative to rdf:about
-
collection membership counting
-
distributed subjects (rdf:aboutEach)
The abbreviations and collection membership counting can be seen as occurring
prior to reification and bagID processing. aboutEach resolution
can be seen as coming after all other processing.
The abbreviations have no significance other than saving typing.
The collection membership provides an incremental counter.
Reification uses four triples to model a triple added by a propertyElt
or an abbreviation.
A bagID constructs a collection of reifications of all the
triples added by propertyElts or abbreviations that are child
nodes of a description.
Distributed subjects can be used to avoid certain repetitious parts
in the RDF/XML. Distributed subjects are processed by separately collecting
all the triples generated with subject corresponding to an element with
an rdf:aboutEach attribute. Only after all other processing is
completed are these distributed subject triples joined with the other triples
in the graph.
Striping
In the advanced syntax it is necessary to distinguish the different stripes
(typedNodes from propertyElts) in the RDF/XML document.
This is made harder by the rdf:parseType="Resource" propertyElt
production.
It is easier to detect mistaken use of rdf:li as a typedNode
if striping is resolved before collection membership counting.
Abbreviations
FIXME: blow-by-blow account of each abbreviation:
-
typedNode
-
rdf:ID
-
propAttr
-
rdf:type propAttr
-
rdf:parseType="Resource"
-
propertyElt with propAttrs, rdf:resource or rdf:bagID
FIXME: define primary triple for a propertyElt, to distinguish
it from other ones arising from propAttrs of a propertyElt.
These abbreviations can be done in any order; in particular, triples
arising from typedNodes and propAttrs are not ordered.
Collection membership counting
rdf:li may be used as the tag on a propertyElt.
Each such propertyElt is equivalent to one with a tag of rdf:_{1+count(preceding-sibling::rdf:li)}
That is, the n-th rdf:li propertyElt among its siblings is equivalent to
one with the tag rdf:_n.
This step must be done before any other that needs to:
-
make use of the name of the predicate of a triple (e.g. reification)
-
or, treat this propertyElt independently from its preceding-siblings.
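The counting rule above can be sketched in Python as follows. The function name number_li and the choice to pass the sibling tags as a list of full URIs are assumptions made for this example.

```python
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def number_li(property_tags):
    """Rewrite each rdf:li tag among siblings to rdf:_1, rdf:_2, ..."""
    count = 0
    result = []
    for tag in property_tags:
        if tag == RDF_NS + "li":
            # 1 + the number of preceding rdf:li siblings
            count += 1
            result.append(RDF_NS + "_%d" % count)
        else:
            # Other propertyElts keep their tag and do not advance the counter.
            result.append(tag)
    return result
```

Only rdf:li siblings advance the counter, so an unrelated propertyElt between two rdf:li elements does not change the numbering.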
Reification and bagID
The analysis concerning reification is very difficult to separate
from that for bagID.
BagID
A bagID attribute on a typedNode or a description
element signals the reification of all triples arising from:
-
the typedNode construction
-
property attributes
-
and the primary triple from each property element child.
The primary triples of the propertyElt children may already be
explicitly reified with an rdf:ID attribute. In such cases,
the bagID does not cause a second reification, but refers to the
labelled reification.
The bagID='bID' attribute signals the creation
of the following triple:
<#bID> <rdf:type> <rdf:Bag> .
and a triple
<#bID> <rdf:_NNN> <#reifyID> .
for each of the primary triples from property element children that have
an explicit reification (with rdf:ID='reifyID'); and a
triple
<#bID> <rdf:_NNN> _:bNodeLabel .
for each of the other triples identified above, where bNodeLabel
is the local label for the node of the graph being the reification
of the triple. The rdf:_NNN are the properties rdf:_1, rdf:_2
etc, sequentially starting from 1. No correspondence is specified between
the order rdf:_1, rdf:_2 etc and any other order. In particular,
it is not the case that the two generated triples
<#bID> rdf:_1 _:Statement1 .
<#bID> rdf:_2 _:Statement2 .
imply that the statement reified as Statement1 occurs earlier
in the XML document than that reified as Statement2.
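A minimal sketch of the bag triples described above, in Python. The function name bag_triples is an assumption, as is the input: a list holding one term per reified triple, either <#reifyID> for an explicit reification or _:bNodeLabel otherwise. The sketch numbers members in list order only for convenience; as stated above, no particular order is required.

```python
def bag_triples(bag_id, reification_nodes):
    """Return the triples signalled by bagID='bag_id'."""
    bag = "<#%s>" % bag_id
    triples = [(bag, "<rdf:type>", "<rdf:Bag>")]
    # rdf:_NNN runs sequentially from 1; the order itself carries no meaning.
    for n, node in enumerate(reification_nodes, start=1):
        triples.append((bag, "<rdf:_%d>" % n, node))
    return triples
```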
Reification
Reification results in four triples, as given in the rule
"reification string" or the rule "reification resource".
This applies equally whether reification is explicitly triggered through
an rdf:ID attribute or implicitly triggered through an rdf:bagID attribute.
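The rules named above are not reproduced in this section. As a hedged illustration, the sketch below builds the four triples using the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object). The function name reify and the string representation of terms are assumptions made for this example.

```python
def reify(node, subject, predicate, obj):
    """Return the four triples that model (subject, predicate, obj).

    node is <#reifyID> for an explicit rdf:ID, or a _:bNodeLabel when the
    reification is triggered implicitly by rdf:bagID.
    """
    return [
        (node, "<rdf:type>", "<rdf:Statement>"),
        (node, "<rdf:subject>", subject),
        (node, "<rdf:predicate>", predicate),
        (node, "<rdf:object>", obj),
    ]
```

The same function serves a string object and a resource object; only the form of the obj term differs.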
Distributed Subjects
rdf:aboutEach='AboutEachURI' attributes are allowed on top-level
elements in place of rdf:about or rdf:ID.
rdf:bagID is not permitted on such an element.
rdf:ID is not permitted on any of its propertyElt
children.
The child nodes of such a top-level element are processed like the
children of other top-level elements.
However, no triples are generated for the property attributes of
such a top-level element or for the primary triples of its
property elements.
Instead, triples
<AboutEachURI> <predicate> _:x .
<AboutEachURI> <predicate> <object> .
and
<AboutEachURI> <predicate> "object" .
are added to a separate bag of distributed subject triples.
After all other processing, the following join is performed between
the distributed subject triples and the output triples.
Whenever
<AboutEachURI> <predicate> Object .
is in the bag of distributed subject triples
and
<AboutEachURI> <rdf:_NNN> member .
is in the bag of output triples, and member is not a literal,
then add
member <predicate> Object .
to the output triples.
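The join just described can be sketched in Python as a single pass over the two bags. The function name join_distributed, the representation of triples as tuples of strings, and the convention that a literal is any term starting with a double quote are assumptions made for this example.

```python
import re

# Matches the collection membership properties rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(r"<rdf:_[1-9][0-9]*>")


def is_literal(term):
    # Assumption: literals are written with a leading double quote.
    return term.startswith('"')


def join_distributed(distributed, output):
    """Return the output triples plus those added by one pass of the join."""
    added = []
    for each_uri, predicate, obj in distributed:
        for subject, membership, member in output:
            if (subject == each_uri
                    and MEMBERSHIP.fullmatch(membership)
                    and not is_literal(member)):
                # member <predicate> Object .
                added.append((member, predicate, obj))
    return list(output) + added
```

The sketch reads only the original output triples while matching, so the triples it adds are never themselves used as join input.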
The intent is that this join can be performed without careful attention
to ordering and closure issues.
Hence, an RDF/XML document must not be such that processing it would
generate any instance of the following:
-
two distributed subject triples
<ABOUTEACH> <rdf:_NNN> Object .
<ABOUTEACH2> <predicate> Object2 .
with Object being a URI or local reference.
-
and an output triple
<ABOUTEACH> <rdf:_MMM> <ABOUTEACH2> .
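One literal reading of this condition can be checked mechanically. The Python sketch below is an illustration under that reading, not a definitive test: the function name has_forbidden_pattern, the tuple-of-strings representation of triples, and the leading double quote as the marker of a literal are all assumptions.

```python
import re

# Matches the collection membership properties rdf:_1, rdf:_2, ...
MEMBERSHIP = re.compile(r"<rdf:_[1-9][0-9]*>")


def has_forbidden_pattern(distributed, output):
    """Return True if the bags contain an instance of the forbidden pattern."""
    for each1, pred1, obj1 in distributed:
        # <ABOUTEACH> <rdf:_NNN> Object, with Object a URI or local reference
        if not MEMBERSHIP.fullmatch(pred1) or obj1.startswith('"'):
            continue
        for each2, _pred2, _obj2 in distributed:
            # <ABOUTEACH2> <predicate> Object2
            for subject, pred, obj in output:
                # <ABOUTEACH> <rdf:_MMM> <ABOUTEACH2>
                if (subject == each1
                        and MEMBERSHIP.fullmatch(pred)
                        and obj == each2):
                    return True
    return False
```

Under this reading the two distributed subject triples may be the same triple, which happens when an aboutEach subject is a member of its own collection.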
In particular the following document is illegal:
<rdf:RDF xmlns:rdf="...">
<rdf:Description rdf:aboutEach="#foo">
<rdf:li rdf:resource="#bar" />
</rdf:Description>
<rdf:Description rdf:about="#foo">
<rdf:li rdf:resource="#foo"/>
</rdf:Description>
</rdf:RDF>
The issue is the difficulty of deriving (or not deriving)
<#bar> <rdf:_1> <#bar> .
given the more obvious triples:
<#foo> <rdf:_1> <#foo> .
<#foo> <rdf:_1> <#bar> .
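The ambiguity can be made concrete with a small Python sketch that applies the join to this document's triples once and then a second time. The function name join_once and the set-of-tuples representation are assumptions made for this example, and the literal check is omitted because no literals occur here.

```python
def join_once(distributed, output):
    """Apply one pass of the distributed subject join to a set of triples."""
    derived = {
        (member, predicate, obj)
        for (each_uri, predicate, obj) in distributed
        for (subject, membership, member) in output
        if subject == each_uri and membership.startswith("<rdf:_")
    }
    return output | derived


# Triples from the illegal document above.
distributed = {("<#foo>", "<rdf:_1>", "<#bar>")}
output = {("<#foo>", "<rdf:_1>", "<#foo>")}

one_pass = join_once(distributed, output)
two_passes = join_once(distributed, one_pass)
contested = ("<#bar>", "<rdf:_1>", "<#bar>")

print(contested in one_pass)    # False
print(contested in two_passes)  # True
```

A single pass does not produce the contested triple, while repeating the join does, so the result depends on exactly the ordering and closure choices that the restriction is meant to make irrelevant.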