Parsing and Containers

A number of issues have arisen with the processing of
containers by parsers and other RDF processors.  It
would be a good thing if most, ideally all, parsers
handled them the same way.

It is appropriate for the RDF Interest Group to discuss
the interpretation of the current specification and to
document any conclusions that are reached.  While the
Interest Group does not have a charter to revise the 
specification, the result of this discussion could be
offered for use in an errata document and could
be provided as input to a future W3C Working Group if
one is chartered to update the specification.

We would like to get a general consensus among the
RDF Interest Group on how parsers should handle containers. 
To kick things off, we have written a strawman proposal.

The following proposal represents the views of the authors
and is not an endorsement by the W3C.  We invite comment.

Brian McBride
Dave Beckett

----------------------------------------------------------------

A Proposed Interpretation of RDF Containers
===========================================

Draft 1.0
13th December 2000


1. Issue Statement
   ===============

The RDF formal grammar defined in the Model and Syntax
Specification [1] is ambiguous.  Containers such as
rdf:Bag, rdf:Seq and rdf:Alt match the container 
productions 6.25 through 6.31, but also match the
typedNode production (6.13).

The container productions attempt to restrict what the
language can express about containers, but the ambiguity
in the syntax effectively circumvents those restrictions.

It is not clear what parsers should do if they encounter
an rdf:li element when processing productions other than
the container specific productions of the grammar.

Sub-classes (described by rdfs:subClassOf) of the
container classes do not match the container specific
productions in the formal grammar.  M&S states that these
productions should be extended to include structures that
are rdfs:subClassOf rdfs:Container.  Handling this
requires parsers to process the class structure resources
in a document, which some do not do.  It also requires that
the class structure be included in any XML
serialization.

Some of these issues have been raised previously and are 
recorded in the RDF issues list [3] [4].  

We recognise that other issues about containers have been
raised whose resolution requires changes to the
specification.  We consider changes to the specification 
to be beyond the remit of this document, and thus we do
not address them here.

2. What Do the Specs Say and Not Say?
   ==================================

 o M&S permits the expression of arbitrary structures
   involving containers, though not always conveniently
   using the container productions.

 o M&S says that the rdf:li mechanism is a convenience
   element [5] to make it easier to write lists of elements
   without individually numbering them.  It does NOT
   say these only work when processing specific
   container productions.

 o M&S says that container members have properties
   starting at rdf:_1 and running contiguously through to
   rdf:_n where n is the number of elements in the container.
   Whilst this is true of the abstract container, M&S
   does NOT say that an implementation cannot represent
   a partial model of a container, one that might not
   contain all of these properties.

 o M&S and RDF Schema [2] do NOT say that the ordinal
   properties rdf:_1, rdf:_2, ... can only be applied to
   containers.
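
The distinction drawn above between a complete abstract container
and a partial model of one can be illustrated with a small check.
This is only a sketch: the helper name missing_ordinals is
hypothetical, and it assumes the ordinal indices (the n in rdf:_n)
have already been extracted from the properties of one resource.

```python
def missing_ordinals(indices):
    """Return the indices absent from 1..max(indices), in ascending order.

    An empty result means the ordinals run contiguously from rdf:_1 to
    rdf:_n. A non-empty result indicates a partial model, which the
    specifications do not forbid.
    """
    present = set(indices)
    if not present:
        return []
    return [n for n in range(1, max(present) + 1) if n not in present]


print(missing_ordinals([1, 2, 3]))
print(missing_ordinals([1, 3, 4]))
```

A processor following this proposal would presumably treat a
non-empty result as information about the model, not as an error.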

3. Approach of this Proposal
   =========================

The proposal, in essence, is: 

  o Containers and their sub-classes match the typedNode
    production (6.13).  

  o Parsers MUST transform rdf:li into an instance of an
    ordinal property (rdf:_1, rdf:_2, ...) wherever it
    is used in the formal grammar production propName
    (6.14).

  o Ordinal properties may be attached to any resource.

This proposal does not change the expressive power of the
language.  Anything that can be expressed with this
proposal could have previously been expressed.  Anything
that could previously have been expressed can be
expressed under this proposal.

We believe that this proposal merely relaxes some
constraints that some parsers have imposed.  Thus the
triples generated by existing parsers from existing
XML serializations of RDF are unlikely to change.

To conform to this proposal, some parsers will have
to change.  We believe that such changes are
simplifications of the parser, as the grammar and
processing become more regular.

This proposal does not conform to the original intent
of the authors of the M&S specification.  We believe,
however, that this proposal is appropriate given
implementation experience and new understanding not
available to the original authors.

4. Proposal
   ========

 1) Parsers MAY omit the container specific productions
    6.25-6.31; they are not required to implement them.
    This has no effect on the language as anything that
    matches these productions also matches other
    productions in the grammar.

 2) rdf:li is legal wherever a propName (6.14)
    production can be used.  The rdf:li is transformed
    into an ordinal property (rdf:_n) when it is used,
    according to the rules in item 3 below.

 3) rdf:li processing

    rdf:li processing is described here in terms of an
    implementation.  Parsers are not required to implement
    it this way, but however they implement it, the effect
    should be the same as if it had been implemented as
    described here.

    rdf:li, when it is encountered in the propName (6.14)
    production, is transformed to an ordinal property, i.e.
    one of rdf:_1, rdf:_2 etc.

    It is transformed to the successor of the last ordinal
    property encountered within the current element.  If
    this is the first ordinal property encountered within 
    the current element, then it is transformed to rdf:_1.
    The successor of an ordinal property rdf:_n is rdf:_m
    where m = n+1.
   
    Attributes of an element MUST be processed before
    sub-elements of the element.  Sub-elements are processed
    in the order they appear in the document.

    The rdf:li processing of sub-elements is independent
    of the processing of enclosing elements.  The selection
    of an ordinal to replace an rdf:li is not affected by
    any ordinals encountered in sub-elements of the element.
    The selection of an ordinal to replace an rdf:li is
    not affected by ordinals encountered in enclosing
    elements.

    Note that XML states that the ordering of attributes is
    not significant and that the same attribute name cannot
    appear more than once on an element.  It is probably
    unwise to use rdf:li as an attribute.  If it is used in the
    presence of other ordinal property attributes, the ordinal
    property with which it will be replaced is undefined.
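
    The counting rule described in this item can be sketched in
    Python as follows.  This is an illustration under stated
    assumptions, not a required implementation: the function names
    resolve_ordinals and property_names are hypothetical, and
    property names are written in ElementTree's "{namespace}local"
    form.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
LI = "{" + RDF + "}li"
ORDINAL = re.compile(re.escape("{" + RDF + "}") + r"_([1-9][0-9]*)$")


def resolve_ordinals(names):
    """Replace rdf:li in the property names of ONE element.

    `names` must list attribute names first, then sub-element names in
    document order. Each element gets its own call, so enclosing and
    nested elements never influence one another.
    """
    last = 0  # no ordinal seen yet, so the first rdf:li becomes rdf:_1
    resolved = []
    for name in names:
        match = ORDINAL.match(name)
        if match:
            # An explicit ordinal becomes the "last ordinal encountered".
            last = int(match.group(1))
        elif name == LI:
            last += 1
            name = "{%s}_%d" % (RDF, last)
        resolved.append(name)
    return resolved


def property_names(element):
    """Attribute names first, then child element names in document order.

    Other attributes such as rdf:about are passed through unchanged.
    """
    return list(element.attrib) + [child.tag for child in element]


# Example 2 from this document, with the namespace declared explicitly.
doc = ET.fromstring(
    '<rdf:Description xmlns:rdf="%s" rdf:about="http://foo">'
    "<rdf:li>1</rdf:li><rdf:_10>10</rdf:_10><rdf:li>11</rdf:li>"
    "</rdf:Description>" % RDF
)
print(resolve_ordinals(property_names(doc)))
```

    Because the counter tracks the last ordinal encountered and not
    the largest one, an explicit rdf:_1 that follows an rdf:li leaves
    both properties as rdf:_1.  The order in which an XML parser
    reports attributes is an implementation detail, so the sketch
    gives no defined result when rdf:li appears as an attribute
    alongside other ordinal attributes.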

 4) rdf:aboutEach processing
    ------------------------

    The rdf:aboutEach attribute defines a distributive
    referent, as described in section 3.3 of [1].

    The rdf:aboutEach referent distributes over
    all resources R for which there is a representation
    of a triple in the XML serialization being processed
    of the form:
        [P, rdf:_n, R] 
      where
        P is the resource identified by the value of the
          rdf:aboutEach attribute
        rdf:_n is an ordinal property.
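
    The selection rule in this item can be sketched as a filter over
    the triples found in the serialization.  The sketch assumes that
    triples are held as (subject, predicate, object) tuples of URI
    strings and that every object is a resource; the function name
    about_each_targets is hypothetical.

```python
import re

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORDINAL = re.compile(re.escape(RDF) + r"_[1-9][0-9]*$")


def about_each_targets(container, triples):
    """Return every R for which a triple [container, rdf:_n, R] exists.

    Assumes all objects are resources; a fuller implementation would
    need to tell resources from literals.
    """
    return [
        obj
        for subject, predicate, obj in triples
        if subject == container and ORDINAL.match(predicate)
    ]


triples = [
    ("http://bag", RDF + "_1", "http://a"),
    ("http://bag", RDF + "_2", "http://b"),
    ("http://bag", RDF + "type", RDF + "Bag"),
    ("http://other", RDF + "_1", "http://c"),
]
print(about_each_targets("http://bag", triples))
```

    Note that the filter does not consult the rdf:type of the
    subject, in keeping with the point that ordinal properties may
    be attached to any resource.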

5. Examples
   ========

   Example 1:

      <rdf:Description rdf:about="http://foo" rdf:li="1">
        <rdf:li>2</rdf:li>
      </rdf:Description>

   would generate the triples (in subject, predicate, object order):

      [http://foo, rdf:_1, "1"]
      [http://foo, rdf:_2, "2"]

   Example 2:

      <rdf:Description rdf:about="http://foo">
        <rdf:li>1</rdf:li>
        <rdf:_10>10</rdf:_10>
        <rdf:li>11</rdf:li>
      </rdf:Description>

    would generate the triples:

      [http://foo, rdf:_1, "1"]
      [http://foo, rdf:_10, "10"]
      [http://foo, rdf:_11, "11"]

   Example 3:

      <rdf:Description rdf:about="http://foo">
        <rdf:li>1</rdf:li>
        <rdf:_1>1 again</rdf:_1>
      </rdf:Description>

    would generate:

      [http://foo, rdf:_1, "1"]
      [http://foo, rdf:_1, "1 again"]

   Example 4:

      <rdf:Description rdf:about="http://badExample" rdf:li="a" rdf:_3="b"/>

    would generate:

      [http://badExample, rdf:_n, "a"]
      [http://badExample, rdf:_3, "b"]

    where n is some integer greater than 0.

   Example 5:

      <rdf:Bag rdf:about="http://foo">
        <rdf:li>1</rdf:li>
        <foo:bar>
          <rdf:Seq rdf:about="http://bar">
            <rdf:li>1</rdf:li>
            <rdf:li>2</rdf:li>
          </rdf:Seq>
        </foo:bar>
        <rdf:li>2</rdf:li>
      </rdf:Bag>

    would generate:

      [http://foo, rdf:_1, "1"]
      [http://foo, rdf:_2, "2"]
      [http://foo, foo:bar, http://bar]
      [http://bar, rdf:_1, "1"]
      [http://bar, rdf:_2, "2"]
      [http://foo, rdf:type, rdf:Bag]
      [http://bar, rdf:type, rdf:Seq]
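
As a cross-check on the nested case, the following minimal sketch
walks an element tree and emits triples, treating rdf:Bag and
rdf:Seq as ordinary typed nodes.  It is not a full parser.  It
assumes that every node carries rdf:about, ignores property
attributes, and does not distinguish literals from resources.  The
function names and the namespace URI bound to the foo prefix are
hypothetical; the examples above do not declare that prefix.

```python
import re
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORDINAL = re.compile(re.escape("{" + RDF + "}") + r"_([1-9][0-9]*)$")


def uri(tag):
    """Turn ElementTree's '{namespace}local' form into a plain URI."""
    return tag[1:].replace("}", "", 1)


def node_triples(node, triples):
    """Emit triples for one description or typed node; return its subject."""
    subject = node.get("{%s}about" % RDF)
    if node.tag != "{%s}Description" % RDF:
        # typedNode: the element name itself supplies the rdf:type.
        triples.append((subject, RDF + "type", uri(node.tag)))
    last = 0  # ordinal counter, local to this element only
    for prop in node:
        predicate = prop.tag
        match = ORDINAL.match(predicate)
        if match:
            last = int(match.group(1))
        elif predicate == "{%s}li" % RDF:
            last += 1
            predicate = "{%s}_%d" % (RDF, last)
        if len(prop):
            # Nested node: the recursive call starts a fresh counter.
            obj = node_triples(prop[0], triples)
        else:
            obj = prop.text
        triples.append((subject, uri(predicate), obj))
    return subject


EXAMPLE_5 = """
<rdf:Bag xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foo="http://example.org/foo#"
         rdf:about="http://foo">
  <rdf:li>1</rdf:li>
  <foo:bar>
    <rdf:Seq rdf:about="http://bar">
      <rdf:li>1</rdf:li>
      <rdf:li>2</rdf:li>
    </rdf:Seq>
  </foo:bar>
  <rdf:li>2</rdf:li>
</rdf:Bag>
"""

triples = []
node_triples(ET.fromstring(EXAMPLE_5), triples)
for triple in triples:
    print(triple)
```

The second rdf:li of the outer element becomes rdf:_2 even though
the nested rdf:Seq used rdf:_1 and rdf:_2 in between, because each
call keeps its own counter.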

6. Unresolved Issues
   =================

Issue #1

rdf:Seq, rdf:Bag and rdf:Alt optionally take rdf:ID 
attributes, e.g. sequence (6.25):

  sequence ::= '<rdf:Seq' idAttr? '>' member* '</rdf:Seq>' | 
               '<rdf:Seq' idAttr? memberAttr* '/>' 

whereas typedNode (6.13) elements can take further attributes -
rdf:about, rdf:aboutEach, rdf:aboutEachPrefix and rdf:bagID

Original rule:
  typedNode ::= '<' typeName idAboutAttr? bagIdAttr? propAttr* '/>' |
                '<' typeName idAboutAttr? bagIdAttr? propAttr* '>'
                    propertyElt* '</' typeName '>'

so we need to think about this, or at least describe it further.  Is
this one of those cases where we don't know why this was originally
restricted?

7. References
   ==========

[1] http://www.w3.org/TR/REC-rdf-syntax/
[2] http://www.w3.org/TR/2000/CR-rdf-schema-20000327/
[3] http://www.w3.org/2000/03/rdf-tracking/#rdf-containers-syntax-vs-schema
[4] http://www.w3.org/2000/03/rdf-tracking/#rdf-containers-otherapproaches
[5] http://www.w3.org/TR/REC-rdf-syntax/#containers

Received on Wednesday, 13 December 2000 07:03:48 UTC