Clarifications needed for the Collection construct

Dear all,
I see a few points related to the collection construct that need to be clarified.

Sorry for the long mail, but I wanted to lay this out in detail.

Thanks for your feedback!



In the syntax specification from 8th November 2002 [1], collections were introduced into the RDF syntax. To create a collection, the following new terms were added to the RDF namespace: rdf:parseType="Collection", rdf:nil, rdf:rest, rdf:first and rdf:List. The collection itself, when generated with the rdf:parseType="Collection" attribute-value pair, is constructed from blank nodes of type rdf:List, which is an rdfs:Class. Each blank node links to the current element of the list through the property rdf:first, and to the rest of the list through the property rdf:rest. The end of the list is denoted by rdf:nil, which is an instance of the class rdf:List, so rdf:nil is itself a list.
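To make this structure concrete, here is a minimal Python sketch that expands a sequence of items into the triples described above. The function name `collection_triples` and the `genID:` labels for blank nodes are only illustrative assumptions; they are not taken from any of the specifications.

```python
def collection_triples(items, prefix="genID:"):
    """Expand items into rdf:type/rdf:first/rdf:rest triples ending in rdf:nil.

    Returns (head, triples). The head of an empty collection is rdf:nil.
    The blank node labels (prefix + counter) are purely illustrative.
    """
    if not items:
        return "rdf:nil", []
    nodes = [f"{prefix}{i}" for i in range(1, len(items) + 1)]
    triples = []
    for i, (node, item) in enumerate(zip(nodes, items)):
        # The last node points to rdf:nil, every other node to its successor.
        rest = nodes[i + 1] if i + 1 < len(nodes) else "rdf:nil"
        triples.append((node, "rdf:type", "rdf:List"))
        triples.append((node, "rdf:first", item))
        triples.append((node, "rdf:rest", rest))
    return nodes[0], triples


head, triples = collection_triples(["ex:apple", "ex:pear"])
for s, p, o in triples:
    print(s, p, o, ".")
```

Each list element costs one blank node and three triples in this encoding, and only the last node points to rdf:nil.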
The default way of generating a collection in RDF is to use the attribute-value pair rdf:parseType="Collection", but anyone can also write the corresponding triples by hand. As you can read in [2] (section 3.2.3), there are currently no constraints on collections. A node may have several rdf:rest or rdf:first values, or none at all, which means the following set of triples would also be valid:

genID:1  rdf:type  rdf:List .
genID:1  rdf:first  ex:aaa .
genID:1  rdf:first  ex:bbb .
genID:1  rdf:rest  ex:ccc .
genID:1  rdf:rest  genID:2 .
genID:2  rdf:type  rdf:List .
genID:1  rdf:rest  rdf:nil .

The question that arises is: does this make any sense? What would it mean for a collection element to have several different values? Would it not make more sense to use an rdf:Bag instead? There is also another question: do we need the collection construct at all? Previously there were three kinds of containers: rdf:Bag, rdf:Seq and rdf:Alt.

There are some differences between containers and a collection. A container in RDF is one resource containing all its members. A collection is different: it consists of many resources linked to each other. Each of these resources is linked to its value(s), and the end of the collection is denoted by the empty list as the object of the rdf:rest property. Now here comes the main aim of this new construct: it defines a fixed, finite list of items with a given length, terminated by rdf:nil. At least this is what we can read in [4], section 4.2.

Is this goal reached? There is no restriction on the structure of lists in RDF. As shown above, there can be more than one rdf:rest and more than one rdf:first, and even the presence of rdf:nil as the terminating object is not enforced anywhere. By default the collection is constructed with blank nodes, but even this can be changed.

Example 3: A collection with a non-blank node.


--------------------------------------------------------------------------------


<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/basket">
    <ex:hasFruit rdf:resource="#myCollection"/>
  </rdf:Description>
  <rdf:List rdf:ID="myCollection">
    <rdf:first rdf:resource="http://example.org/apple"/>
    <rdf:rest rdf:parseType="Collection">
      <rdf:Description rdf:about="http://example.org/pear"/>
    </rdf:rest>
  </rdf:List>
</rdf:RDF>

This example should generate the following triples:

http://example.org/basket  ex:hasFruit  ns1:myCollection .

ns1:myCollection  rdf:type  rdf:List .

ns1:myCollection  rdf:first  http://example.org/apple .

ns1:myCollection  rdf:rest  genID:1 .

genID:1  rdf:type  rdf:List .

genID:1  rdf:first  http://example.org/pear .

genID:1  rdf:rest  rdf:nil .

The effect is that, once a list node is not blank, anyone can refer to it from outside and attach further elements to the collection. This means that without any restrictions this construct is not fixed!
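The difference can be sketched in a few lines of Python. The sketch assumes the usual rule that blank nodes of different graphs are kept apart when the graphs are merged; the helper names `merge` and `firsts`, the `genID:other-` relabelling and the ex:banana triple are made up for illustration only.

```python
BLANK = "genID:"  # illustrative marker for blank node labels


def merge(own, other):
    """Merge two triple sets, relabelling the blank nodes of `other`
    so that they cannot coincide with blank nodes of `own`."""
    def rename(term):
        if term.startswith(BLANK):
            return BLANK + "other-" + term[len(BLANK):]
        return term
    return set(own) | {(rename(s), p, rename(o)) for s, p, o in other}


def firsts(graph, node):
    """All rdf:first values attached to `node`, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == node and p == "rdf:first")


# A named list node: an outside graph can talk about the very same node.
named = {("ns1:myCollection", "rdf:first", "ex:apple")}
outside_named = {("ns1:myCollection", "rdf:first", "ex:banana")}

# A blank list node: the outside graph's genID:1 is a different node.
blank = {("genID:1", "rdf:first", "ex:apple")}
outside_blank = {("genID:1", "rdf:first", "ex:banana")}

named_merged = merge(named, outside_named)
blank_merged = merge(blank, outside_blank)

print(firsts(named_merged, "ns1:myCollection"))
print(firsts(blank_merged, "genID:1"))
```

After the merge the named node carries two rdf:first values, while the blank node keeps only its own.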

What about other relevant RDF constructs? In [4] the following is stated: A limitation of the containers is that there is no way to close them, i.e., to say, "these are all the members of the container". This is because, while one graph may describe some of the members, there is no way to exclude the possibility that there is another graph somewhere that describes additional members. 

But we can also use a blank node to identify the rdf:Bag itself. Blank nodes cannot be referred to from outside, and therefore no further members can be added. It even needs fewer triples, and the graph is easier to read. The example of the fruit basket could be written as:

Example 4: The fruit basket using the bag construct.


--------------------------------------------------------------------------------


<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/basket">
    <ex:hasFruit>
         <rdf:Bag>
            <rdf:li rdf:resource="http://example.org/apple"/>
            <rdf:li rdf:resource="http://example.org/pear"/>
         </rdf:Bag>
    </ex:hasFruit>
  </rdf:Description>
</rdf:RDF>

This example should generate the following triples:

http://example.org/basket  ex:hasFruit  genID:1 .

genID:1  rdf:type  rdf:Bag .

genID:1  rdf:_1  http://example.org/apple .

genID:1  rdf:_2  http://example.org/pear .
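As a rough check of the triple counts, here is a small Python sketch. It assumes the encodings shown in the two examples (one rdf:type triple plus one rdf:_n triple per member for the bag; rdf:type, rdf:first and rdf:rest per element for the collection) and leaves out the ex:hasFruit link, which both variants need. The function names are hypothetical.

```python
def bag_triples(members, node="genID:1"):
    """Triples for an rdf:Bag: one type triple plus one rdf:_n per member."""
    triples = [(node, "rdf:type", "rdf:Bag")]
    for n, member in enumerate(members, start=1):
        triples.append((node, f"rdf:_{n}", member))
    return triples


def collection_triple_count(n_members):
    """Triples needed by the list encoding shown for Example 3:
    rdf:type, rdf:first and rdf:rest for every list node."""
    return 3 * n_members


fruit = ["http://example.org/apple", "http://example.org/pear"]
bag = bag_triples(fruit)
for s, p, o in bag:
    print(s, p, o, ".")
print(len(bag), "triples for the bag,",
      collection_triple_count(len(fruit)), "for the collection")
```

For n members the bag needs n + 1 triples, while the collection as listed for Example 3 needs 3n.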

Without restrictions on the collection construct, it is just a more complex way of expressing things we could already express with containers. Possible restrictions are:

  a. Each collection in RDF must have exactly one terminating rdf:nil element.
  b. Each collection element must have exactly one connection with the rdf:first property.
  c. Each collection element must have exactly one connection with the rdf:rest property.
  d. Collection elements in RDF have to be blank nodes.

It might be too restrictive to impose all of these, and there might also be further reasons for introducing the collection construct.
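Restrictions a to d could be checked mechanically. The following Python sketch is only one possible reading of them: it treats the given triples as a single collection, recognises blank nodes by an assumed `genID:` prefix, and uses a made-up function name `check_collection`.

```python
from collections import defaultdict


def check_collection(triples, blank_prefix="genID:"):
    """Return a list of violations of restrictions a-d (empty if none).

    Assumes that `triples` describes exactly one collection and that
    blank nodes are recognisable by `blank_prefix`.
    """
    first, rest = defaultdict(list), defaultdict(list)
    nodes = set()
    for s, p, o in triples:
        if p == "rdf:first":
            first[s].append(o)
            nodes.add(s)
        elif p == "rdf:rest":
            rest[s].append(o)
            nodes.add(s)
        elif p == "rdf:type" and o == "rdf:List" and s != "rdf:nil":
            nodes.add(s)

    problems = []
    # a: exactly one rdf:rest link may end in rdf:nil
    nil_links = sum(objs.count("rdf:nil") for objs in rest.values())
    if nil_links != 1:
        problems.append(f"a: {nil_links} links to rdf:nil, expected 1")
    for node in sorted(nodes):
        if len(first[node]) != 1:  # b
            problems.append(f"b: {node} has {len(first[node])} rdf:first values")
        if len(rest[node]) != 1:   # c
            problems.append(f"c: {node} has {len(rest[node])} rdf:rest values")
        if not node.startswith(blank_prefix):  # d
            problems.append(f"d: {node} is not a blank node")
    return problems


# The unconstrained triple set from the beginning of this mail.
malformed = [
    ("genID:1", "rdf:type", "rdf:List"),
    ("genID:1", "rdf:first", "ex:aaa"),
    ("genID:1", "rdf:first", "ex:bbb"),
    ("genID:1", "rdf:rest", "ex:ccc"),
    ("genID:1", "rdf:rest", "genID:2"),
    ("genID:2", "rdf:type", "rdf:List"),
    ("genID:1", "rdf:rest", "rdf:nil"),
]
# A two-element list built only from blank nodes.
well_formed = [
    ("genID:1", "rdf:first", "ex:apple"),
    ("genID:1", "rdf:rest", "genID:2"),
    ("genID:2", "rdf:first", "ex:pear"),
    ("genID:2", "rdf:rest", "rdf:nil"),
]

for problem in check_collection(malformed):
    print(problem)
```

On the unconstrained triple set this reports violations of b and c for both nodes; restriction a is satisfied there only because exactly one of the three rdf:rest links happens to point to rdf:nil.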

The main difference at the moment is that a container is one resource containing all values, while a collection consists of several linked resources, each holding a value. In appendix A.3 of [1] we can find that the collection construct was also introduced to support recursive processing in languages such as Prolog. But there should not be a special construct for each programming language.

Additional question:

What would be the fixed length of a collection? Is it the number of linked nodes of type rdf:List (not counting rdf:nil), or the number of rdf:first connections? And what about multisets in collections?
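To show that these candidate definitions can disagree, here is a short Python sketch. The triple set is a made-up example (one node with two rdf:first values and a repeated value), and the function name `candidate_lengths` is hypothetical.

```python
def candidate_lengths(triples):
    """Compute three possible notions of 'length' for one collection."""
    list_nodes = {s for s, p, o in triples
                  if p == "rdf:type" and o == "rdf:List" and s != "rdf:nil"}
    first_values = [o for s, p, o in triples if p == "rdf:first"]
    return {
        "list_nodes": len(list_nodes),              # nodes typed rdf:List
        "first_links": len(first_values),           # rdf:first connections
        "distinct_values": len(set(first_values)),  # values as a set
    }


# Hypothetical triples: genID:1 has two values, ex:apple occurs twice.
example = [
    ("genID:1", "rdf:type", "rdf:List"),
    ("genID:1", "rdf:first", "ex:apple"),
    ("genID:1", "rdf:first", "ex:pear"),
    ("genID:1", "rdf:rest", "genID:2"),
    ("genID:2", "rdf:type", "rdf:List"),
    ("genID:2", "rdf:first", "ex:apple"),
    ("genID:2", "rdf:rest", "rdf:nil"),
]

lengths = candidate_lengths(example)
print(lengths)
```

Here the same triples give two list nodes, three rdf:first connections and two distinct values, so the answer depends on which definition is chosen.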

 

References

[1]       RDF/XML Syntax Specification (Revised), W3C Working Draft 8 November 2002, online at: http://www.w3.org/TR/2002/WD-rdf-syntax-grammar-20021108

[2]       RDF Semantics, W3C Working Draft 23 January 2003, online at: http://www.w3.org/TR/2003/WD-rdf-mt-20030123/

[3]       RDF Vocabulary Description Language 1.0: RDF Schema, W3C Working Draft 12 November 2002, online at: http://www.w3.org/TR/2002/WD-rdf-schema-20021112/

[4]       RDF Primer, W3C Working Draft 23 January 2003, online at: http://www.w3.org/TR/2003/WD-rdf-primer-20030123/


Best Greetings,
Karsten Tolle

Received on Thursday, 20 February 2003 08:46:35 UTC