Clarifications needed for the Collection construct (with CR) from Karsten Tolle on 2003-02-21 (www-rdf-comments@w3.org from January to March 2003)

From: Karsten Tolle <tolle@dbis.informatik.uni-frankfurt.de>
Date: Fri, 21 Feb 2003 15:49:14 +0100
To: <www-rdf-comments@w3.org>
Message-ID: <002a01c2d9b8$68e08670$6368028d@HANNOVER>
Resending my mail, since CR got lost in the previous mail:-(

Dear all,
I see several points about the collection construct that need to be
clarified.
Sorry for the long mail, but I wanted to lay them out in detail.
Thanks for your feedback!

In the syntax specification from 8 November 2002 [1], collections were
introduced into the RDF syntax. To create a collection, the following new
terms were added to the RDF namespace: rdf:parseType="Collection",
rdf:nil, rdf:rest, rdf:first and rdf:List. The collection itself, when
generated with the rdf:parseType="Collection" attribute-value pair, is
constructed from blank nodes of type rdf:List, which is an rdfs:Class.
Each blank node has a link to the current element of the list via the
property rdf:first, and a link to the rest of the list via the property
rdf:rest. The end of the list is denoted by rdf:nil, which is an instance
of the class rdf:List; so rdf:nil is itself a list.

The default way of generating a collection in RDF is to use the
attribute-value pair rdf:parseType="Collection", but anyone can also write
the list triples by hand. As you can read in [2] (section 3.2.3), there
are currently no constraints on collections. A node may have several
rdf:rest or rdf:first properties, or none at all, which means the
following set of triples would also be valid:

genID:1  rdf:type  rdf:List .
genID:1  rdf:first  ex:aaa .
genID:1  rdf:first  ex:bbb .
genID:1  rdf:rest  ex:ccc .
genID:1  rdf:rest  genID:2 .
genID:2  rdf:type  rdf:List .
genID:1  rdf:rest  rdf:nil .

The question that arises is: does this make any sense? What would it mean
for a collection element to have several different values? Would it not
make more sense to use an rdf:Bag instead? But there is also another
question: do we need the collection construct at all? Until now there were
three kinds of containers: rdf:Bag, rdf:Seq and rdf:Alt.

There are some differences between containers and a collection. A
container in RDF is one resource that holds all its members. A collection
is different: it consists of many resources linked with each other. Each
of these resources is linked to its value(s), and the end of the
collection is denoted by the empty list as the object of the rdf:rest
property. Now here is the main aim of this new construct: it defines a
fixed, finite list of items with a given length, terminated by rdf:nil; at
least this is what we can read in [4], section 4.2.

Does it reach that goal? There is no restriction on the structure of
lists in RDF. As shown above, a node can have more than one rdf:rest and
more than one rdf:first, and even the presence of rdf:nil as the
terminating object is not enforced anywhere. By default the collection is
built from blank nodes, but even this can be changed.

Example: A collection with a non-blank node.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/basket">
    <!-- refer to the named list head declared below via rdf:ID -->
    <ex:hasFruit rdf:resource="#myCollection"/>
  </rdf:Description>
  <rdf:List rdf:ID="myCollection">
    <rdf:first rdf:resource="http://example.org/apple"/>
    <rdf:rest rdf:parseType="Collection">
      <rdf:Description rdf:about="http://example.org/pear"/>
    </rdf:rest>
  </rdf:List>
</rdf:RDF>

This example should generate the following triples:

http://example.org/basket  ex:hasFruit  ns1:myCollection .
ns1:myCollection  rdf:type  rdf:List .
ns1:myCollection  rdf:first  http://example.org/apple .
ns1:myCollection  rdf:rest  genID:1 .
genID:1  rdf:type  rdf:List .
genID:1  rdf:first  http://example.org/pear .
genID:1  rdf:rest  rdf:nil .

The effect is that, once a list node is not blank, anyone can add further
elements to the collection from outside. This means that without any
restrictions this construct is not fixed!
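
To make the point concrete, here is a minimal Python sketch of what
happens when two graphs are merged. It is only an illustration under
stated assumptions: triples are plain tuples of strings, blank nodes are
written "_:label", the document URI http://example.org/doc is a
hypothetical stand-in for the ns1: prefix, and merge() and firsts() are
helper names invented for this sketch. A merge renames blank nodes so
that they stay local to their graph, while URIs are shared.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"
HEAD = "http://example.org/doc#myCollection"  # hypothetical base URI
APPLE = "http://example.org/apple"
PLUM = "http://example.org/plum"


def merge(*graphs):
    """Merge graphs, renaming blank nodes apart so they stay graph-local."""
    merged = set()
    for index, graph in enumerate(graphs):
        for triple in graph:
            merged.add(tuple(
                f"_:g{index}_{term[2:]}" if term.startswith("_:") else term
                for term in triple
            ))
    return merged


def firsts(graph, node):
    """All rdf:first values attached to one node, sorted for stable output."""
    return sorted(o for s, p, o in graph if s == node and p == FIRST)


# A list whose head is a URI, and a second graph that talks about that URI.
named = {(HEAD, FIRST, APPLE), (HEAD, REST, NIL)}
outside_named = {(HEAD, FIRST, PLUM)}

# The same list with a blank head, and a second graph reusing the label.
anonymous = {("_:l1", FIRST, APPLE), ("_:l1", REST, NIL)}
outside_blank = {("_:l1", FIRST, PLUM)}

merged_named = merge(named, outside_named)
merged_blank = merge(anonymous, outside_blank)

print(firsts(merged_named, HEAD))        # the named head picked up a second value
print(firsts(merged_blank, "_:g0_l1"))   # the blank head kept only its own value
```

In the merged graph the named list node ends up with two rdf:first
values, while the blank list node is untouched because the label from the
second graph denotes a different node after renaming.
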
What about other relevant RDF constructs? In [4] the following is stated:
"A limitation of the containers is that there is no way to close them,
i.e., to say, 'these are all the members of the container'. This is
because, while one graph may describe some of the members, there is no way
to exclude the possibility that there is another graph somewhere that
describes additional members."
But we can also use a blank node to identify the rdf:Bag itself. Blank
nodes cannot be referred to from outside, and therefore no further member
can be added. It even needs fewer triples, and the graph is easier to
read. The example of the fruit basket could be written as:

Example: The fruit basket using the bag construct.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/stuff/1.0/">
  <rdf:Description rdf:about="http://example.org/basket">
    <ex:hasFruit>
         <rdf:Bag>
            <rdf:li rdf:resource="http://example.org/apple"/>
            <rdf:li rdf:resource="http://example.org/pear"/>
         </rdf:Bag>
    </ex:hasFruit>
  </rdf:Description>
</rdf:RDF>

http://example.org/basket  ex:hasFruit  genID:1 .
genID:1  rdf:type  rdf:Bag .
genID:1  rdf:_1  http://example.org/apple .
genID:1  rdf:_2  http://example.org/pear .
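
The difference in size can be checked with a small Python sketch that
generates both encodings for the same members. It is an illustration
only: the function names and blank node labels are made up, and the
collection encoding includes one rdf:type rdf:List triple per node, as in
the triple listing for the collection example above.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def bag_triples(subject, prop, members):
    """Encode the members as an rdf:Bag on a single blank node."""
    node = "_:bag"
    triples = [(subject, prop, node), (node, RDF + "type", RDF + "Bag")]
    for position, member in enumerate(members, start=1):
        triples.append((node, f"{RDF}_{position}", member))
    return triples


def collection_triples(subject, prop, members):
    """Encode the members as a chain of rdf:List nodes ending in rdf:nil."""
    if not members:
        return [(subject, prop, RDF + "nil")]
    nodes = [f"_:l{i}" for i in range(1, len(members) + 1)]
    triples = [(subject, prop, nodes[0])]
    for i, (node, member) in enumerate(zip(nodes, members)):
        following = nodes[i + 1] if i + 1 < len(nodes) else RDF + "nil"
        triples.append((node, RDF + "type", RDF + "List"))
        triples.append((node, RDF + "first", member))
        triples.append((node, RDF + "rest", following))
    return triples


basket = "http://example.org/basket"
has_fruit = "http://example.org/stuff/1.0/hasFruit"
fruit = ["http://example.org/apple", "http://example.org/pear"]

print(len(bag_triples(basket, has_fruit, fruit)))         # 4
print(len(collection_triples(basket, has_fruit, fruit)))  # 7
```

For n members the bag needs n + 2 triples and the collection 3n + 1
under these assumptions.
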

Without restrictions, the collection construct is just a more complex
way of expressing things we could already express with containers.
Possible restrictions would be:
1. Each collection in RDF must have exactly one terminating rdf:nil element.
2. Each collection element must have exactly one connection with the
rdf:first property.
3. Each collection element must have exactly one connection with the
rdf:rest property.
4. Collection elements in RDF have to be blank nodes.
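
As a rough sketch of how these four restrictions could be checked, the
following Python function walks the rdf:rest links from a given head node
and reports every violation as a (node, restriction number) pair. The
representation (string tuples, "_:" for blank nodes) and the function
name are assumptions of the sketch, and restriction 1 is interpreted here
as "exactly one rdf:rest link to rdf:nil among the reachable nodes".

```python
from collections import defaultdict

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"
EX = "http://example.org/stuff/1.0/"


def check_collection(triples, head):
    """Return sorted (node, restriction_number) pairs for each violation."""
    first, rest = defaultdict(set), defaultdict(set)
    for s, p, o in triples:
        if p == FIRST:
            first[s].add(o)
        elif p == REST:
            rest[s].add(o)

    problems, seen, todo, nil_links = set(), set(), [head], 0
    while todo:
        node = todo.pop()
        if node == NIL or node in seen:
            continue
        seen.add(node)
        if not node.startswith("_:"):
            problems.add((node, 4))      # list node is not blank
        if len(first[node]) != 1:
            problems.add((node, 2))      # zero or several rdf:first
        if len(rest[node]) != 1:
            problems.add((node, 3))      # zero or several rdf:rest
        if NIL in rest[node]:
            nil_links += 1
        todo.extend(rest[node])
    if nil_links != 1:
        problems.add((head, 1))          # no single terminating rdf:nil
    return sorted(problems)


# The triples from the beginning of this mail, with genID:n written as _:n.
mail_example = [
    ("_:1", RDF + "type", RDF + "List"),
    ("_:1", FIRST, EX + "aaa"),
    ("_:1", FIRST, EX + "bbb"),
    ("_:1", REST, EX + "ccc"),
    ("_:1", REST, "_:2"),
    ("_:2", RDF + "type", RDF + "List"),
    ("_:1", REST, NIL),
]
well_formed = [
    ("_:a", FIRST, "http://example.org/apple"),
    ("_:a", REST, "_:b"),
    ("_:b", FIRST, "http://example.org/pear"),
    ("_:b", REST, NIL),
]

print(check_collection(well_formed, "_:a"))
print(check_collection(mail_example, "_:1"))
```

The well-formed list passes, while the triples from the first example
violate restrictions 2, 3 and 4 on several nodes.
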

It might be too restrictive to impose all of these, and there might also
be further reasons for introducing the collection construct.
The main difference at the moment is that a container is one resource
containing all values, while a collection consists of several linked
resources containing the values. In appendix A.3 of [1] we can find that
the collection construct was also introduced to support recursive
processing in languages such as Prolog. There should not be a special
construct for each programming language.

Additional question:
What would be the fixed length of a collection? The number of linked nodes
of type rdf:List (not counting rdf:nil nodes)? The number of rdf:first
connections? And what about multisets in collections?
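
To show that these candidate definitions of "length" can disagree, here
is a small Python sketch that computes them side by side. The two test
graphs are hypothetical and not taken from any specification, and the
function and key names are invented for the illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
TYPE, LIST = RDF + "type", RDF + "List"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"
APPLE, PEAR = "http://example.org/apple", "http://example.org/pear"


def length_measures(triples, head):
    """Compute candidate 'lengths' for the list reachable from head."""
    typed = {s for s, p, o in triples if p == TYPE and o == LIST}
    seen, todo = set(), [head]
    while todo:
        node = todo.pop()
        if node == NIL or node in seen:
            continue
        seen.add(node)
        todo.extend(o for s, p, o in triples if s == node and p == REST)
    members = [o for s, p, o in triples if s in seen and p == FIRST]
    return {
        "list_nodes": len(seen & typed),       # linked rdf:List nodes, no rdf:nil
        "first_links": len(members),           # number of rdf:first triples
        "distinct_members": len(set(members)), # members counted as a set
    }


# Hypothetical graph 1: one list node carrying two rdf:first values.
forked = [
    ("_:a", TYPE, LIST),
    ("_:a", FIRST, APPLE),
    ("_:a", FIRST, PEAR),
    ("_:a", REST, NIL),
]
# Hypothetical graph 2: a well-formed list containing the same item twice.
repeated = [
    ("_:a", TYPE, LIST), ("_:a", FIRST, APPLE), ("_:a", REST, "_:b"),
    ("_:b", TYPE, LIST), ("_:b", FIRST, APPLE), ("_:b", REST, NIL),
]

print(length_measures(forked, "_:a"))
print(length_measures(repeated, "_:a"))
```

In the first graph the node count is 1 but there are 2 rdf:first links;
in the second both counts are 2 although there is only one distinct
member.
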

Best Greetings,
Karsten Tolle

References
[1] RDF/XML Syntax Specification (Revised), W3C Working Draft
    8 November 2002, online at:
    http://www.w3.org/TR/2002/WD-rdf-syntax-grammar-20021108
[2] RDF Semantics, W3C Working Draft 23 January 2003, online at:
    http://www.w3.org/TR/2003/WD-rdf-mt-20030123/
[3] RDF Vocabulary Description Language 1.0: RDF Schema, W3C Working
    Draft 12 November 2002, online at:
    http://www.w3.org/TR/2002/WD-rdf-schema-20021112/
[4] RDF Primer, W3C Working Draft 23 January 2003, online at:
    http://www.w3.org/TR/2003/WD-rdf-primer-20030123/


Received on Friday, 21 February 2003 09:47:25 UTC