practical problems with rdf:parseType="Collection" implementation from Garret Wilson on 2003-08-13 (www-rdf-interest@w3.org from August 2003)

From: Garret Wilson <garret@globalmentor.com>
Date: Tue, 12 Aug 2003 17:09:12 -0700
To: www-rdf-interest@w3.org
Message-ID: <3F3981A8.5050805@globalmentor.com>

Everyone,

I'm just getting around to implementing support for 
rdf:parseType="Collection", even though I've been using it for a while 
in specifications I've written.

The design of rdf:List looks good in theory, but there are a few details 
that make it a pain to implement---particularly, the way rdf#nil is used:

1. Empty lists cannot be modified.

In the RDF world, it's natural to think that one can always add things 
to a resource. I can add another dc:creator to a book. I can later 
add an rdfs:label to a resource. Yes, I know that RDF is designed to 
only be aware of one set of static relationships, but in the real world 
we need to modify resources at some time or another. We can modify 
rdf:Alt, rdf:Bag, and rdf:Seq, even if they are empty---we just add one 
or more resources to the collection.

rdf:List is designed so that one cannot add anything to an empty 
rdf:List, because there is only one empty rdf:List, the one named 
rdf#nil. There are an infinite number of non-empty lists, with an 
infinite number of (usually anonymous/blank node) reference URIs, but 
*none* of those lists can be empty, because the empty list has its own 
static universal reference URI. Conversely, the empty list can never be 
added to without changing its reference URI. Put differently, an empty 
rdf:List cannot be filled---it can only be replaced with a filled list. 
(Compare this to an rdf:List with one element---programmatically, one 
can add more elements by changing the rdf:rest property, allowing the 
list to remain the same entity, identified by the same reference URI. 
This is impossible with an empty list.)
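
A minimal Java sketch of this point, assuming a hypothetical in-memory 
RdfListNode class (not the API of any real RDF library) in which NIL 
stands in for rdf#nil. A one-element list can grow in place and keep its 
identity; the shared empty list cannot.

```java
import java.util.Objects;

/** Hypothetical in-memory rdf:List node; not the API of any real RDF library. */
final class RdfListNode {
    /** The one shared empty list, standing in for rdf#nil. */
    static final RdfListNode NIL = new RdfListNode();

    final String first;  // rdf:first, a resource URI in this sketch (null only for NIL)
    RdfListNode rest;    // rdf:rest (null only for NIL)

    private RdfListNode() {
        this.first = null;
        this.rest = null;
    }

    RdfListNode(String first) {
        this.first = Objects.requireNonNull(first);
        this.rest = NIL;
    }

    boolean isEmpty() {
        return this == NIL;
    }

    int size() {
        int count = 0;
        for (RdfListNode node = this; !node.isEmpty(); node = node.rest) {
            count++;
        }
        return count;
    }

    /** Appends in place, so the receiver keeps its identity. Impossible for NIL. */
    void append(String element) {
        if (isEmpty()) {
            throw new UnsupportedOperationException(
                "rdf#nil cannot be filled, only replaced by a different list");
        }
        RdfListNode tail = this;
        while (!tail.rest.isEmpty()) {
            tail = tail.rest;
        }
        tail.rest = new RdfListNode(element);
    }

    public static void main(String[] args) {
        RdfListNode comments = new RdfListNode("http://example.com/#comment1");
        comments.append("http://example.com/#comment2"); // same node, one more element
        System.out.println(comments.size());
        try {
            NIL.append("http://example.com/#comment1");
        } catch (UnsupportedOperationException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Any code holding a reference to NIL would have to be handed a different 
node to see a non-empty list, which is the replacement described above.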

2. It is a pain to populate an rdf:List.

Even if we accept the theoretical notion of RDF as a static snapshot of 
relationships, in the real world one has to populate that directed graph 
programmatically---when parsing an RDF+XML document, for instance. With 
the old containers, that was easy: we start with an empty rdf:Alt, 
rdf:Bag, or rdf:Seq and then add elements if and only if they are present.

With rdf:List, this procedure remains the same *only* after we know we 
have one element in a list. Until we have one element in the list, we 
don't know whether to create an anonymous rdf:List and populate it with 
items, or (if there are no items) to create an rdf#nil list (with its 
unique reference URI). This results in very inelegant algorithms:

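while(there are child elements)
{
   create a new rdf:List with the element as its rdf:first
   if(we've already created an rdf:List)
   {
     set the rdf:rest of the previous rdf:List to the new one
   }
   else
   {
     specify that the new list is the "root" list
   }
   save this new list as the previous list for next time
}
if(we have record of finding a "root" list)
{
   set the rdf:rest of the last rdf:List to rdf#nil
   use the "root" list as the property value
}
else
   use rdf#nil as the property value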
In contrast, the old containers allowed very elegant implementations, 
because they didn't distinguish conceptually between empty and filled 
containers:

create new container
while(there are child elements)
{
   add the element to the container
}
use the new container as the property value
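
The two procedures above can be sketched in Java roughly as follows. The 
CollectionParser and Cell classes and their method names are 
hypothetical, child elements are reduced to plain strings, and Cell.NIL 
stands in for rdf#nil; this is an illustration under those assumptions, 
not a real parser.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

/** Hypothetical parser helpers; the class and method names are illustrative only. */
final class CollectionParser {
    /** rdf:List style: no list node can be created until the first child is seen. */
    static Cell buildList(List<String> children) {
        Cell root = null;      // first cell, if any
        Cell previous = null;  // most recently created cell
        for (String child : children) {
            Cell cell = new Cell(child, Cell.NIL);
            if (previous != null) {
                previous.rest = cell;  // link the new cell onto the previous one
            } else {
                root = cell;           // remember the "root" list
            }
            previous = cell;
        }
        // No children at all: the value has to be the shared rdf#nil instead.
        return root != null ? root : Cell.NIL;
    }

    /** Container style (rdf:Alt, rdf:Bag, rdf:Seq): create first, then fill. */
    static List<String> buildContainer(List<String> children) {
        List<String> container = new ArrayList<>();
        for (String child : children) {
            container.add(child);
        }
        return container;
    }

    public static void main(String[] args) {
        Cell list = buildList(Arrays.asList("a", "b"));
        System.out.println(list.first + " " + list.rest.first);
        System.out.println(buildList(Collections.<String>emptyList()) == Cell.NIL);
    }
}

/** A list cell with rdf:first and rdf:rest; NIL stands in for rdf#nil. */
final class Cell {
    static final Cell NIL = new Cell(null, null);

    final String first;
    Cell rest;

    Cell(String first, Cell rest) {
        this.first = first;
        this.rest = rest;
    }
}
```

The extra root/previous bookkeeping and the final fallback to NIL in 
buildList are the parts that buildContainer does not need.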

3. It is impossible to independently insert an element at the beginning 
of an rdf:List.

In object-oriented programming, I'd like to have an object represent an 
rdf:List. In Java, something implementing java.util.List would be great. 
Given any MyList, it's a simple matter to insert something at index i+1 
with i>=0: I just create a new rdf:List with rdf:first representing the 
inserted resource, set the new rdf:List's rdf:rest to the current value 
of rdf:List(i).rdf:rest, and then change rdf:List(i).rdf:rest to point 
to the new rdf:List.

That's all fine except at index 0. To insert at the front of the list, I 
have to know for which resource property the rdf:List is the property 
value. This is made worse by the fact that several properties (of 
several resources) might have the rdf:List as a property value. This 
leads to the following inconsistency: if resources example.com#book1, 
example.com#book2, and example.com#book3 all have a property of 
listOfComments, I can always add another comment to the end of the list 
without modifying the property value for any of the books, but if I want 
to *insert* a comment at the front of the list, I have to modify the 
property value for each of the books.

In very practical terms, that means if I have the function...

add(RDFList list, RDFResource resourceElement, int index)

...it will work for all values of index except index==0, unless I have 
access to the entire RDF data model, walk the graph, and find all 
resources which have properties for which the list is a property value.

Similarly, going back to problem #1 (above), the function...

add(RDFList list, RDFResource resourceElement, int index)

...cannot work with empty lists!
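
A rough Java sketch of such an add function, assuming a hypothetical 
ListCell class in which ListCell.NIL stands in for rdf#nil and resources 
are reduced to strings. The RdfLists helper is likewise hypothetical. It 
handles every index from 1 up to the list size, and can only refuse the 
two cases described above.

```java
/** Hypothetical helper; the names here are illustrative, not a real RDF API. */
final class RdfLists {
    /** Inserts element at index by relinking rdf:rest; the head node is never replaced. */
    static void add(ListCell list, String element, int index) {
        if (index < 0) {
            throw new IndexOutOfBoundsException("index: " + index);
        }
        if (list == ListCell.NIL) {
            // Problem #1: the empty list is a shared node and cannot be filled.
            throw new UnsupportedOperationException("cannot add to rdf#nil");
        }
        if (index == 0) {
            // Problem #3: a new head means every referring property must change.
            throw new UnsupportedOperationException("cannot insert before the head");
        }
        ListCell before = list;  // cell at position index-1 once the loop finishes
        for (int i = 1; i < index; i++) {
            if (before.rest == ListCell.NIL) {
                throw new IndexOutOfBoundsException("index: " + index);
            }
            before = before.rest;
        }
        // The new cell takes over the old rdf:rest, then is linked in.
        before.rest = new ListCell(element, before.rest);
    }

    public static void main(String[] args) {
        ListCell list = new ListCell("a", new ListCell("c", ListCell.NIL));
        add(list, "b", 1);
        for (ListCell cell = list; cell != ListCell.NIL; cell = cell.rest) {
            System.out.println(cell.first);
        }
    }
}

/** A list cell with rdf:first and rdf:rest; NIL stands in for rdf#nil. */
final class ListCell {
    static final ListCell NIL = new ListCell(null, null);

    final String first;
    ListCell rest;

    ListCell(String first, ListCell rest) {
        this.first = first;
        this.rest = rest;
    }
}
```

Supporting the two refused cases would require access to whatever holds 
the reference to the list, which this signature does not provide.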

I understand that the old collection framework had shortcomings (those 
silly indexes, for one thing) and that the new rdf:List framework looks 
nice in a pretty static graph on paper. The specific way that rdf#nil is 
used as an empty list, however, creates very inelegant implementation 
restrictions. Surely rdf:List could be modified to be better than the 
old collections, yet also usable in real life.

Garret
Received on Tuesday, 12 August 2003 20:09:21 UTC