Re: Confusion about Collections from Frank Manola on 2002-11-26 (www-rdf-comments@w3.org from October to December 2002)

From: Frank Manola <fmanola@mitre.org>
Date: Mon, 25 Nov 2002 21:37:35 -0500
To: Shelley Powers <shelleyp@burningbird.net>
CC: www-rdf-comments@w3.org
Message-ID: <3DE2DE6F.9070004@mitre.org>
Shelley Powers wrote:

> <snip>
> 
>>>I need some clarification about your clarification.  I understand what
>>>you say about the mapping between the RDF/XML of the collection and the
>>>generated graph (there is one;  it's described in the Syntax
>>>specification, but reading it isn't for the faint of heart), and I'm
>>>concocting some words to try to describe it.  However, I'm not sure I
>>>understand what you mean by the "long form" of the Collection.  It seems
>>>to me that the graph is the "long form" (that is, it shows the consed
>>>list, in all its "glory"), and there's a drawn graph in the Primer.  Are
>>>you saying that a *triples* version of that graph would be clearer, and
>>>would help people more than the drawing (he asked in astonishment)?  If
>>>so, do you mean in addition to or instead of the drawing?
>>>
>>>
>>You're talking about the productions. I wouldn't recommend that for the
>>Primer or, as you say, for the faint of heart; only for those intensely
>>interested.
>>
>>People will programmatically access RDF as triples.
>>
> 
>>But the triples are simply another way of describing the graph or,
>>putting it another way, it's pretty straightforward to construct the
>>triples from the drawing if you want to, but I suspect simply slopping
>>all those triples down on paper isn't going to help very much, since
>>there are no visual cues provided in triples.
>>
> 
> What I'm trying to say is that one criticism of the RDF/XML is that the RDF
> graph isn't clear from the RDF/XML. You show Collection and a listing of
> resources, but the graph shows a collection type, and each resource is
> connected to each other through a rdf:rest, and there's an implicit rdf:nil
> at the end of the list, and then each resource points to a value and then to
> a type and so on. That is what I meant by a disconnect between the graph and
> the RDF/XML.
> 


We may be talking past each other to a certain extent. What the Primer 
presents about collections is an RDF/XML example, together with the 
corresponding graph (drawn as nodes and arcs).  I agree that the 
connection between the graph and the RDF/XML isn't all that 
straightforward, although there is a mapping, which I tried to explain 
in the snippet below (did that make the mapping any clearer?)  Regarding 
triples, the nodes-and-arcs diagrams and groups of triples are both 
representations of the graph, and there's one triple per arc in the 
drawn graph.  This is why I said it would be pretty straightforward to 
create the triples from the nodes-and-arcs diagram.
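
To make the "one triple per arc" point concrete, here is a rough sketch of
how the members of a collection might be expanded into the list triples
described in this thread.  It is only an illustration, not the Syntax
specification's algorithm: the function name, the blank node labels, and the
example URIs are all assumptions made up for the sketch, and the rdf:type
rdf:List triples are included only because that is how the graph is
described in this message.

```python
# Sketch only: expand collection members into rdf:first / rdf:rest triples.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def collection_triples(subject, prop, members):
    """Return (subject, predicate, object) triples for a closed list."""
    if not members:
        # An empty collection points straight at rdf:nil.
        return [(subject, prop, RDF + "nil")]
    # One list node per member; the labels _:l0, _:l1, ... are invented here.
    nodes = ["_:l%d" % i for i in range(len(members))]
    # The property with parseType="Collection" points at the head of the list.
    triples = [(subject, prop, nodes[0])]
    for i, member in enumerate(members):
        # The last node's rdf:rest is rdf:nil, which ends the list.
        rest = nodes[i + 1] if i + 1 < len(members) else RDF + "nil"
        triples.append((nodes[i], RDF + "type", RDF + "List"))
        triples.append((nodes[i], RDF + "first", member))
        triples.append((nodes[i], RDF + "rest", rest))
    return triples

# Hypothetical URIs, chosen only to resemble the course/students example.
COURSE = "http://example.edu/courses/6.001"
STUDENTS = "http://example.edu/students/vocab#students"
triples = collection_triples(
    COURSE, STUDENTS,
    ["http://example.edu/students/Amy", "http://example.edu/students/Tim"],
)
for t in triples:
    print(t)
```

Each printed tuple corresponds to one arc in the drawn graph, which is why
going from the diagram to the triples is mechanical.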


> 
>>And there is no mapping
>>between the RDF/XML and the triples if I read the graph correctly.
>>
>>
> snip
> 
> 
>>Sure there is.  The property element with the rdf:parseType="Collection"
>>attribute describes the property that points to the head of the list in
>>the graph.  The resource descriptions nested in that property element
>>are members of the collection.  For each one of those member resources,
>>there's a corresponding resource of type rdf:List generated, with
>>rdf:first and rdf:rest properties connecting it to the member and the
>>rest of the list respectively.  To indicate the end of the list, you
>>make the last rdf:rest property value rdf:nil.  It may be that the use of
>>s:student nested elements in the RDF/XML is causing the confusion.  I've
>>changed the RDF/XML to:
>>
> 
> <snip>
> 
> (same graph).
> 
> 
> That's a lot to add to the model for parseType="Collection", isn't it?
> 
> Frank, I am trying to point out things in the documents that people who have
> not lived and breathed RDF are going to get confused about. And I'm pretty
> darn sure Collection is going to be one of them.


I understand, and appreciate that.  The problems I'm facing here are 
limited time (there's other stuff that needs work too), and the Primer 
having to serve multiple roles at the same time (there's a concern that 
the longer the Primer gets, the fewer people will want to plow through 
it).  I hear about this a lot!

One of the particular problems with collections is that they are 
intended to provide "data structure" support for closed groups, i.e., 
where you are saying "these members are all there are" (something you 
can't do with containers), and they are really in there because OWL 
needs them (rdf:Collection was originally daml:Collection).  So the 
RDF/XML is simple (it looks a lot like the way you would describe a kind 
of Bag), but the graph it generates is complex (it's a list, because you 
can't, in another RDF graph somewhere else, add additional members to 
the list structure and still preserve the list structure).  The problem 
is that RDF doesn't itself provide the constraint that you need to 
really guarantee that the group is closed, because someone somewhere 
else could describe a completely separate list (in the example, a 
separate list of students).  That is, each list would be closed, due to 
the data structure, but the group of students would not be.  What you 
need is a way of saying that there is only one s:students property with 
courses/6.001 as the subject.  OWL provides such a constraint, and they 
are really the main users.  What I probably need to do is say something 
like this in the Primer;  do you think that would help or make things worse?
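
As a rough illustration of the point about closure (a sketch only, with
made-up node labels, made-up "ex:" names, and a hypothetical helper
function), merging two graphs that each describe a list for the same course
leaves each list intact and closed, but the course ends up with two
s:students values:

```python
# Sketch only: each list is closed, but the property is not unique.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def list_members(triples, head):
    """Walk rdf:first / rdf:rest from head until rdf:nil (assumes no cycles)."""
    first = {s: o for s, p, o in triples if p == RDF + "first"}
    rest = {s: o for s, p, o in triples if p == RDF + "rest"}
    members = []
    node = head
    while node != RDF + "nil":
        members.append(first[node])
        node = rest[node]
    return members

# Abbreviated, hypothetical names standing in for full URIs.
COURSE = "ex:courses/6.001"
STUDENTS = "ex:students"

graph_a = [
    (COURSE, STUDENTS, "_:a0"),
    ("_:a0", RDF + "first", "ex:Amy"), ("_:a0", RDF + "rest", "_:a1"),
    ("_:a1", RDF + "first", "ex:Tim"), ("_:a1", RDF + "rest", RDF + "nil"),
]
# A completely separate list, described by someone somewhere else.
graph_b = [
    (COURSE, STUDENTS, "_:b0"),
    ("_:b0", RDF + "first", "ex:John"), ("_:b0", RDF + "rest", RDF + "nil"),
]

merged = graph_a + graph_b
heads = [o for s, p, o in merged if s == COURSE and p == STUDENTS]
lists = [list_members(merged, h) for h in heads]
print(lists)
```

Nothing in RDF itself rejects the merged graph; ruling out the second
s:students value takes an additional constraint of the kind OWL provides.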


> 
> 
>>I think it would help if you displayed the N-Triples for the graph in
>>addition to the graph. I think you also need to walk through each aspect of
>>the graph and explain what it means, and if there is an alternate syntax for
>>same (as there is with rdf:Bag).
>>
> 
>>I think this would be devoting more space to Collections than they're
>>probably worth.  I'll see.  But you can certainly construct your own
>>list structures using your own vocabulary if you want to (e.g., two-way
>>lists).
>>
> 
> Again, trying to help point out areas that will generate confusion, and
> concern.


I know.


> 
> 
>>Also -- "Up to applications to interpret it". That's enough to ensure I will
>>never use Collections in my vocabularies. What a nightmare. This is the same
>>as relying on user agent's interpretation of <P> in the original HTML.
>>
> 
>>This is no different than it being up to applications to interpret
>>rdf:Bag and rdf:_1 though (and I know there's been a separate thread on
>>this aspect of containers and reification, so we'll take it as read).
>>
> 
> There is, and my attitude about all three now is that if there is going to be
> confusion and incompatibility about all three, then all three shouldn't be
> used (as others have pointed out). And I'll write accordingly.


I understand.  But part of this problem about the lack of semantics 
associated with containers and reification has always existed; it just 
wasn't always clear.  That is, it wasn't clear how much of the intended 
meaning of, say, an Alt could actually be controlled by RDF, and how 
much had to be based on application writers doing appropriate things. 
RDF never, for example, specified an API that defined operations on 
containers, or had a way of controlling whether an application really 
used the first member of an Alt as the default value.  So this time 
around we're trying to be very clear about what things RDF by itself 
guarantees, and what things are not going to be interoperable unless 
everyone understands and implements the intended structure and behavior 
the same way.  Of course, you can get quite a lot done with these kinds 
of general understandings, and I expect people are successfully using 
containers and reification based on them.  It's just that we're trying 
to make a distinction between what RDF itself can realistically 
guarantee, and additional characteristics of these constructs that have 
to rely on people to "do the right thing".
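
For instance, treating the first member of an Alt as the default comes down
to application code along these lines.  This is only a sketch: the function
name, the node label, and the example values are invented for illustration,
and nothing in RDF obliges an application to do this.

```python
# Sketch only: "first member of an Alt is the default" as an app convention.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

def alt_default(triples, alt):
    """Return the rdf:_1 member of the node alt, or None if it has none."""
    for s, p, o in triples:
        if s == alt and p == RDF + "_1":
            return o
    return None

# Hypothetical data: an Alt with three members.
example = [
    ("_:alt", RDF + "type", RDF + "Alt"),
    ("_:alt", RDF + "_1", "ftp://ftp.example.org"),
    ("_:alt", RDF + "_2", "ftp://ftp1.example.org"),
    ("_:alt", RDF + "_3", "ftp://ftp2.example.org"),
]
print(alt_default(example, "_:alt"))
```

The graph only records that rdf:_1 has a certain value; whether that value
is actually used as the default is decided by whoever writes a function
like the one above.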


> 
>>Also, you introduced the term 'consed-pair' in the document to reflect
>>Collections. This does need definition, it's not used anywhere else in any
>>of the documents.
>>
>>
> 
>>What I said was that the RDF graph structure shown for collections was
>>known as a "consed-pair" construction.  That's kind of a definition ("a
>>consed-pair construction is one of these thingies") right?  I think what
>>you mean is I should explain why it's *called* a "consed-pair"
>>construction, but rather than waste the space on that, I'll take the
>>term out.
>>
> 
> 
> Frank, does consed-pair add to the meaning of the Collection? If so, then
> define it and say why you're using it for Collection. Don't necessarily drop
> it. Are you getting charged for the square inch that the primer takes?


It adds to the meaning for those who understand what a consed-pair is. 
But explaining what it is for those who don't understand already would 
be a diversion, I think.  What I meant was, if it's going to create 
confusion, it would be better to take the term out, and directly explain 
what's going on more fully instead.

And since you ask, I sort of *am* being charged for the square inch the 
Primer takes.  I get beat over the head each time a new version of the 
Primer comes out because it's long.  And it's long because the 
explanations get more thorough, and more stuff has to be explained (even 
when I take other stuff out).  What this really needs is a book;  why 
else are you writing one? :-)


> 
> Look, as I said earlier, I'm just trying to show you where you all are going
> to have problems with the "mess of humans" that don't think RDF but will be
> trying to use RDF/XML in the future.
> 


I understand that you are, and in most cases I agree that we're going to 
have problems in those places (and in other places too).  But we don't 
have an arbitrary amount of time to wrap this up.

--Frank


 


-- 
Frank Manola                   The MITRE Corporation
202 Burlington Road, MS A345   Bedford, MA 01730-1420
mailto:fmanola@mitre.org       voice: 781-271-8147   FAX: 781-271-875
Received on Monday, 25 November 2002 21:20:31 UTC