Reification of Sets (of RDF Statement, for Queries) from Sandro Hawke on 2001-04-09 (www-rdf-interest@w3.org from April 2001)

From: Sandro Hawke <sandro@w3.org>
Date: Mon, 09 Apr 2001 19:24:59 -0400
To: www-rdf-interest@w3.org
Message-Id: <200104100044.f3A0iJ701421@daniel.hawke.org>

Issue 1: RDF M&S Does Not Provide Sets

The arguments I've heard against them are "we don't have an
enforcement mechanism" (for duplicates, so use bags) and "we have to
provide them in some order" (so use lists).

I think those arguments against defining a vocabulary for
communicating information about set membership are, to put it mildly,
weak.

On the first point, you don't need to provide an enforcement
mechanism.  If someone says "X contains 3" and then "X contains 3"
again, well, you know "X contains 3".  No problem.

On the second: it doesn't matter if you have extraneous data.  In set
theory, people say "x={3,4}" and they know it's the same as "x={4,3}".
Yes, syntactically the elements appear as a list, but the set whose
elements were enumerated by the list is the same no matter what order
of enumeration is used.  The extraneous ordering information is simply
ignored.
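
A minimal Python sketch of both points, assuming membership is
communicated as (set, "contains", element) statements. The predicate
name "contains" and the helper members() are illustrative assumptions,
not part of RDF M&S:

```python
# Hypothetical statements of the form (set, "contains", element).
statements = [
    ("X", "contains", 3),
    ("X", "contains", 4),
    ("X", "contains", 3),   # repeated assertion; it adds nothing new
]

def members(statements, set_name):
    """Collect the known members of set_name, ignoring duplicates and order."""
    return {obj for subj, pred, obj in statements
            if subj == set_name and pred == "contains"}

# The same set results regardless of repetition or statement order.
assert members(statements, "X") == members(list(reversed(statements)), "X")
```

No enforcement step is needed: the receiver just collapses repeats and
discards the order in which the statements arrived.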



Issue 2: Completeness of Knowledge  ("Closed" collections)

There is a significant difference between "The set X contains the
numbers 3 and 4" and "The set X contains *only* the numbers 3 and 4."
Given only the information in the first form, you cannot answer
whether 5 is in X.  

RDF collections at present only provide incomplete knowledge, so
people have to frame their queries as being about a different set.
"Is 5 in X?" cannot ever be answered negatively, so you have to ask
"Is 5 in the set of things you currently know to be in X?"  I think
this is broken.
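
The difference can be sketched as a three-valued membership test. This
is only an illustration: the "closed" flag and the function name
is_member() are assumptions, standing in for whatever vocabulary would
say that a collection is complete:

```python
from typing import Optional

def is_member(known: set, closed: bool, item) -> Optional[bool]:
    """Answer "is item in X?" given the members known so far.

    Returns True or False when the answer is definite, and None when
    the knowledge is incomplete and the question cannot be answered.
    """
    if item in known:
        return True
    # Only a closed collection permits a negative answer.
    return False if closed else None

open_answer = is_member({3, 4}, closed=False, item=5)    # unknown
closed_answer = is_member({3, 4}, closed=True, item=5)   # definitely not
```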

This is different from the rdfms-seq-representation issue, although
the LISP-style list solution solves both problems.



More generally, here are some solutions:

For List Completeness:

   1)  Use predicates _1, _2, _3, ... and also a max_index predicate

       - needs integer sequence generation
       - needs integer comparison

   2)  Use predicates _1, _2, _3, ... with a special object "-99" 
       which marks the end

       - needs integer sequence generation
       - this is odd; that flag cannot be in any list

   3) Use predicates _1, _2, _3, ... with a special arbitrary end
      marking object related to the list by a list_end_flag
      predicate (ie, you pick the end object on a per-list basis).

       - needs integer sequence generation
       - a little complicated

   4) LISP-style lists: predicates First and Rest, object TheEmptyList.

       - a little complicated

   5) ...?  anything else?
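
As a rough sketch of option (4), here is one way First/Rest cells
could be written out as triples and read back in Python. The node
names ("cell0", "cell1", ...) and both helper functions are
hypothetical; only First, Rest and TheEmptyList come from the option
as listed:

```python
FIRST, REST, EMPTY = "First", "Rest", "TheEmptyList"

def list_to_triples(items, prefix="cell"):
    """Return (head, triples) encoding items as First/Rest cells."""
    triples = []
    head = EMPTY
    # Build from the tail so each cell can point at the rest of the list.
    for i in range(len(items) - 1, -1, -1):
        cell = f"{prefix}{i}"
        triples.append((cell, FIRST, items[i]))
        triples.append((cell, REST, head))
        head = cell
    return head, triples

def triples_to_list(head, triples):
    """Follow First/Rest from head until TheEmptyList is reached."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    items = []
    node = head
    while node != EMPTY:
        items.append(first[node])
        node = rest[node]   # KeyError here means the list is incomplete
    return items

head, triples = list_to_triples(["a", "b"])
```

Reaching TheEmptyList is what tells the reader the list is complete.
No integer generation or comparison is involved, and the order of the
triples themselves does not matter.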

It's tempting to think in terms of the syntax where it looks like the
list is complete:

<list id=foo>
  <li>a
  <li>b
</list>

but in the abstract syntax (think of the graph) that closing
information is lost, at least as the RDF parsers seem to handle it.
(As is the ordering, if you don't turn the "li" predicate into "_1"...)

For making Sets from Lists:

   1)  an Enumeration predicate, relating a set to a list which
       contains all the same elements at least once

   2)  an ElementSet predicate, the inverse of Enumeration

   3)  ...?  anything else?
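
A small sketch of how option (1) might behave, assuming each set is
related to one complete list by an Enumeration predicate. Plain Python
lists stand in for closed lists here, and the names enumerations,
element_set() and contains() are made up for the example:

```python
# Hypothetical Enumeration statements: set name -> one closed list
# containing every element of that set at least once.
enumerations = {
    "X": ["a", "b", "a"],   # repeats and order are extraneous
    "Y": ["b", "a"],
}

def element_set(name):
    """The set enumerated by the list related to name."""
    return frozenset(enumerations[name])

def contains(name, item):
    """Because the list is complete, a negative answer is definite."""
    return item in element_set(name)

same_set = element_set("X") == element_set("Y")
```

The set inherits its completeness from the list, so closed lists plus
an Enumeration predicate give closed sets.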

To bring this together in an example, I'm trying to represent RDF
queries in RDF (ie to reify them).  I think the right approach looks
in n3 like:
   :myQuery q:statements { ... bunch of statements ... };
            q:variables ( ... list of terms in the statements
                              which are variables ...).
but the conversion of the "bunch of statements" and "list of terms"
into proper RDF Sentences depends on a resolution of the issues
raised here.  My current vote is for (4) LISP-style lists and (1) a
set Enumeration predicate.  If anyone has any objection to these, I'd
be interested in hearing it.
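
For illustration only, the shape of such a reified query could be
modelled in Python as a closed set of statements plus a closed list of
variables. The Query class, the "?x"-style variable names and the
sample statements are all assumptions invented for this sketch:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Query:
    statements: frozenset   # closed set of (subject, predicate, object)
    variables: tuple        # closed list of the terms that are variables

q1 = Query(
    statements=frozenset([("?x", "author", "?y"), ("?x", "type", "Document")]),
    variables=("?x", "?y"),
)
q2 = Query(
    statements=frozenset([("?x", "type", "Document"), ("?x", "author", "?y")]),
    variables=("?x", "?y"),
)

# The order in which the statements were written down is not part of the query.
same_query = q1 == q2
```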

   -- sandro
Received on Monday, 9 April 2001 19:25:04 UTC