"Simple Lists" (was Re: ISSUE-77: Should we mark rdf:Seq as archaic (cf ISSUE-24))

On Sat, 2011-10-15 at 19:35 +0100, Andy Seaborne wrote:
> 
> On 15/10/11 19:09, Jeremy Carroll wrote:
> >
> > I think both the Seq and the List constructs present technical issues.
> > Basically it is because both present the possibility of 'bad' data and
> > no clarity about what one should do in the face of it.
> 
> +1
> 
> > We can easily form ill-formed lists with rdf:first or rdf:rest either
> > missing or multiple.
> > We can easily form ill-formed sequences with duplicate or missing rdf:_2
> 
> although Seq are very fragile and lists are merely fragile.  The 
> duplicate rdf:_2 by merging is really nasty.
> 
> > The consumer of such ill-formed data is in a bind
> > And what's worse is that formally the ill-formed data is not ill-formed,
> > it is just triples.
> >
> > We could label both with a health warning ...
> 
> Sandro said that:
> > I think Turtle makes RDF Collections seem quite
> > nice, and hopefully that will quickly set the tone (perhaps with a
> > little help from us) for APIs and SPARQL 1.2 (?) having nice list
> > handling functions that are as efficient as native (non-destructive)
> > list handling functions. (I hope some APIs do this already.)
> 
> and the point about Turtle syntax, and the convenience of writing, is 
> important.
> 
> Jena has container and collection APIs to make working with containers, 
> collections easier but the details leak out if you can take a triple view.

Thinking about it today, I wonder if we can define a "Simple List" (or
"Proper List", or something) as a list that can be losslessly
transmitted via turtle's (...) syntax.  That means its structure is
b-nodes, with no extra arcs, etc.   (Interestingly, it also means you
can't include rdf:type rdf:List arcs....)   Then, we encourage tools to
read/write simple lists, and to work with them efficiently.

Simple lists have the advantage over Seq, I believe, that in the face of
truth-preserving RDF operations (subsetting, merging, various sound
inferences), they never produce wrong data.  In the worst case, they no
longer provide data -- the simple list is mangled in some way -- but
it'll never just tell you the wrong thing.   I think this is a big win.

> Ivan:
> > But it is a bit of a problem that SPARQL 1.1 still does not cover list handling fully:-(
> 
> SPARQL 1.2 will not solve anything I'm afraid.  SPARQL 1.1 Query has 
> gone as far as it can, except maybe a little extra syntactic sugar with
> 
> { ?list rdf:rest*/rdf:first ?member }
> 
> It's much better than handling Seqs.

I'm trying to brainstorm ways to shoe-horn list handling into SPARQL.  I
don't know if there's any elegant way, but maybe there's a hack that's
not too bad.   

One approach is to update the results format to allow lists of values
where it currently allows single values.   And then offer some way to
signal you want Simple Lists to be returned as list values instead of
b-nodes.    One way to do that would be a LISTRESULT function that takes
a simple list's starting bnode and returns something that the results
format serializes as a list.  Essentially, it's just a way of saying you
want a list result here.

So...

  SELECT ?x ?y LISTRESULT(?z) WHERE...

would require ?z to be bound to a simple list and would pack the list
elements into the result format in a manner specific to that results
format (XML, JSON, etc).  

Other, normal list builtins like SUBLIST(list, startpos, stoppos) could
be used to make sure the size of the returned list is manageable.

Another approach, perhaps, would be some kind of dis-aggregator, a pair
of builtins that work together to make a list appear like many different
results:

Data:
   eg:Alice eg:likes ( eg:Bob eg:Charlie ).   

Query:
   SELECT ?who ITERINDEX(?list) ITERVALUE(?list)
   WHERE ?who eg:likes ?list.

would return results:
    eg:Alice 0 eg:Bob
    eg:Alice 1 eg:Charlie

although not necessarily in that order.

That's pretty messy to spec and implement, but might be pretty nice to
use.


> SPARQL Update can manuipuate lists but it's ugly:
> http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0389.html
> 
> 
> The fundamental problem in SPARQL is that any order is lost; so this 
> list access works for some cases, where the order does not matter.
> 
> Even if a special order preserving construct were available, order is 
> lost in the rest of the query.  An order-presering QL would not be 
> SPARQL 1.2 - it would be have completely different basis, (e.g. no 
> chance of implementing use hash joins), would be very hard to have 
> parallel implementation (see "big data" graph languages), and still does 
> not work when two ordered different subresults need combining.
> 
> Fundamentally, there are two problems:
> 
> 1/ Encoding in triples
> 2/ Lists aren't the only datastructure.
> 
> Reification, containers and collections encode data structure in triples 
> but if the app can see "triples" then this leaks through to the 
> application.  It also means there can be the possibility of 'bad' data 
> as Jeremy says.  Seeing the triples is confusing at best.

Yes, seeing the triples is a problem, but I'm hoping it's not that bad,
and that mostly people pull what they want out of a graph and ignore the
rest. 

> The structure we have may not say what you want:
>    List(1 2 3) != List(3 2 1)
> but if a list is being used to express an unordered collection, a higher 
> level convention has to be communicated.
> 
> I think the only complete solution will involve putting structural 
> literals into RDF itself, so they are not triple-encoded and can't be 
> 'bad'.  When treated as first-class literals with equality rules, 
> accessors, and combining rules, then implementations can store them 
> specially, provide good APIs, and application programmer won't have to 
> learn about the encoding rules.

That sounds pretty hard.  Do you have some design in mind...?

     - Sandro

Received on Saturday, 15 October 2011 22:35:30 UTC