Re: "Simple Lists" (was Re: ISSUE-77: Should we mark rdf:Seq as archaic (cf ISSUE-24)) from Steve Harris on 2011-10-16 (public-rdf-wg@w3.org from October 2011)

From: Steve Harris <steve.harris@garlik.com>
Date: Sun, 16 Oct 2011 19:13:25 +0100
To: Sandro Hawke <sandro@w3.org>
Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
Message-Id: <ADE6BF46-14D2-4FD7-99A8-D1530DD410D6@garlik.com>
On 15 Oct 2011, at 23:35, Sandro Hawke wrote:

> On Sat, 2011-10-15 at 19:35 +0100, Andy Seaborne wrote:
>> 
>> On 15/10/11 19:09, Jeremy Carroll wrote:
>>> 
>>> I think both the Seq and the List constructs present technical issues.
>>> Basically it is because both present the possibility of 'bad' data and
>>> no clarity about what one should do in the face of it.
>> 
>> +1
>> 
>>> We can easily form ill-formed lists with rdf:first or rdf:rest either
>>> missing or multiple.
>>> We can easily form ill-formed sequences with duplicate or missing rdf:_2
>> 
>> although Seq are very fragile and lists are merely fragile.  The 
>> duplicate rdf:_2 by merging is really nasty.
>> 
>>> The consumer of such ill-formed data is in a bind
>>> And what's worse is that formally the ill-formed data is not ill-formed,
>>> it is just triples.
>>> 
>>> We could label both with a health warning ...
>> 
>> Sandro said that:
>>> I think Turtle makes RDF Collections seem quite
>>> nice, and hopefully that will quickly set the tone (perhaps with a
>>> little help from us) for APIs and SPARQL 1.2 (?) having nice list
>>> handling functions that are as efficient as native (non-destructive)
>>> list handling functions. (I hope some APIs do this already.)
>> 
>> and the point about Turtle syntax, and the convenience of writing, is 
>> important.
>> 
>> Jena has container and collection APIs to make working with containers, 
>> collections easier but the details leak out if you can take a triple view.
> 
> Thinking about it today, I wonder if we can define a "Simple List" (or
> "Proper List", or something) as a list that can be losslessly
> transmitted via turtle's (...) syntax.  That means its structure is
> b-nodes, with no extra arcs, etc.   (Interestingly, it also means you
> can't include rdf:type rdf:List arcs....)   Then, we encourage tools to
> read/write simple lists, and to work with them efficiently.

To be honest, I think the ref:Collection structure is just a dead duck.

It works OK for lists on the order of 10 items, but is pretty impractical for thousands, or tens of thousands.

I would be mildly opposed to anything which promoted it's use further.

> Simple lists have the advantage over Seq, I believe, that in the face of
> truth-preserving RDF operations (subsetting, merging, various sound
> inferences), they never produce wrong data.  In the worst case, they no
> longer provide data -- the simple list is mangled in some way -- but
> it'll never just tell you the wrong thing.   I think this is a big win.

True, but detecting if a Collection is mangled is computationally expensive.

- Steve

>> Ivan:
>>> But it is a bit of a problem that SPARQL 1.1 still does not cover list handling fully:-(
>> 
>> SPARQL 1.2 will not solve anything I'm afraid.  SPARQL 1.1 Query has 
>> gone as far as it can, except maybe a little extra syntactic sugar with
>> 
>> { ?list rdf:rest*/rdf:first ?member }
>> 
>> It's much better than handling Seqs.
> 
> I'm trying to brainstorm ways to shoe-horn list handling into SPARQL.  I
> don't know if there's any elegant way, but maybe there's a hack that's
> not too bad.   
> 
> One approach is to update the results format to allow lists of values
> where it currently allows single values.   And then offer some way to
> signal you want Simple Lists to be returned as list values instead of
> b-nodes.    One way to do that would be a LISTRESULT function that takes
> a simple list's starting bnode and returns something that the results
> format serializes as a list.  Essentially, it's just a way of saying you
> want a list result here.
> 
> So...
> 
>  SELECT ?x ?y LISTRESULT(?z) WHERE...
> 
> would require ?z to be bound to a simple list and would pack the list
> elements into the result format in a manner specific to that results
> format (XML, JSON, etc).  
> 
> Other, normal list builtins like SUBLIST(list, startpos, stoppos) could
> be used to make sure the size of the returned list is manageable.
> 
> Another approach, perhaps, would be some kind of dis-aggregator, a pair
> of builtins that work together to make a list appear like many different
> results:
> 
> Data:
>   eg:Alice eg:likes ( eg:Bob eg:Charlie ).   
> 
> Query:
>   SELECT ?who ITERINDEX(?list) ITERVALUE(?list)
>   WHERE ?who eg:likes ?list.
> 
> would return results:
>    eg:Alice 0 eg:Bob
>    eg:Alice 1 eg:Charlie
> 
> although not necessarily in that order.
> 
> That's pretty messy to spec and implement, but might be pretty nice to
> use.
> 
> 
>> SPARQL Update can manuipuate lists but it's ugly:
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0389.html
>> 
>> 
>> The fundamental problem in SPARQL is that any order is lost; so this 
>> list access works for some cases, where the order does not matter.
>> 
>> Even if a special order preserving construct were available, order is 
>> lost in the rest of the query.  An order-presering QL would not be 
>> SPARQL 1.2 - it would be have completely different basis, (e.g. no 
>> chance of implementing use hash joins), would be very hard to have 
>> parallel implementation (see "big data" graph languages), and still does 
>> not work when two ordered different subresults need combining.
>> 
>> Fundamentally, there are two problems:
>> 
>> 1/ Encoding in triples
>> 2/ Lists aren't the only datastructure.
>> 
>> Reification, containers and collections encode data structure in triples 
>> but if the app can see "triples" then this leaks through to the 
>> application.  It also means there can be the possibility of 'bad' data 
>> as Jeremy says.  Seeing the triples is confusing at best.
> 
> Yes, seeing the triples is a problem, but I'm hoping it's not that bad,
> and that mostly people pull what they want out of a graph and ignore the
> rest. 
> 
>> The structure we have may not say what you want:
>>   List(1 2 3) != List(3 2 1)
>> but if a list is being used to express an unordered collection, a higher 
>> level convention has to be communicated.
>> 
>> I think the only complete solution will involve putting structural 
>> literals into RDF itself, so they are not triple-encoded and can't be 
>> 'bad'.  When treated as first-class literals with equality rules, 
>> accessors, and combining rules, then implementations can store them 
>> specially, provide good APIs, and application programmer won't have to 
>> learn about the encoding rules.
> 
> That sounds pretty hard.  Do you have some design in mind...?
> 
>     - Sandro
> 
> 
> 
>
Received on Sunday, 16 October 2011 18:13:49 UTC