- From: Steve Harris <steve.harris@garlik.com>
- Date: Sun, 16 Oct 2011 19:13:25 +0100
- To: Sandro Hawke <sandro@w3.org>
- Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
On 15 Oct 2011, at 23:35, Sandro Hawke wrote:
> On Sat, 2011-10-15 at 19:35 +0100, Andy Seaborne wrote:
>>
>> On 15/10/11 19:09, Jeremy Carroll wrote:
>>>
>>> I think both the Seq and the List constructs present technical issues.
>>> Basically it is because both present the possibility of 'bad' data and
>>> no clarity about what one should do in the face of it.
>>
>> +1
>>
>>> We can easily form ill-formed lists with rdf:first or rdf:rest either
>>> missing or multiple.
>>> We can easily form ill-formed sequences with duplicate or missing rdf:_2
>>
>> although Seq are very fragile and lists are merely fragile. The
>> duplicate rdf:_2 by merging is really nasty.
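A minimal sketch of the merge problem described above, assuming triples are modelled as plain Python tuples of strings and that the Seq node has a name shared by both graphs. The `eg:playlist` resource and the `conflicting_slots` helper are hypothetical, used only for illustration.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def conflicting_slots(triples, seq):
    """Map each rdf:_n property of `seq` that has more than one value to its values."""
    slots = {}
    for s, p, o in triples:
        # Container membership properties look like rdf:_1, rdf:_2, ...
        if s == seq and p.startswith(RDF + "_") and p[len(RDF) + 1:].isdigit():
            slots.setdefault(p, set()).add(o)
    return {p: vals for p, vals in slots.items() if len(vals) > 1}


# Two graphs that each describe a well-formed two-item Seq (hypothetical data).
g1 = {("eg:playlist", RDF + "_1", "eg:A"), ("eg:playlist", RDF + "_2", "eg:B")}
g2 = {("eg:playlist", RDF + "_1", "eg:A"), ("eg:playlist", RDF + "_2", "eg:C")}

# An RDF merge of ground triples is just set union.
merged = g1 | g2
print(conflicting_slots(merged, "eg:playlist"))
```

Each input graph is fine on its own; only the union has two values for rdf:_2, and nothing in the triples says which one was intended.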
>>
>>> The consumer of such ill-formed data is in a bind
>>> And what's worse is that formally the ill-formed data is not ill-formed,
>>> it is just triples.
>>>
>>> We could label both with a health warning ...
>>
>> Sandro said that:
>>> I think Turtle makes RDF Collections seem quite
>>> nice, and hopefully that will quickly set the tone (perhaps with a
>>> little help from us) for APIs and SPARQL 1.2 (?) having nice list
>>> handling functions that are as efficient as native (non-destructive)
>>> list handling functions. (I hope some APIs do this already.)
>>
>> and the point about Turtle syntax, and the convenience of writing, is
>> important.
>>
>> Jena has container and collection APIs to make working with containers
>> and collections easier, but the details leak out if you can take a triple view.
>
> Thinking about it today, I wonder if we can define a "Simple List" (or
> "Proper List", or something) as a list that can be losslessly
> transmitted via turtle's (...) syntax. That means its structure is
> b-nodes, with no extra arcs, etc. (Interestingly, it also means you
> can't include rdf:type rdf:List arcs....) Then, we encourage tools to
> read/write simple lists, and to work with them efficiently.
To be honest, I think the RDF Collection structure is just a dead duck.
It works OK for lists on the order of 10 items, but is pretty impractical for thousands, or tens of thousands.
I would be mildly opposed to anything which promoted its use further.
> Simple lists have the advantage over Seq, I believe, that in the face of
> truth-preserving RDF operations (subsetting, merging, various sound
> inferences), they never produce wrong data. In the worst case, they no
> longer provide data -- the simple list is mangled in some way -- but
> it'll never just tell you the wrong thing. I think this is a big win.
True, but detecting if a Collection is mangled is computationally expensive.
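A minimal sketch of what such a check involves, assuming triples are plain Python tuples of strings and blank nodes are written with a `_:` prefix. The `read_simple_list` name is hypothetical. The sketch only inspects outgoing arcs of each cell; it does not check for extra incoming arcs, which a Turtle (...) round trip would also rule out.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def read_simple_list(triples, head):
    """Return the list members in order, or None if the list is mangled."""
    # Index outgoing arcs by subject so each cell costs one lookup.
    out = {}
    for s, p, o in set(triples):
        out.setdefault(s, []).append((p, o))

    members, seen, node = [], set(), head
    while node != NIL:
        # Reject cycles and cells that are not blank nodes.
        if node in seen or not node.startswith("_:"):
            return None
        seen.add(node)
        arcs = out.get(node, [])
        firsts = [o for p, o in arcs if p == FIRST]
        rests = [o for p, o in arcs if p == REST]
        # Exactly one rdf:first, exactly one rdf:rest, and no extra arcs.
        if len(arcs) != 2 or len(firsts) != 1 or len(rests) != 1:
            return None
        members.append(firsts[0])
        node = rests[0]
    return members


good = {
    ("_:a", FIRST, "eg:Bob"), ("_:a", REST, "_:b"),
    ("_:b", FIRST, "eg:Charlie"), ("_:b", REST, NIL),
}
mangled = good | {("_:b", FIRST, "eg:Dave")}  # a second rdf:first on one cell

print(read_simple_list(good, "_:a"))
print(read_simple_list(mangled, "_:a"))
```

Every cell has to be visited before the list can be trusted, so the work grows with the length of the list, and in a triple store each step is a separate lookup.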
- Steve
>> Ivan:
>>> But it is a bit of a problem that SPARQL 1.1 still does not cover list handling fully:-(
>>
>> SPARQL 1.2 will not solve anything I'm afraid. SPARQL 1.1 Query has
>> gone as far as it can, except maybe a little extra syntactic sugar with
>>
>> { ?list rdf:rest*/rdf:first ?member }
>>
>> It's much better than handling Seqs.
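A rough emulation of what the rdf:rest*/rdf:first pattern above matches, assuming triples are plain Python tuples of strings held in a set. The `list_members` helper is hypothetical and is not how a SPARQL engine would evaluate a property path; it only illustrates that the members come back with no position attached.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def list_members(triples, head):
    """Members matched by `head rdf:rest*/rdf:first ?member`, in no particular order."""
    # rdf:rest* : every node reachable from head by zero or more rdf:rest steps.
    reachable, frontier = {head}, [head]
    while frontier:
        node = frontier.pop()
        for s, p, o in triples:
            if s == node and p == REST and o not in reachable:
                reachable.add(o)
                frontier.append(o)
    # /rdf:first : one rdf:first step from each reachable node.
    return [o for s, p, o in triples if s in reachable and p == FIRST]


data = {
    ("eg:Alice", "eg:likes", "_:a"),
    ("_:a", FIRST, "eg:Bob"), ("_:a", REST, "_:b"),
    ("_:b", FIRST, "eg:Charlie"), ("_:b", REST, NIL),
}

# The result carries the members but not their positions in the list.
print(list_members(data, "_:a"))
```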
>
> I'm trying to brainstorm ways to shoe-horn list handling into SPARQL. I
> don't know if there's any elegant way, but maybe there's a hack that's
> not too bad.
>
> One approach is to update the results format to allow lists of values
> where it currently allows single values. And then offer some way to
> signal you want Simple Lists to be returned as list values instead of
> b-nodes. One way to do that would be a LISTRESULT function that takes
> a simple list's starting bnode and returns something that the results
> format serializes as a list. Essentially, it's just a way of saying you
> want a list result here.
>
> So...
>
> SELECT ?x ?y LISTRESULT(?z) WHERE...
>
> would require ?z to be bound to a simple list and would pack the list
> elements into the result format in a manner specific to that results
> format (XML, JSON, etc).
>
> Other, normal list builtins like SUBLIST(list, startpos, stoppos) could
> be used to make sure the size of the returned list is manageable.
>
> Another approach, perhaps, would be some kind of dis-aggregator, a pair
> of builtins that work together to make a list appear like many different
> results:
>
> Data:
> eg:Alice eg:likes ( eg:Bob eg:Charlie ).
>
> Query:
> SELECT ?who ITERINDEX(?list) ITERVALUE(?list)
> WHERE ?who eg:likes ?list.
>
> would return results:
> eg:Alice 0 eg:Bob
> eg:Alice 1 eg:Charlie
>
> although not necessarily in that order.
>
> That's pretty messy to spec and implement, but might be pretty nice to
> use.
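A minimal sketch of the rows the proposed ITERINDEX/ITERVALUE pair would produce for the example data above. These builtins exist only in the proposal; the `iter_rows` helper, the tuple-of-strings triple model, and the `_:` blank node labels are assumptions made for illustration. The sketch assumes well-formed lists and simply stops at a broken or cyclic cell.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FIRST, REST, NIL = RDF + "first", RDF + "rest", RDF + "nil"


def iter_rows(triples, prop):
    """One (subject, index, value) row per member of each list reached via `prop`."""
    first = {s: o for s, p, o in triples if p == FIRST}
    rest = {s: o for s, p, o in triples if p == REST}
    rows = []
    for who, p, node in triples:
        if p != prop:
            continue
        index, seen = 0, set()
        # Walk the list cell by cell, emitting one row per member.
        while node != NIL and node in first and node in rest and node not in seen:
            seen.add(node)
            rows.append((who, index, first[node]))
            node = rest[node]
            index += 1
    return rows


# eg:Alice eg:likes ( eg:Bob eg:Charlie ) written out as triples.
data = {
    ("eg:Alice", "eg:likes", "_:a"),
    ("_:a", FIRST, "eg:Bob"), ("_:a", REST, "_:b"),
    ("_:b", FIRST, "eg:Charlie"), ("_:b", REST, NIL),
}

for row in sorted(iter_rows(data, "eg:likes")):
    print(row)
```

Because the index travels with each row, the original order can be recovered by the client even though the rows themselves arrive unordered.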
>
>
>> SPARQL Update can manipulate lists but it's ugly:
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0389.html
>>
>>
>> The fundamental problem in SPARQL is that any order is lost; so this
>> list access works for some cases, where the order does not matter.
>>
>> Even if a special order-preserving construct were available, order is
>> lost in the rest of the query. An order-preserving QL would not be
>> SPARQL 1.2 - it would have a completely different basis (e.g. no
>> chance of using hash joins), would be very hard to implement in
>> parallel (see "big data" graph languages), and still would
>> not work when two different ordered subresults need combining.
>>
>> Fundamentally, there are two problems:
>>
>> 1/ Encoding in triples
>> 2/ Lists aren't the only datastructure.
>>
>> Reification, containers and collections encode data structure in triples
>> but if the app can see "triples" then this leaks through to the
>> application. It also means there can be the possibility of 'bad' data
>> as Jeremy says. Seeing the triples is confusing at best.
>
> Yes, seeing the triples is a problem, but I'm hoping it's not that bad,
> and that mostly people pull what they want out of a graph and ignore the
> rest.
>
>> The structure we have may not say what you want:
>> List(1 2 3) != List(3 2 1)
>> but if a list is being used to express an unordered collection, a higher
>> level convention has to be communicated.
>>
>> I think the only complete solution will involve putting structural
>> literals into RDF itself, so they are not triple-encoded and can't be
>> 'bad'. When treated as first-class literals with equality rules,
>> accessors, and combining rules, then implementations can store them
>> specially, provide good APIs, and application programmers won't have to
>> learn about the encoding rules.
>
> That sounds pretty hard. Do you have some design in mind...?
>
> - Sandro
>
>
>
>
Received on Sunday, 16 October 2011 18:13:49 UTC