- From: Steve Harris <steve.harris@garlik.com>
- Date: Sun, 16 Oct 2011 19:13:25 +0100
- To: Sandro Hawke <sandro@w3.org>
- Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
On 15 Oct 2011, at 23:35, Sandro Hawke wrote:

> On Sat, 2011-10-15 at 19:35 +0100, Andy Seaborne wrote:
>>
>> On 15/10/11 19:09, Jeremy Carroll wrote:
>>>
>>> I think both the Seq and the List constructs present technical issues.
>>> Basically it is because both present the possibility of 'bad' data and
>>> no clarity about what one should do in the face of it.
>>
>> +1
>>
>>> We can easily form ill-formed lists with rdf:first or rdf:rest either
>>> missing or multiple.
>>> We can easily form ill-formed sequences with duplicate or missing rdf:_2
>>
>> although Seq are very fragile and lists are merely fragile. The
>> duplicate rdf:_2 by merging is really nasty.
>>
>>> The consumer of such ill-formed data is in a bind
>>> And what's worse is that formally the ill-formed data is not ill-formed,
>>> it is just triples.
>>>
>>> We could label both with a health warning ...
>>
>> Sandro said that:
>>> I think Turtle makes RDF Collections seem quite
>>> nice, and hopefully that will quickly set the tone (perhaps with a
>>> little help from us) for APIs and SPARQL 1.2 (?) having nice list
>>> handling functions that are as efficient as native (non-destructive)
>>> list handling functions. (I hope some APIs do this already.)
>>
>> and the point about Turtle syntax, and the convenience of writing, is
>> important.
>>
>> Jena has container and collection APIs to make working with containers,
>> collections easier but the details leak out if you can take a triple view.
>
> Thinking about it today, I wonder if we can define a "Simple List" (or
> "Proper List", or something) as a list that can be losslessly
> transmitted via turtle's (...) syntax. That means its structure is
> b-nodes, with no extra arcs, etc. (Interestingly, it also means you
> can't include rdf:type rdf:List arcs....) Then, we encourage tools to
> read/write simple lists, and to work with them efficiently.

To be honest, I think the rdf:Collection structure is just a dead duck.
It works OK for lists on the order of 10 items, but is pretty impractical for thousands, or tens of thousands. I would be mildly opposed to anything which promoted its use further.

> Simple lists have the advantage over Seq, I believe, that in the face of
> truth-preserving RDF operations (subsetting, merging, various sound
> inferences), they never produce wrong data. In the worst case, they no
> longer provide data -- the simple list is mangled in some way -- but
> it'll never just tell you the wrong thing. I think this is a big win.

True, but detecting if a Collection is mangled is computationally expensive.

- Steve

>> Ivan:
>>> But it is a bit of a problem that SPARQL 1.1 still does not cover list handling fully:-(
>>
>> SPARQL 1.2 will not solve anything I'm afraid. SPARQL 1.1 Query has
>> gone as far as it can, except maybe a little extra syntactic sugar with
>>
>>    { ?list rdf:rest*/rdf:first ?member }
>>
>> It's much better than handling Seqs.
>
> I'm trying to brainstorm ways to shoe-horn list handling into SPARQL. I
> don't know if there's any elegant way, but maybe there's a hack that's
> not too bad.
>
> One approach is to update the results format to allow lists of values
> where it currently allows single values. And then offer some way to
> signal you want Simple Lists to be returned as list values instead of
> b-nodes. One way to do that would be a LISTRESULT function that takes
> a simple list's starting bnode and returns something that the results
> format serializes as a list. Essentially, it's just a way of saying you
> want a list result here.
>
> So...
>
>    SELECT ?x ?y LISTRESULT(?z) WHERE...
>
> would require ?z to be bound to a simple list and would pack the list
> elements into the result format in a manner specific to that results
> format (XML, JSON, etc).
>
> Other, normal list builtins like SUBLIST(list, startpos, stoppos) could
> be used to make sure the size of the returned list is manageable.
>
> Another approach, perhaps, would be some kind of dis-aggregator, a pair
> of builtins that work together to make a list appear like many different
> results:
>
> Data:
>    eg:Alice eg:likes ( eg:Bob eg:Charlie ).
>
> Query:
>    SELECT ?who ITERINDEX(?list) ITERVALUE(?list)
>    WHERE ?who eg:likes ?list.
>
> would return results:
>    eg:Alice 0 eg:Bob
>    eg:Alice 1 eg:Charlie
>
> although not necessarily in that order.
>
> That's pretty messy to spec and implement, but might be pretty nice to
> use.
>
>> SPARQL Update can manipulate lists but it's ugly:
>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0389.html
>>
>> The fundamental problem in SPARQL is that any order is lost; so this
>> list access works for some cases, where the order does not matter.
>>
>> Even if a special order preserving construct were available, order is
>> lost in the rest of the query. An order-preserving QL would not be
>> SPARQL 1.2 - it would have a completely different basis (e.g. no
>> chance of implementing using hash joins), would be very hard to have
>> parallel implementation (see "big data" graph languages), and still does
>> not work when two ordered different subresults need combining.
>>
>> Fundamentally, there are two problems:
>>
>> 1/ Encoding in triples
>> 2/ Lists aren't the only datastructure.
>>
>> Reification, containers and collections encode data structure in triples
>> but if the app can see "triples" then this leaks through to the
>> application. It also means there can be the possibility of 'bad' data
>> as Jeremy says. Seeing the triples is confusing at best.
>
> Yes, seeing the triples is a problem, but I'm hoping it's not that bad,
> and that mostly people pull what they want out of a graph and ignore the
> rest.
>
>> The structure we have may not say what you want:
>>    List(1 2 3) != List(3 2 1)
>> but if a list is being used to express an unordered collection, a higher
>> level convention has to be communicated.
>>
>> I think the only complete solution will involve putting structural
>> literals into RDF itself, so they are not triple-encoded and can't be
>> 'bad'. When treated as first-class literals with equality rules,
>> accessors, and combining rules, then implementations can store them
>> specially, provide good APIs, and application programmers won't have to
>> learn about the encoding rules.
>
> That sounds pretty hard. Do you have some design in mind...?
>
>    - Sandro
>
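
For concreteness, here is a minimal sketch of what checking for a "Simple List" (in the sense quoted above: b-node cells, exactly one rdf:first and one rdf:rest each, no extra arcs) might look like. It assumes the graph is just an in-memory collection of (subject, predicate, object) string tuples with blank nodes written as "_:label". The function name simple_list_members and that data layout are illustrative assumptions, not part of any proposal in this thread or of any RDF library.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDF_FIRST = RDF + "first"
RDF_REST = RDF + "rest"
RDF_NIL = RDF + "nil"


def simple_list_members(triples, head):
    """Return the list members starting at `head`, or None if the list is mangled.

    Hypothetical helper: `triples` is an iterable of (s, p, o) string tuples,
    and blank nodes are assumed to be strings starting with "_:".
    """
    triples = set(triples)  # drop duplicate triples, as an RDF graph would

    # Index outgoing arcs by subject and count incoming arcs per object.
    outgoing = {}
    incoming = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))
        incoming[o] = incoming.get(o, 0) + 1

    members = []
    seen = set()
    node = head
    while node != RDF_NIL:
        if node in seen:                 # rdf:rest loops back: not a proper list
            return None
        seen.add(node)
        if not node.startswith("_:"):    # cells must be blank nodes
            return None
        arcs = outgoing.get(node, [])
        firsts = [o for p, o in arcs if p == RDF_FIRST]
        rests = [o for p, o in arcs if p == RDF_REST]
        # Exactly one rdf:first, exactly one rdf:rest, and nothing else.
        if len(firsts) != 1 or len(rests) != 1 or len(arcs) != 2:
            return None
        # Interior cells may only be pointed at by the previous cell.
        if node != head and incoming.get(node, 0) != 1:
            return None
        members.append(firsts[0])
        node = rests[0]
    return members
```

Once the indexes exist, the walk itself is linear in the length of the list, but building them touches every triple in the graph, and the check has to be repeated whenever the graph changes. Whether that is the cost being referred to above depends on the store; the sketch only shows which conditions have to hold.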
Received on Sunday, 16 October 2011 18:13:49 UTC