- From: Steve Harris <steve.harris@garlik.com>
- Date: Sun, 16 Oct 2011 21:28:32 +0100
- To: Sandro Hawke <sandro@w3.org>
- Cc: Andy Seaborne <andy.seaborne@epimorphics.com>, public-rdf-wg@w3.org
On 16 Oct 2011, at 19:26, Sandro Hawke wrote:

> On Sun, 2011-10-16 at 19:13 +0100, Steve Harris wrote:
>> On 15 Oct 2011, at 23:35, Sandro Hawke wrote:
>>
>>> On Sat, 2011-10-15 at 19:35 +0100, Andy Seaborne wrote:
>>>>
>>>> On 15/10/11 19:09, Jeremy Carroll wrote:
>>>>>
>>>>> I think both the Seq and the List constructs present technical issues. Basically it is because both present the possibility of 'bad' data and no clarity about what one should do in the face of it.
>>>>
>>>> +1
>>>>
>>>>> We can easily form ill-formed lists with rdf:first or rdf:rest either missing or multiple.
>>>>> We can easily form ill-formed sequences with duplicate or missing rdf:_2.
>>>>
>>>> although Seqs are very fragile and lists are merely fragile. The duplicate rdf:_2 by merging is really nasty.
>>>>
>>>>> The consumer of such ill-formed data is in a bind. And what's worse is that formally the ill-formed data is not ill-formed, it is just triples.
>>>>>
>>>>> We could label both with a health warning ...
>>>>
>>>> Sandro said that:
>>>>> I think Turtle makes RDF Collections seem quite nice, and hopefully that will quickly set the tone (perhaps with a little help from us) for APIs and SPARQL 1.2 (?) having nice list handling functions that are as efficient as native (non-destructive) list handling functions. (I hope some APIs do this already.)
>>>>
>>>> and the point about Turtle syntax, and the convenience of writing, is important.
>>>>
>>>> Jena has container and collection APIs to make working with containers and collections easier, but the details leak out if you can take a triple view.
>>>
>>> Thinking about it today, I wonder if we can define a "Simple List" (or "Proper List", or something) as a list that can be losslessly transmitted via Turtle's (...) syntax. That means its structure is b-nodes, with no extra arcs, etc. (Interestingly, it also means you can't include rdf:type rdf:List arcs....) Then, we encourage tools to read/write simple lists, and to work with them efficiently.
>>
>> To be honest, I think the RDF Collection structure is just a dead duck.
>>
>> It works OK for lists on the order of 10 items, but is pretty impractical for thousands, or tens of thousands.
>
> My claim is that in the typical or best cases, Simple Lists can be handled as efficiently as any general list structure, using your choice of singly-linked lists, doubly-linked lists, arrays, or possibly even something more complicated.

That sounds good…

> Basically, I'm suggesting systems only turn them into something visible as triples when someone really wants that, and in general, no one should want that very much. I think it will happen in toy systems and

Great, but how would you tell when people want to see the structure?

I don't think it's often useful, except when you're exporting data, and then I expect you'd much rather see it rendered (in Turtle at least) as (1 2 3 …). Which suggests that you shouldn't ever bother with the rdf:first / rdf:rest business.

> debugging systems, where people do ?p ?o queries; but in deployed apps, where the software is going to take some action based on particular predicates, it's not going to be querying for rdf:first or rdf:rest triples.

Handling this for the ?p case is going to be tricky. Not impossible, but tricky.

> Certainly this would be more work for triple stores, but I think it's probably a better option than coming up with something completely new.
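For concreteness, the structures being discussed look roughly like this in Turtle (the ex: names and data are invented for illustration): the (...) shorthand, the rdf:first / rdf:rest chain of b-nodes it abbreviates, and the kind of duplicate-membership damage that merging two descriptions of the same Seq can cause.

    @prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
    @prefix ex:  <http://example.org/> .

    # Turtle's list shorthand ...
    ex:alice ex:likes ( ex:bob ex:charlie ) .

    # ... abbreviates this rdf:first / rdf:rest chain of b-nodes:
    ex:alice ex:likes _:l1 .
    _:l1 rdf:first ex:bob ;
         rdf:rest  _:l2 .
    _:l2 rdf:first ex:charlie ;
         rdf:rest  rdf:nil .

    # The fragile Seq case: merging two graphs that each describe ex:seq
    # can leave two rdf:_2 values, so the "second" member is ambiguous.
    ex:seq a rdf:Seq ;
        rdf:_1 ex:bob ;
        rdf:_2 ex:charlie ;
        rdf:_2 ex:dave .

Nothing in RDF itself rules out the two rdf:_2 triples; as Jeremy says above, they are just triples.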
Sure, but if the mechanism is sufficiently complex to implement then a large number of systems won't implement it, and it's not really any better than inventing something new (not that I think we should do that).

>> I would be mildly opposed to anything which promoted its use further.
>>
>>> Simple lists have the advantage over Seq, I believe, that in the face of truth-preserving RDF operations (subsetting, merging, various sound inferences), they never produce wrong data. In the worst case, they no longer provide data -- the simple list is mangled in some way -- but it'll never just tell you the wrong thing. I think this is a big win.
>>
>> True, but detecting if a Collection is mangled is computationally expensive.
>
> I'm picturing Simple List handling being done using native list structures, which can't get mangled. So the only detection is when someone creates a Simple List *not* using provided list machinery, which should be a rare case.

OK.

- Steve

>>>> Ivan:
>>>>> But it is a bit of a problem that SPARQL 1.1 still does not cover list handling fully :-(
>>>>
>>>> SPARQL 1.2 will not solve anything, I'm afraid. SPARQL 1.1 Query has gone as far as it can, except maybe a little extra syntactic sugar with
>>>>
>>>>   { ?list rdf:rest*/rdf:first ?member }
>>>>
>>>> It's much better than handling Seqs.
>>>
>>> I'm trying to brainstorm ways to shoe-horn list handling into SPARQL. I don't know if there's any elegant way, but maybe there's a hack that's not too bad.
>>>
>>> One approach is to update the results format to allow lists of values where it currently allows single values, and then offer some way to signal you want Simple Lists to be returned as list values instead of b-nodes. One way to do that would be a LISTRESULT function that takes a simple list's starting bnode and returns something that the results format serializes as a list. Essentially, it's just a way of saying you want a list result here.
>>>
>>> So...
>>>
>>>   SELECT ?x ?y LISTRESULT(?z) WHERE ...
>>>
>>> would require ?z to be bound to a simple list and would pack the list elements into the result format in a manner specific to that results format (XML, JSON, etc).
>>>
>>> Other, normal list builtins like SUBLIST(list, startpos, stoppos) could be used to make sure the size of the returned list is manageable.
>>>
>>> Another approach, perhaps, would be some kind of dis-aggregator, a pair of builtins that work together to make a list appear like many different results:
>>>
>>> Data:
>>>   eg:Alice eg:likes ( eg:Bob eg:Charlie ).
>>>
>>> Query:
>>>   SELECT ?who ITERINDEX(?list) ITERVALUE(?list)
>>>   WHERE ?who eg:likes ?list.
>>>
>>> would return results:
>>>   eg:Alice 0 eg:Bob
>>>   eg:Alice 1 eg:Charlie
>>>
>>> although not necessarily in that order.
>>>
>>> That's pretty messy to spec and implement, but might be pretty nice to use.
>>>
>>>> SPARQL Update can manipulate lists, but it's ugly:
>>>> http://lists.w3.org/Archives/Public/public-rdf-dawg/2011JanMar/0389.html
>>>>
>>>> The fundamental problem in SPARQL is that any order is lost; so this list access works for some cases, where the order does not matter.
>>>>
>>>> Even if a special order-preserving construct were available, order is lost in the rest of the query. An order-preserving QL would not be SPARQL 1.2 - it would have a completely different basis (e.g. no chance of implementing using hash joins), would be very hard to implement in parallel (see "big data" graph languages), and still does not work when two different ordered subresults need combining.
>>>>
>>>> Fundamentally, there are two problems:
>>>>
>>>> 1/ Encoding in triples
>>>> 2/ Lists aren't the only data structure.
>>>>
>>>> Reification, containers and collections encode data structure in triples, but if the app can see "triples" then this leaks through to the application. It also means there can be the possibility of 'bad' data, as Jeremy says. Seeing the triples is confusing at best.
>>>
>>> Yes, seeing the triples is a problem, but I'm hoping it's not that bad, and that mostly people pull what they want out of a graph and ignore the rest.
>>>
>>>> The structure we have may not say what you want:
>>>>   List(1 2 3) != List(3 2 1)
>>>> but if a list is being used to express an unordered collection, a higher-level convention has to be communicated.
>>>>
>>>> I think the only complete solution will involve putting structural literals into RDF itself, so they are not triple-encoded and can't be 'bad'. When treated as first-class literals with equality rules, accessors, and combining rules, then implementations can store them specially, provide good APIs, and application programmers won't have to learn about the encoding rules.
>>>
>>> That sounds pretty hard. Do you have some design in mind...?
>>>
>>> - Sandro
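To make Andy's rdf:rest*/rdf:first sugar concrete, here is roughly what the full SPARQL 1.1 query would look like against the invented ex: data from the Turtle sketch earlier in this message. It finds the members of the list, but, as Andy notes, the solutions carry no position information, so the list's order is lost.

    PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
    PREFIX ex:  <http://example.org/>

    # Walk the rdf:rest chain from the list head and pick off each rdf:first.
    # The result is a plain, unordered set of bindings for ?member.
    SELECT ?member
    WHERE {
      ex:alice ex:likes ?list .
      ?list rdf:rest*/rdf:first ?member .
    }

Positions could only be recovered by counting rdf:rest steps per member, which the property path itself does not report; hence proposals like ITERINDEX above.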
Received on Sunday, 16 October 2011 20:29:16 UTC