- From: Sean B. Palmer <sean@miscoranda.com>
- Date: Fri, 13 Oct 2017 12:07:49 +0100
- To: SW-forum Web <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
- Cc: www-archive <www-archive@w3.org>
All these years I've been wondering why SPARQL won over N3QL and other proposals as the main RDF query language. Who better to ask than EricP, old friend and one of the original SPARQL architects, as well as all-around suave Semantic gentleman? As usual the raw log of our conversation is below, but I'll give a summary first.

I had known that N3QL was considered inferior to SPARQL because the latter was defined by Guha as being conceptually and syntactically closer to SQL. At the time we didn't have modern NoSQL databases like MongoDB, or even industry key-value stores like Redis, so the relational model, and SQL especially, was the sole paradigm. Even though SPARQL was to act on graphs, it was thought that easing the conceptual transition by making it look as relational as possible was a very attractive characteristic.

Still, this did not seem like a full explanation to me, especially given the provenance of N3QL. EricP explained that another factor was that Guha's proposal had many implementations. By contrast, although N3QL implementations would have been easier to write with the SWAP tools, it appears that nobody had *actually done that*, which in retrospect is very disappointing.

On the actual merits of N3QL compared to SPARQL, we established that N3QL is superior in terms of parsing and query, but that the results algebra would have required a BGP multiset implementation in the SWAP/N3Logic tools. This would not have been difficult, and it would put N3QL on a par with SPARQL for the results algebra, while N3QL stays ahead in ease of parsing and query.

Meanwhile EricP also told me about his own Algae2, which, like N3QL, was an alternative proposal that was rejected in favour of SPARQL. In this case, EricP argues that Algae2 had advantages over both N3QL and SPARQL, in that it treated both query and the results algebra in a unified way, at the apparent expense of parsing.
The Algae2 results algebra did, in fact, have some influence on the final form of the SPARQL results algebra, but in retrospect Algae2 may have been a better choice of format than either N3QL or SPARQL.

I looked at the question a little further, and found that to some extent SPARQL was chosen not only due to the non-technological considerations above, but also through further historical accident. EricP himself says that the final decision was actually very close, and Algae2 itself could very easily have been selected.

One very illuminating thing that EricP said is that we really could do with something like a strongly-typed functional language to provide the building blocks for RDF query. It was always quite strange that we never had that for SWAP, but of course SWAP had its heritage in things like Prolog and description logics, not Haskell and type theory. If there is ever to be a SPARQL 2.0 it would probably be useful to start with Algae2, N3QL, and any other rejected alternatives, and add type theory.

<sbp> do you have any recollection of why N3QL wasn't considered for SPARQL?
<sbp> or rather, why it was considered and rejected
<ericP> guha's query language won out 'cause it had the most impl, iirc
<ericP> one criticism of N3QL was that you still kinda needed a query language to get the bindings out
<sbp> yeah, but we had both query languages AND implementations of them in CWM and Euler and so on!
<ericP> the other challenge was that graph UNION constraints weren't relevant to the table join semantics you need to provide a consistent model for combining parts of a query
<sbp> is that true? huh. I can't imagine an example where graph union would not be a consistent model
<ericP> for instance, a pattern like { ?n1 :p1 ?n2 . ?n2 :p3 ?n3 } should be decomposable into { ?n1 :p1 ?n2 . } JOIN { ?n2 :p3 ?n3 }
<sbp> yep. why do you say it's not?
<sbp> because you don't think you can merge the quantifications?
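The decomposition ericP describes can be sketched in a few lines of Python. This is only a minimal illustration under stated assumptions, not code from CWM, Algae2, or any SPARQL engine: the toy graph and the names `match_triple`, `join`, and `match_bgp` are hypothetical, and predicates are plain strings rather than IRIs. It matches the two-triple pattern as a whole, then matches each triple pattern separately and joins the two result tables on their shared variable, and compares the results as multisets.

```python
from collections import Counter

# Hypothetical toy data; triples are (subject, predicate, object) tuples.
GRAPH = {
    ("a", "p1", "b"),
    ("a", "p1", "c"),
    ("b", "p3", "d"),
    ("c", "p3", "d"),
    ("c", "p3", "e"),
}

def is_var(term):
    return term.startswith("?")

def match_triple(pattern, graph):
    """Return one binding dict per triple in graph that matches pattern."""
    solutions = []
    for triple in graph:
        binding = {}
        for p, t in zip(pattern, triple):
            if is_var(p):
                # A variable repeated in the pattern must bind consistently.
                if binding.setdefault(p, t) != t:
                    break
            elif p != t:
                break
        else:
            solutions.append(binding)
    return solutions

def join(left, right):
    """Merge every pair of solutions that agree on their shared variables."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[v] == r[v] for v in l.keys() & r.keys())
    ]

def match_bgp(patterns, graph):
    """Match all triple patterns together by substituting bindings as we go."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for s in solutions:
            bound = tuple(s.get(term, term) for term in pattern)
            extended.extend({**s, **b} for b in match_triple(bound, graph))
        solutions = extended
    return solutions

def as_multiset(solutions):
    """Compare result tables as multisets, ignoring row order."""
    return Counter(frozenset(s.items()) for s in solutions)

P1 = ("?n1", "p1", "?n2")
P2 = ("?n2", "p3", "?n3")

whole = match_bgp([P1, P2], GRAPH)
joined = join(match_triple(P1, GRAPH), match_triple(P2, GRAPH))
```

On this toy graph `as_multiset(whole) == as_multiset(joined)` holds. Note that `join` never looks at triples, only at tables of bindings, which is the point ericP goes on to make about results tables versus their encoding in a graph.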
<sbp> if you look at the N3QL document, there are already quantifications across different formulae that are being merged
<sbp> e.g. across the select and where clauses. those are separate formulae
<sbp> so it should be possible to do it for JOIN too. I imagine, here, that timbl had select/where working, at least in his mind
<sbp> I mean, if this didn't work across formulae at the top level then { ?a ?b ?c } => { ?c ?b ?a } wouldn't have worked either, so I guess it's trivial to demonstrate that JOIN would work too
<sbp> in fact we had a predicate for doing just that, I believe log:conjunction
<ericP> i think all that stuff got added when yosi was making cwm pass the SPARQL test suite
<ericP> (years later)
<sbp> I added cwm to github: https://github.com/sbp/cwm
<sbp> so I might be able to tell exactly when it was added...
<ericP> in fact, it was quite a fight just to get SPARQL to use the same terminals as N3.
<ericP> i believe the only place that having result sets in RDF graphs would have simplified the semantics was in SPARQL UNION
<sbp> log:conjunction was in the oldest version of cwmBuiltins.html actually, from 2003: https://github.com/sbp/cwm/commit/dd23fa17e5d2cc4ae2e137ad3ccb979988b3856a#diff-763aed3713030d009a750edc0a818463
<sbp> and N3QL was 2004
<sbp> log:conjunction probably predates 2003, note; that's just the oldest version of the docs in the repo
<sbp> it's not so much that it simplifies the semantics, it's the fact that it simplifies both the parsing (because N3QL is a N3 format, whereas SPARQL is an ad hoc format for which you have to write a new and entirely custom parser that can't be reused for anything else), and the actual query itself (because we already had CWM and Euler and others around to do that querying)
<sbp> also, as you know, it would mean that the constraints and so on could be merged with the same world as the rest of N3Logic, which included rules and constraints and various other interesting logic applications.
N3QL was a first class citizen of *that* world, whereas SPARQL is not
<ericP> but defining everything in terms of graph operations over a specific encoding of a result set in a graph is necessarily more complex than just defining the join over the result set
<sbp> why? gimme an example of where it's more complex
<sbp> anything modelled in the abstract in SPARQL can be modelled in the abstract in CWM, but as graphs
<ericP> take a peek at https://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra (SPARQL 1.0)
<ericP> look at e.g. the definition of LeftJoin and look at the equivalent n3 axioms to implement that in cwm
<ericP> (i can't dig those up, but perhaps you can)
<ericP> N3QL has an apparent closure where it's graphs in and graphs out, but that's ignoring the real typing.
<sbp> I'm looking but this is complicated by the fact that SPARQL was implemented for CWM, so results for left-join tend to be SPARQL oriented and finding pure CWM --think stuff for it is not easy. if only we had some kind of... semantic... web... to make these queries...
<ericP> the input to a BGP is a DataGraph and the result ResultsGraph. if you try to pass DataGraphs to a join operation, you'll get junk
<sbp> so wouldn't you just model a BGP in CWM, and do it that way?
<sbp> the problem is that a multiset can have duplicate elements whereas formulae can't, right?
<ericP> i think it's more just that the operations you can perform over a ResultsGraph are entirely defined by the fact that it's a results table and not at all by its encoding in an RDF graph
<ericP> you do have the advantage that you can perform BGP matches over the results of BGP matches, but it's not clear that's really useful
<ericP> now to pimp my own language, Algae2 had a compositional grammar which operated over Graphs and Results
<sbp> right. so you'd have to implement BGP multisets and define the operations on them, but you'd have to do that in SPARQL engines too.
what you're saying is that you'd have to put in the same work for N3QL *results* algebra as you do for SPARQL *results* algebra, right? so N3QL gives you no gain there, but it's also not worse?
<sbp> but on the query side, rather than the results side, N3QL does give you that gain
<sbp> https://www.w3.org/wiki/Algae
<ericP> i think a lot of the opaqueness of SPARQL comes not from the fact that it's defined in terms of result sets but that it doesn't provide a set of primitive operations that you can combine in logical ways like you can with things in a strongly-typed functional language
<sbp> chuckle, "This shows how to parse a simple document and look for anyone sorry enough to be known by me."
<sbp> when did you work on Algae2? this page is dated 2005
<sbp> and why wasn't Algae2 considered for SPARQL?
<ericP> it was but it was just me and a perl impl
<ericP> it was actually very close
<sbp> things could have been so much better! :-)

-- 
Sean B. Palmer, http://inamidst.com/sbp/
Received on Friday, 13 October 2017 11:08:14 UTC