SPARQL, N3QL, and Algae2 from Sean B. Palmer on 2017-10-13 (public-lod@w3.org from October 2017)

From: Sean B. Palmer <sean@miscoranda.com>
Date: Fri, 13 Oct 2017 12:07:49 +0100
To: SW-forum Web <semantic-web@w3.org>, Linked Data community <public-lod@w3.org>
Cc: www-archive <www-archive@w3.org>
Message-ID: <CAH3-oEcOpMOd4XWevn0Sfq2wDzidcD=gS7ban56qKQXgzhUvXw@mail.gmail.com>

All these years I've been wondering why SPARQL won over N3QL and other
proposals as the main RDF query language. Who better to ask than
EricP, old friend and one of the original SPARQL architects, as well
as all-around suave Semantic gentleman? As usual the raw log of our
conversation is below, but I'll give a summary first.

I had known that N3QL was considered inferior to SPARQL because the
latter was defined by Guha as being conceptually and syntactically
closer to SQL. At the time we didn't have the modern non-relational
NoSQL databases like MongoDB, or even industry key-value stores like
Redis, so relational and especially SQL were the sole paradigm. Even
though SPARQL was to act on graphs, it was thought that easing the
conceptual transition by making it look as relational as possible was
a very attractive characteristic.

This did not seem like a full explanation to me, however, especially
given the provenance of N3QL. EricP explained that another factor was
that Guha's proposal had the most implementations, as far as he
recalls. Although N3QL implementations would have been easier to write
with the SWAP tools, it appears that nobody had *actually done that*,
which in retrospect is very disappointing.

On the actual merits of N3QL compared to SPARQL, we established that
N3QL is superior in terms of parsing and query, but that the results
algebra would have required a BGP multiset implementation in the
SWAP/N3Logic tools. This would not have been difficult, and it would
have put N3QL on a par with SPARQL for the results algebra, while N3QL
would still be ahead in terms of ease of parsing and query.
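
To make the multiset point concrete, here is a minimal Python sketch.
It is not CWM, Euler, or any SPARQL engine; every name in it
(match_pattern, match_bgp, join, project, as_multiset) is hypothetical
and chosen only for illustration. It evaluates EricP's example pattern
from the log below as a whole BGP, then as a join of two
single-pattern matches, and compares the two results as multisets of
bindings. It also shows projection producing duplicate solutions,
which is why a list or counter is used rather than a set.

```python
from collections import Counter

# Triples and triple patterns are 3-tuples of strings.
# Variables are strings starting with "?". All names are illustrative.

def is_var(term):
    return term.startswith("?")

def match_pattern(pattern, graph):
    """Match one triple pattern against a graph; return a list of bindings."""
    solutions = []
    for triple in graph:
        binding = {}
        ok = True
        for p, t in zip(pattern, triple):
            if is_var(p):
                # A repeated variable must take the same value each time.
                if binding.get(p, t) != t:
                    ok = False
                    break
                binding[p] = t
            elif p != t:
                ok = False
                break
        if ok:
            solutions.append(binding)
    return solutions

def match_bgp(patterns, graph, binding=None):
    """Match a whole basic graph pattern by substitution and backtracking."""
    binding = binding or {}
    if not patterns:
        return [binding]
    first, rest = patterns[0], patterns[1:]
    # Replace already-bound variables with their values.
    grounded = tuple(binding.get(term, term) for term in first)
    out = []
    for b in match_pattern(grounded, graph):
        out.extend(match_bgp(rest, graph, {**binding, **b}))
    return out

def compatible(a, b):
    """Two bindings are compatible if they agree on every shared variable."""
    return all(b[k] == v for k, v in a.items() if k in b)

def join(left, right):
    """Join two multisets of bindings, keeping duplicates."""
    return [{**a, **b} for a in left for b in right if compatible(a, b)]

def project(solutions, variables):
    """Keep only the named variables; duplicates are preserved."""
    return [{v: b[v] for v in variables if v in b} for b in solutions]

def as_multiset(solutions):
    """Order-insensitive view of a list of bindings, with multiplicities."""
    return Counter(frozenset(b.items()) for b in solutions)

graph = {
    (":a", ":p1", ":b"),
    (":b", ":p3", ":c"),
    (":b", ":p3", ":d"),
    (":x", ":p1", ":y"),
}

p_left = ("?n1", ":p1", "?n2")
p_right = ("?n2", ":p3", "?n3")

whole = match_bgp([p_left, p_right], graph)
joined = join(match_pattern(p_left, graph), match_pattern(p_right, graph))

same = as_multiset(whole) == as_multiset(joined)
only_n1 = as_multiset(project(whole, ["?n1"]))
```

On this toy graph the whole-pattern match and the join agree, and
projecting onto ?n1 gives the same binding twice. A set of bindings,
or an N3 formula, would collapse those two into one.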

Meanwhile EricP also told me about his own Algae2, which like N3QL was
another alternative proposal for SPARQL that was rejected. In this
case, EricP argues that Algae2 had advantages over both N3QL and
SPARQL, in that it treated both query and the results algebra in a
unified way, at the apparent expense of parsing. The Algae2 results
algebra did, in fact, have some influence on the final form of the
SPARQL results algebra, but in retrospect Algae2 may have been a
better choice of format than both N3QL and SPARQL.

I looked at the question a little further, and found that to some
extent SPARQL was chosen not only due to the non-technological
considerations above, but also through further historical accident.
EricP himself says that the final decision was actually very close,
and Algae2 itself could very easily have been selected.

One very illuminating thing that EricP said is that we really could do
with something like a strongly-typed functional language to provide
the building blocks for RDF query. It was always quite strange that we
never had that for SWAP, but of course SWAP had its heritage in things
like Prolog and description logics, not Haskell and type theory. If
there is ever to be a SPARQL 2.0 it would probably be useful to start
with Algae2, N3QL, and any other rejected alternatives, and add type
theory.
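
As a rough idea of what such building blocks might look like, here is
a small sketch using Python type hints. It is an assumption-laden toy,
not Algae2, N3QL, or the SPARQL algebra as specified: the names Query,
pattern, join, union and left_join are all hypothetical, and left_join
omits the filter expression that the SPARQL 1.0 LeftJoin takes. The
point is only that every primitive has the type Graph -> Results, and
every combinator takes queries and returns a query.

```python
from typing import Callable, Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]
Graph = FrozenSet[Triple]
Binding = Dict[str, str]
Results = List[Binding]             # a multiset of solutions, kept as a list
Query = Callable[[Graph], Results]  # every building block has this type

def _unify(term: str, value: str, binding: Binding) -> bool:
    """Bind a variable (or check a constant) against a value."""
    if term.startswith("?"):
        return binding.setdefault(term, value) == value
    return term == value

def _compatible(a: Binding, b: Binding) -> bool:
    """Bindings are compatible if they agree on every shared variable."""
    return all(b.get(k, v) == v for k, v in a.items())

def pattern(s: str, p: str, o: str) -> Query:
    """Primitive: a single triple pattern, turning a Graph into Results."""
    def run(graph: Graph) -> Results:
        out: Results = []
        for triple in graph:
            b: Binding = {}
            if all(_unify(t, v, b) for t, v in zip((s, p, o), triple)):
                out.append(b)
        return out
    return run

def join(left: Query, right: Query) -> Query:
    """Combine two queries, keeping only compatible pairs of solutions."""
    return lambda g: [
        {**a, **b} for a in left(g) for b in right(g) if _compatible(a, b)
    ]

def union(left: Query, right: Query) -> Query:
    """Concatenate the solutions of two queries, keeping duplicates."""
    return lambda g: left(g) + right(g)

def left_join(left: Query, right: Query) -> Query:
    """Like join, but keep left solutions that have no compatible partner."""
    def run(g: Graph) -> Results:
        out: Results = []
        right_solutions = right(g)
        for a in left(g):
            extended = [{**a, **b} for b in right_solutions if _compatible(a, b)]
            out.extend(extended or [a])
        return out
    return run

graph: Graph = frozenset({
    (":a", ":knows", ":b"),
    (":a", ":name", "A"),
    (":b", ":knows", ":c"),
})

# Everyone who knows someone, with their name if they have one.
query = left_join(pattern("?x", ":knows", "?y"), pattern("?x", ":name", "?n"))
results = query(graph)
```

Python only checks these hints with an external tool such as mypy, so
this is a weak approximation of what a language like Haskell would
enforce. Still, it shows the shape: handing a Graph to join where a
Query is expected is a type error rather than a source of junk.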

<sbp> do you have any recollection of why N3QL wasn't considered for SPARQL?
<sbp> or rather, why it was considered and rejected
<ericP> guha's query language won out 'cause it had the most impl, iirc
<ericP> one criticism of N3QL was that you still kinda needed a query
language to get the bindings out
<sbp> yeah, but we had both query languages AND implementations of
them in CWM and Euler and so on!
<ericP> the other challenge was that graph UNION constraints weren't
relevant to the table join semantics you need to provide a consistent
model for combining parts of a query
<sbp> is that true? huh. I can't imagine an example where graph union
would not be a consistent model
<ericP> for instance, a pattern like { ?n1 :p1 ?n2 . ?n2 :p3 ?n3 }
should be decomposable into { ?n1 :p1 ?n2 . } JOIN { ?n2 :p3 ?n3 }
<sbp> yep. why do you say it's not?
<sbp> because you don't think you can merge the quantifications?
<sbp> if you look at the N3QL document, there are already
quantifications across different formulae that are being merged
<sbp> e.g. across the select and where clauses. those are separate formulae
<sbp> so it should be possible to do it for JOIN too. I imagine, here,
that timbl had select/where working, at least in his mind
<sbp> I mean, if this didn't work across formulae at the top level
then { ?a ?b ?c } => { ?c ?b ?a } wouldn't have worked either, so I
guess it's trivial to demonstrate that JOIN would work too
<sbp> in fact we had a predicate for doing just that, I believe log:conjunction
<ericP> i think all that stuff got added when yosi was making cwm pass
the SPARQL test suite
<ericP> (years later)
<sbp> I added cwm to github: https://github.com/sbp/cwm
<sbp> so I might be able to tell exactly when it was added...
<ericP> in fact, it was quite a fight just to get SPARQL to use the
same terminals as N3.
<ericP> i believe the only place that having result sets in RDF
graphs would have simplified the semantics was in SPARQL UNION
<sbp> log:conjunction was in the oldest version of cwmBuiltins.html
actually, from 2003:
https://github.com/sbp/cwm/commit/dd23fa17e5d2cc4ae2e137ad3ccb979988b3856a#diff-763aed3713030d009a750edc0a818463
<sbp> and N3QL was 2004
<sbp> log:conjunction probably predates 2003, note; that's just the
oldest version of the docs in the repo
<sbp> it's not so much that it simplifies the semantics, it's the fact
that it simplifies both the parsing (because N3QL is a N3 format,
whereas SPARQL is an ad hoc format for which you have to write a new
and entirely custom parser that can't be reused for anything else),
and the actual query itself (because we already had CWM and Euler and
others around to do that querying)
<sbp> also, as you know, it would mean that the constraints and so on
could be merged with the same world as the rest of N3Logic, which
included rules and constraints and various other interesting logic
applications. N3QL was a first class citizen of *that* world, whereas
SPARQL is not
<ericP> but defining everything in terms of graph operations over a
specific encoding of a result set in a graph is necessarily more
complex than just defining the join over the result set
<sbp> why? gimme an example of where it's more complex
<sbp> anything modelled in the abstract in SPARQL can be modelled in
the abstract in CWM, but as graphs
<ericP> take a peek at
https://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra (SPARQL 1.0)
<ericP> look at e.g. the definition of LeftJoin and look at the
equivalent n3 axioms to implement that in cwm
<ericP> (i can't dig those up, but perhaps you can)
<ericP> N3QL has an apparent closure where it's graphs in and graphs
out, but that's ignoring the real typing.
<sbp> I'm looking but this is complicated by the fact that SPARQL was
implemented for CWM, so results for left-join tend to be SPARQL
oriented and finding pure CWM --think stuff for it is not easy. if
only we had some kind of... semantic... web... to make these
queries...
<ericP> the input to a BGP is a DataGraph and the result ResultsGraph.
if you try to pass DataGraphs to a join operation, you'll get junk
<sbp> so wouldn't you just model a BGP in CWM, and do it that way?
<sbp> the problem is that a multiset can have duplicate elements
whereas formulae can't, right?
<ericP> i think it's more just that the operations you can perform over
a ResultsGraph are entirely defined by the fact that it's a results
table and not at all by its encoding in an RDF graph
<ericP> you do have the advantage that you can perform BGP matches over
the results of BGP matches, but it's not clear that's really useful
<ericP> now to pimp my own language, Algae2 had a compositional
grammar which operated over Graphs and Results
<sbp> right. so you'd have to implement BGP multisets and define the
operations on them, but you'd have to do that in SPARQL engines too.
what you're saying is that you'd have to put in the same work for N3QL
*results* algebra as you do for SPARQL *results* algebra, right? so
N3QL gives you no gain there, but it's also not worse?
<sbp> but on the query side, rather than the results side, N3QL does
give you that gain
<sbp> https://www.w3.org/wiki/Algae
<ericP> i think a lot of the opaqueness of SPARQL comes not from the
fact that it's defined in terms of result sets but that it doesn't
provide a set of primitive operations that you can combine in logical
ways like you can with things in a strongly-typed functional language
<sbp> chuckle, "This shows how to parse a simple document and look for
anyone sorry enough to be known by me."
<sbp> when did you work on Algae2? this page is dated 2005
<sbp> and why wasn't Algae2 considered for SPARQL?
<ericP> it was but it was just me and a perl impl
<ericP> it was actually very close
<sbp> things could have been so much better! :-)

--
Sean B. Palmer, http://inamidst.com/sbp/

Received on Friday, 13 October 2017 11:08:15 UTC