
Re: twinql Retrospective

From: Dan Connolly <connolly@w3.org>
Date: Mon, 21 Nov 2005 20:35:21 -0600
To: Richard Newman <r.newman@reading.ac.uk>
Cc: public-rdf-dawg-comments@w3.org
Message-Id: <1132626921.26034.136.camel@dirk>

On Wed, 2005-08-10 at 09:34 -0700, Richard Newman wrote:
> Hello all,
>    As SPARQL is in last call, I've been asked to provide some  
> feedback based on my experiences of implementing twinql[1].

Hi. I think we have now addressed all your comments.
Details below.

Please let us know whether this reply addresses your comments
to your satisfaction.

If you're in a particularly helpful mood, you can put [closed]
in the subject of your reply to indicate to some automated
tools that yes, you are satisfied.

>    (I'm re-sending this, because it didn't seem to arrive.)
>    I'm not sure what structure this should take, so I'll just ramble.  
> Bear with me.
>    twinql is a SPARQL query implementation for the Wilbur RDF toolkit 
> [2], written in ANSI Common Lisp. As documented on my blog[3], I took  
> a typically Lispy two-stage DSL approach to implementing SPARQL,  
> writing an executor for an s-expression syntax, then a parser to  
> convert text into s-expressions. This was a very useful technique,  
> allowing me to implement pre-compilation and direct execution of s- 
> expressions independently of the parser, and leaving open the  
> possibility of later integrating a better parser.
>    The s-expression syntax is deliberately roughly isomorphic to the  
> query language:
> (sparql :select :distinct t :vars '(name mbox)
>                  :from "http://www.holygoat.co.uk/foaf.rdf"
>                  :where
>                  '(graph-pattern
>                     (triple x !foaf:knows !rich:RichardNewman)
>                     (triple x !foaf:name name)
>                     (triple x !foaf:mbox_sha1sum mbox)))
> (The parser does the job of producing Wilbur nodes (indicated by ! 
> prefix:suffix) from the PREFIX and BASE declarations, so they do not  
> appear.)
>    This s-expression can be directly executed in the Lisp toplevel --  
> if you have a Lisp system to hand, I invite you to ASDF-install  
> twinql, switch to the :twinql package, and paste it in. The executor  
> uses hash tables to represent bindings and touched lists in order to  
> do the resolution of graph patterns. Calculation of nested graph  
> patterns, unions, and optionals were almost zero effort, thanks to  
> the design of the executor -- I just provide different bindings or  
> touched lists to the execution function. Thus, on the topic of  
> unions, I would disagree that they are frustrating for implementers.  
> It took me literally minutes to get them working. (Whether the  
> implementation is correct is another matter!)
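[For illustration: the executor technique described above — bindings kept in hash tables, with UNION handled by simply running each branch against its own copy of the bindings — can be sketched roughly as follows. This is a hypothetical Python sketch of the idea, not twinql's actual Lisp code; all names here are invented.]

```python
# Hypothetical sketch of the executor idea described above (not twinql code):
# bindings are plain hash tables (dicts), and UNION just runs each branch
# with its own copy of the bindings.

def is_var(term):
    # Variables are strings starting with '?'; everything else is a constant.
    return isinstance(term, str) and term.startswith("?")

def match_triple(pattern, triple, bindings):
    """Unify one triple pattern with a concrete triple.
    Returns an extended copy of `bindings`, or None on mismatch."""
    result = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if p_term in result and result[p_term] != t_term:
                return None          # variable already bound to something else
            result[p_term] = t_term
        elif p_term != t_term:
            return None              # constant mismatch
    return result

def match_pattern(patterns, store, bindings):
    """Yield every binding dict that satisfies all triple patterns."""
    if not patterns:
        yield bindings
        return
    first, rest = patterns[0], patterns[1:]
    for triple in store:
        extended = match_triple(first, triple, bindings)
        if extended is not None:
            yield from match_pattern(rest, store, extended)

def match_union(branches, store, bindings):
    """UNION: each branch gets its own independent copy of the bindings."""
    for branch in branches:
        yield from match_pattern(branch, store, dict(bindings))
```

[With a toy store such as `[("x", "foaf:knows", "rich"), ("x", "foaf:name", "Richard")]`, matching `[("?s", "foaf:knows", "rich"), ("?s", "foaf:name", "?n")]` yields the single binding `{"?s": "x", "?n": "Richard"}` — which is why nested patterns and unions fall out almost for free once bindings are just hash tables passed around.]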
>    The parser was a little less exciting to develop. I used CL-YACC,  
> a yacc-like parser generator for Common Lisp. This required:
>  Translating the EBNF grammar into s-expression BNF productions
>  Writing a lexer
>  Wiring actions into the productions to output the s-expressions.
>    Translating the grammar was no fun at all. Firstly, the EBNF had  
> to be manually translated into BNF, requiring the creation of a  
> number of intermediate productions. Secondly, the grammar is beyond  
> the reach of a simple parser generator, requiring lookahead -- not a  
> difficulty for the Java compiler tool being used, but too much for CL- 
> YACC. Thus, implementing the parser needed continual switching back  
> to the lexer, shifting the complicated "smarts" into the generation  
> of tokens. The lexer is therefore quite complex, applying various  
> regular expressions, producing symbols, resolving URIs, and building  
> literals. One unavoidable issue was the optional dot after the last  
> query element in each graph pattern. This made the grammar ambiguous,  
> so twinql currently makes them compulsory. I'm looking for a better  
> solution.

We have re-worked the grammar. It's now LL(1).


Regarding manually translating to BNF: we have a number of
machine-generated parsers built from our grammars. There's
a "@@Links to generated parsers" TODO item in the editor's
draft now, and some of those generated forms should be linked
for your convenience in an upcoming draft.

>    I'm also unhappy about my implementation of comment removal. URIs  
> and literals can contain hashes, so it's not enough to strip from any  
> hash to the end of the line. I first extract URIs and literals, to  
> avoid this, but this means that comments containing URI or literal  
> patterns could mess up your query. I'll have to fix this some time;  
> the lexer mostly does the job, but I could do better.
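[The comment-removal problem above — '#' is legal inside IRIs and string literals, so stripping from any '#' to end of line is wrong — can be handled with a single left-to-right scan that tracks whether the scanner is inside <...> or "...". A hypothetical sketch, not twinql's lexer:]

```python
# Hypothetical sketch of context-aware comment removal (not twinql code):
# a '#' only starts a comment when we are outside <...> IRIs and
# "..." string literals, so one pass tracking that state suffices.

def strip_comments(line):
    """Remove a '#' comment from one query line, ignoring '#' characters
    that occur inside <...> IRIs or "..." string literals."""
    in_iri = in_string = False
    i = 0
    while i < len(line):
        ch = line[i]
        if in_string:
            if ch == "\\":
                i += 2               # skip the escaped character
                continue
            if ch == '"':
                in_string = False
        elif in_iri:
            if ch == ">":
                in_iri = False
        elif ch == '"':
            in_string = True
        elif ch == "<":
            in_iri = True
        elif ch == "#":
            return line[:i].rstrip()  # genuine comment starts here
        i += 1
    return line
```

[Because the scan never treats a '#' inside an IRI or literal as a comment, a comment that happens to *contain* URI-like or literal-like text is also stripped correctly — the failure mode described above for the extract-first approach.]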
>    Thanks to my existing implementation of CBDs and serialisation,  
> DESCRIBE and CONSTRUCT were straightforward. Similarly, serialising  
> binding hashes to the XML results format was easy.
>    The features of SPARQL that I have not yet fully implemented are  
> (off the top of my head):
>  Filter functions
>  Order, limits, etc.
>    Regexes and other filter function calls have had effort expended  
> on them, but are not yet usable. These are obviously vital, and I  
> can't think of a better way to integrate them into the syntax.  
> Implementation-wise, I obviously need to map between function URIs  
> and functions, which should be straightforward.
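[The URI-to-function mapping mentioned above is indeed straightforward — a hash table keyed on the function URI is enough. A hypothetical Python sketch (the registry and names here are invented for illustration):]

```python
# Hypothetical sketch of dispatching filter-function calls by URI
# (illustration only; the registry and names are invented here).
import re

FILTER_FUNCTIONS = {}

def register(uri):
    """Decorator: record an implementation under its function URI."""
    def wrap(fn):
        FILTER_FUNCTIONS[uri] = fn
        return fn
    return wrap

@register("http://www.w3.org/2005/xpath-functions#matches")
def fn_matches(value, pattern):
    # Simplified regex match; the real fn:matches also takes flags.
    return re.search(pattern, value) is not None

def call_filter(uri, *args):
    """Look up and invoke the filter function bound to `uri`."""
    if uri not in FILTER_FUNCTIONS:
        raise ValueError("unknown filter function: " + uri)
    return FILTER_FUNCTIONS[uri](*args)
```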
>    Sorting (especially with datatypes) is less easy, but is mostly  
> drudge work. Limit would require result caching -- computing the  
> whole result set, but serialising in parts -- and has not yet been  
> attempted.
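[One way to read the point about LIMIT above: if solutions arrive from a lazy generator, the engine can serialise results incrementally and stop after the requested window, rather than materialising the whole result set up front. A hypothetical sketch, with an invented stand-in for the matching engine:]

```python
# Hypothetical sketch of OFFSET/LIMIT as a lazy window over a solution
# stream (stand-in engine; not twinql code).
from itertools import islice

def solutions():
    # Stand-in for the real pattern matcher: an endless lazy stream
    # of binding dicts.
    n = 0
    while True:
        yield {"?n": n}
        n += 1

def apply_limit(results, limit=None, offset=0):
    """OFFSET/LIMIT without materialising the full result set."""
    end = None if limit is None else offset + limit
    return islice(results, offset, end)
```

[Here `apply_limit(solutions(), limit=3, offset=2)` consumes only five solutions in total, however large the full result set would be.]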
>    Named graphs were actually trivially easy to implement on top of  
> Wilbur's sourced quads. If any FROM NAMED terms exist in the query,  
> we retrieve every triple with the specified source from the FROM  
> source, and add them into a new store. (Now, this may not be in line  
> with the actual meaning of the source field -- which might not be  
> implementing named graphs -- but I wouldn't do it any other way.)  I  
> don't think I've implemented graph name variables, because I couldn't  
> see an elegant way to do it.
>    bnodes in the query have been implemented, but it's a while since  
> I made sure they worked correctly. Certainly they complicate matters,  
> but not unduly. Referencing bnodes in the results has already been  
> the topic of discussion on various lists, and is still a sticking  
> point. My particular opinion is that either a URL-based scheme  
> (reference binding labels for bnodes in the URL), or a filter  
> function, would be the best choice.
>    The definitions of string literals I found to be too complex, so I  
> simplified a little. This is non-conformant, but I didn't want to get  
> bogged down with fiddling with regular expressions.
>    So, what are my last call recommendations? I'm not sure. Certainly  
> parts of the grammar weren't designed to be easily parsed; it has a  
> number of features that are clearly user-focused, such as optional  
> dots. (If it were designed to be easily parsed, it would be s- 
> expressions from the start -- i.e., already a parse tree -- or be  
> stricter.)  If I were evolving a query language, I expect I would  
> have done some things a little differently.
>    I have seen objections made to UNION, on the grounds that  
> implementation is difficult. I'd like to nip that one in the bud; I  
> wrote twinql in a few weeks, not full-time, inventing algorithms as I  
> went, and UNION wasn't difficult.


>    Filter functions are complex, but necessarily so. I can't produce  
> a recommendation to simplify them.
>    As I write, it occurs to me that some way to say which method is  
> being (server -> client), or should be (client -> server), used for  
> DESCRIBE would be desirable -- I'd like my clients to know that  
> they're getting CBDs, or the clients might wish to ask for a certain  
> kind of description.

We considered that a bit; see
and following...

This didn't lead to a critical mass of support for anything
other than confirming our decision to leave this
service description area in the realm of experimentation, perhaps
for later standardization.

>    This being last call, it is not the time to be discussing core  
> decisions, such as features, support for inference, etc., so I'll  
> leave those, and finish here.

Your message of 17 Aug
seems to say that you're aware that it's not completely too late
to suggest changes to our requirements, and you're not requesting
any such changes just now.

>    Any interrogation from WG members or other interested parties is  
> welcome.
>    Regards,
> -Richard Newman
> [1] <http://www.holygoat.co.uk/projects/twinql/>
> [2] <http://wilbur-rdf.sourceforge.net/>
> [3] <http://www.holygoat.co.uk/blog/entry/2005-07-12-3>
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541  0875 0F91 96DE 6E52 C29E
Received on Tuesday, 22 November 2005 02:35:28 UTC
