- From: Dan Connolly <connolly@w3.org>
- Date: Mon, 21 Nov 2005 20:35:21 -0600
- To: Richard Newman <r.newman@reading.ac.uk>
- Cc: public-rdf-dawg-comments@w3.org
On Wed, 2005-08-10 at 09:34 -0700, Richard Newman wrote:
> Hello all,
>
> As SPARQL is in last call, I've been asked to provide some
> feedback based on my experiences of implementing twinql[1].

Hi. I think we have now addressed all your comments. Details below.

Please let us know whether this reply addresses your comments to your
satisfaction. If you're in a particularly helpful mood, you can put
[closed] in the subject of your reply to indicate to some automated
tools that yes, you are satisfied.

> (I'm re-sending this, because it didn't seem to arrive.)
>
> I'm not sure what structure this should take, so I'll just ramble.
> Bear with me.
>
> twinql is a SPARQL query implementation for the Wilbur RDF toolkit
> [2], written in ANSI Common Lisp. As documented on my blog[3], I took
> a typically Lispy two-stage DSL approach to implementing SPARQL,
> writing an executor for an s-expression syntax, then a parser to
> convert text into s-expressions. This was a very useful technique,
> allowing me to implement pre-compilation and direct execution of
> s-expressions independently of the parser, and leaving open the
> possibility of later integrating a better parser.
>
> The s-expression syntax is deliberately roughly isomorphic to the
> query language:
>
> (sparql :select :distinct t :vars '(name mbox)
>         :from "http://www.holygoat.co.uk/foaf.rdf"
>         :where
>         '(graph-pattern
>           (triple x !foaf:knows !rich:RichardNewman)
>           (triple x !foaf:name name)
>           (triple x !foaf:mbox_sha1sum mbox)))
>
> (The parser does the job of producing Wilbur nodes (indicated by
> !prefix:suffix) from the PREFIX and BASE declarations, so they do not
> appear.)
>
> This s-expression can be directly executed in the Lisp toplevel --
> if you have a Lisp system to hand, I invite you to ASDF-install
> twinql, switch to the :twinql package, and paste it in. The executor
> uses hash tables to represent bindings and touched lists in order to
> do the resolution of graph patterns.
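The binding-resolution approach Richard describes -- hash tables
mapping variables to values, extended as each triple pattern is
matched -- can be sketched roughly as follows. This is an
illustrative Python sketch, not twinql's Lisp code; all names here
(`match_pattern`, `resolve`, the "?"-prefix variable convention) are
assumptions for the example.

```python
# Sketch of graph-pattern resolution over binding hash tables.
# Convention (assumed): a variable is a string starting with "?";
# anything else is a constant term.

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def match_pattern(pattern, triple, bindings):
    """Try to unify one triple pattern with one concrete triple,
    returning an extended copy of the bindings dict, or None."""
    new = dict(bindings)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None          # conflicts with an earlier binding
            new[p] = t
        elif p != t:
            return None              # constant mismatch
    return new

def resolve(patterns, triples, bindings=None):
    """Resolve a conjunctive graph pattern: each triple pattern
    narrows the set of binding tables the previous one produced."""
    solutions = [bindings or {}]
    for pattern in patterns:
        solutions = [b2
                     for b in solutions
                     for t in triples
                     for b2 in [match_pattern(pattern, t, b)]
                     if b2 is not None]
    return solutions
```

For example, resolving `("?p", "foaf:name", "?n")` against a store
containing `("x", "foaf:name", "Richard")` yields the single binding
table `{"?p": "x", "?n": "Richard"}`.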
> Calculation of nested graph patterns, unions, and optionals was
> almost zero effort, thanks to the design of the executor -- I just
> provide different bindings or touched lists to the execution
> function. Thus, on the topic of unions, I would disagree that they
> are frustrating for implementers. It took me literally minutes to
> get them working. (Whether the implementation is correct is another
> matter!)
>
> The parser was a little less exciting to develop. I used CL-YACC,
> a yacc-like parser generator for Common Lisp. This required:
>
> • Translating the EBNF grammar into s-expression BNF productions
> • Writing a lexer
> • Wiring actions into the productions to output the s-expressions.
>
> Translating the grammar was no fun at all. Firstly, the EBNF had
> to be manually translated into BNF, requiring the creation of a
> number of intermediate productions. Secondly, the grammar is beyond
> the reach of a simple parser generator, requiring lookahead -- not a
> difficulty for the Java compiler tool being used, but too much for
> CL-YACC. Thus, implementing the parser needed continual switching
> back to the lexer, shifting the complicated "smarts" into the
> generation of tokens. The lexer is therefore quite complex, applying
> various regular expressions, producing symbols, resolving URIs, and
> building literals. One unavoidable issue was the optional dot after
> the last query element in each graph pattern. This made the grammar
> ambiguous, so twinql currently makes them compulsory. I'm looking
> for a better solution.

We have re-worked the grammar. It's now LL(1).
  http://www.w3.org/2001/sw/DataAccess/rq23/#grammar

Regarding manually translating to BNF, we have a number of
machine-generated parsers built from our grammars. There's a "@@Links
to generated parsers" TODO item in the editor's draft now, but some
of those generated forms should be linked for your convenience in an
upcoming draft.
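The point that UNION and OPTIONAL "fall out" of the executor's design
can be made concrete with a small sketch: both are just the same
pattern executor run against different binding sets. This is an
illustrative Python sketch, not twinql's implementation; `solve` is a
hypothetical stand-in for a conjunctive matcher that takes a pattern
and a starting bindings dict and returns a list of extended bindings
dicts.

```python
# Sketch: UNION and OPTIONAL as re-invocations of one executor.
# `solve(pattern, bindings) -> [bindings, ...]` is an assumed
# interface, not twinql's actual API.

def union(branches, solve, bindings):
    """UNION: evaluate each alternative from the same starting
    bindings and concatenate whatever each branch produces."""
    results = []
    for branch in branches:
        results.extend(solve(branch, bindings))
    return results

def optional(pattern, solve, bindings):
    """OPTIONAL: keep any extensions the pattern yields, but fall
    back to the unextended bindings when it yields none."""
    extended = solve(pattern, bindings)
    return extended if extended else [bindings]
```

Under this framing a failed UNION branch simply contributes an empty
list, and a failed OPTIONAL leaves the incoming bindings untouched --
which matches Richard's report that both took minutes, not days.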
> I'm also unhappy about my implementation of comment removal. URIs
> and literals can contain hashes, so it's not enough to strip from
> any hash to the end of the line. I first extract URIs and literals,
> to avoid this, but this means that comments containing URI or
> literal patterns could mess up your query. I'll have to fix this
> some time; the lexer mostly does the job, but I could do better.
>
> Thanks to my existing implementation of CBDs and serialisation,
> DESCRIBE and CONSTRUCT were straightforward. Similarly, serialising
> binding hashes to the XML results format was easy.
>
> The features of SPARQL that I have not yet fully implemented are
> (off the top of my head):
>
> • Filter functions
> • Order, limits, etc.
>
> Regexes and other filter function calls have had effort expended
> on them, but are not yet usable. These are obviously vital, and I
> can't think of a better way to integrate them into the syntax.
> Implementation-wise, I obviously need to map between function URIs
> and functions, which should be straightforward.
>
> Sorting (especially with datatypes) is less easy, but is mostly
> drudge work. Limit would require result caching -- computing the
> whole result set, but serialising in parts -- and has not yet been
> attempted.
>
> Named graphs were actually trivially easy to implement on top of
> Wilbur's sourced quads. If any FROM NAMED terms exist in the query,
> we retrieve every triple with the specified source from the FROM
> source, and add them into a new store. (Now, this may not be in line
> with the actual meaning of the source field -- which might not be
> implementing named graphs -- but I wouldn't do it any other way.) I
> don't think I've implemented graph name variables, because I
> couldn't see an elegant way to do it.
>
> bnodes in the query have been implemented, but it's a while since
> I made sure they worked correctly. Certainly they complicate
> matters, but not unduly.
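The comment-removal problem described above (hashes legitimately
appear inside IRIs and literals, so stripping "#"-to-end-of-line
blindly corrupts queries) is usually solved by scanning tokens in a
single left-to-right pass rather than extracting URIs and literals
first. A minimal Python sketch, not twinql's lexer -- the function
name and the simplified quoting rules are assumptions:

```python
# Sketch: strip SPARQL-style comments in one scan, copying IRI refs
# and string literals through verbatim so their "#" characters
# survive. Simplified: no long ('''...''') literals, and a single
# preceding backslash is treated as escaping a quote.

def strip_comments(query):
    out = []
    i, n = 0, len(query)
    while i < n:
        c = query[i]
        if c == "<":                      # IRI ref: copy through ">"
            j = query.find(">", i)
            j = n - 1 if j == -1 else j
            out.append(query[i:j + 1])
            i = j + 1
        elif c in "\"'":                  # string literal: copy through
            j = i + 1                     # the matching unescaped quote
            while j < n and (query[j] != c or query[j - 1] == "\\"):
                j += 1
            out.append(query[i:j + 1])
            i = j + 1
        elif c == "#":                    # comment: skip to end of line
            j = query.find("\n", i)
            i = n if j == -1 else j
        else:
            out.append(c)
            i += 1
    return "".join(out)
```

Because the scanner consumes each IRI or literal as a unit before it
ever looks for "#", a comment containing URI- or literal-like text is
also handled correctly, which addresses the second failure mode
Richard mentions.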
> Referencing bnodes in the results has already been the topic of
> discussion on various lists, and is still a sticking point. My
> particular opinion is that either a URL-based scheme (reference
> binding labels for bnodes in the URL), or a filter function, would
> be the best choice.
>
> The definitions of string literals I found to be too complex, so I
> simplified a little. This is non-conformant, but I didn't want to
> get bogged down with fiddling with regular expressions.
>
> So, what are my last call recommendations? I'm not sure. Certainly
> parts of the grammar weren't designed to be easily parsed; it has a
> number of features that are clearly user-focused, such as optional
> dots. (If it were designed to be easily parsed, it would be
> s-expressions from the start -- i.e., already a parse tree -- or be
> stricter.) If I were evolving a query language, I expect I would
> have done some things a little differently.
>
> I have seen objections made to UNION, on the grounds that
> implementation is difficult. I'd like to nip that one in the bud; I
> wrote twinql in a few weeks, not full-time, inventing algorithms as
> I went, and UNION wasn't difficult.

Noted.

> Filter functions are complex, but necessarily so. I can't produce
> a recommendation to simplify them.
>
> As I write, it occurs to me that some way to say which method is
> being (server -> client), or should be (client -> server), used for
> DESCRIBE would be desirable -- I'd like my clients to know that
> they're getting CBDs, or the clients might wish to ask for a certain
> kind of description.

We considered that a bit; see
  http://lists.w3.org/Archives/Public/public-rdf-dawg/2005JulSep/0517.html
and following... This didn't lead to a critical mass of support for
anything other than confirming our decision to leave this service
description area in the realm of experimentation, perhaps for later
standardization.
  http://www.w3.org/2001/sw/DataAccess/issues#serviceDescription

> This being last call, it is not the time to be discussing core
> decisions, such as features, support for inference, etc., so I'll
> leave those, and finish here.

Your message of 17 Aug
  http://lists.w3.org/Archives/Public/public-rdf-dawg-comments/2005Aug/0065
seems to say that you're aware that it's not completely too late to
suggest changes to our requirements, and you're not requesting any
such changes just now.

> Any interrogation from WG members or other interested parties is
> welcome.
>
> Regards,
> -Richard Newman
>
> [1] <http://www.holygoat.co.uk/projects/twinql/>
> [2] <http://wilbur-rdf.sourceforge.net/>
> [3] <http://www.holygoat.co.uk/blog/entry/2005-07-12-3>

-- 
Dan Connolly, W3C http://www.w3.org/People/Connolly/
D3C2 887B 0F92 6005 C541 0875 0F91 96DE 6E52 C29E
Received on Tuesday, 22 November 2005 02:35:28 UTC