Re: RDF query (RDQL) work for Redland

On Wed, 27 Aug 2003 01:35:14 +0200
Alberto Reggiori <> wrote:

> On Monday, August 25, 2003, at 12:40  PM, Dave Beckett wrote:
> >
> > I've been playing with providing support for the Squish/RDQL style
> > querying in Redland and now the W3C's lists are back, I'll report
> > what I've got so far.
> hello Dave
> nice work! :-)


> > I took the RDQL definition from the Jena RDQL[1] and used that
> > grammar plus the examples from Jena and @semantics' tutorials to
> > write lex & yacc versions in C for parsing it. The current state is
> > that it passes most of the RDQL test suite in Jena bar a few oddities
> > that need to be worked out (case sensitivity of tokens, difficulties
> > in identifying pattern literals).
> I think the latest RDQL Jena2 grammar updated by Andy is the one to
> look at [1] (with regular expression support, optional commas and
> xml:lang and rdf:dataType support on literals) - but as a start the old
> Jena 1.x grammar should be enough to test most currently running
> software ...

Yes, the Jena1 version was what I started with.  I've added
the optional commas but not the other parts.

> ... - Andy has been working on a more up-to-date RDQL spec which
> should be out soon (Andy: anything to say about that?). It should be
> basically what you can see in the Jena2 CVS plus some other fixes (I
> think!). Hopefully that document will be the common RDQL reference
> which implementors can consult and extend if necessary.

Now that Jena2 has shipped we can bug Andy again about that :)
The new RDQL area is at
but it's not clear what has changed.

> In relation to the RDF query tests work, Andy and I converted some of
> the Jena2 RDQL tests to N-Triples [2] (which should move to a dedicated
> SourceForge repository sooner or later) - the queries/ dir contains
> the native RDQL syntax examples, which can be used for your parser
> regression tests (misc examples with constraints, regular expressions,
> xml:lang and rdf:dataType)
> > My current issues are on the TODO page:
> >
> > and include the problems I've found so far and mentioned above.
> >
> > It's this list of problems/incompatibilities that are probably
> > of most interest to the www-rdf-rules group.
> >
> >  * bare QNames are now allowed
> do you mean in the <prefix:localname> form?

I mean there are ambiguous forms:
which is a legal URI and a legal qname.
Are both of these qname (prefix:localname) forms allowed?

> >  * Add the default prefixes (rdf, rdfs, owl, ... ?)
> Yes, good one - most software already does that in the application -
> it would be handy to have them defaulted directly by the parsing
> software

This list, if it exists, has to be very well known and short.
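A minimal C sketch of what built-in defaults might look like in a parser, assuming a short fixed table for rdf, rdfs and owl (the URIs are the standard W3C namespaces) that is consulted only when the query itself does not bind the prefix. The type and function names are hypothetical, not Redland's API.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical prefix binding; the defaults use the standard W3C namespaces. */
typedef struct {
  const char *prefix;
  const char *uri;
} prefix_map;

static const prefix_map default_prefixes[] = {
  { "rdf",  "http://www.w3.org/1999/02/22-rdf-syntax-ns#" },
  { "rdfs", "http://www.w3.org/2000/01/rdf-schema#" },
  { "owl",  "http://www.w3.org/2002/07/owl#" },
};

/* Look up a prefix: bindings declared in the query (e.g. via USING) win,
 * the built-in defaults are only a fallback. Returns NULL if unknown. */
const char *lookup_prefix(const char *prefix,
                          const prefix_map *declared, size_t n_declared)
{
  size_t i;
  for (i = 0; i < n_declared; i++)
    if (strcmp(declared[i].prefix, prefix) == 0)
      return declared[i].uri;
  for (i = 0; i < sizeof default_prefixes / sizeof default_prefixes[0]; i++)
    if (strcmp(default_prefixes[i].prefix, prefix) == 0)
      return default_prefixes[i].uri;
  return NULL;
}
```

Letting the query's own declarations override the defaults means a query that rebinds, say, rdf still means what its author wrote.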

> >  * Extensions: multiple LIMIT and OFFSET
> What do you mean? Can you elaborate on this?

I've seen that 3Store supports LIMIT (capping the number of results
returned) and OFFSET (returning results starting from a given
position).  I assume these follow the (My)SQL terminology.
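Assuming 3Store follows the usual SQL meaning (skip OFFSET results, then return at most LIMIT), the windowing can be sketched in C over an already-computed result array. The function name and the use of a negative limit for "no limit" are assumptions for illustration.

```c
#include <stddef.h>

/* Apply SQL-style OFFSET then LIMIT to n results.
 * A negative limit means "no limit". On return *start is the index of the
 * first result to emit; the return value is how many results to emit. */
size_t apply_limit_offset(size_t n, size_t offset, long limit, size_t *start)
{
  size_t remaining;
  if (offset >= n) {          /* offset past the end: nothing to return */
    *start = n;
    return 0;
  }
  *start = offset;
  remaining = n - offset;
  if (limit >= 0 && (size_t)limit < remaining)
    remaining = (size_t)limit;
  return remaining;
}
```

OFFSET only gives stable pages if results come back in a defined order, so an engine offering it would probably also need an ordering rule.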

> >  * Optionals?
> yes - useful all the time I think :)
> Some time ago I posted to this list some ideas about possible syntax for
> optionals [3] - what are your ideas about it? Do you lean more towards
> optional triple-patterns or optional/may-bind variables?

No opinion.

> I also noticed that Damian Steer has been recently investigating the  
> possibility to have optionals for his extended-SquishQL syntax [4]
> >
> >  * Are keywords case sensitive? Jena RDQL has an example with SELECT
> >    ?select WHERE ... but @semantics' RDQL tutorial has an example with
> >    USING dcq for ... not FOR
> In our implementation we always considered them case *insensitive*,
> since RDQL does not seem to specify it (or at least Jena seems
> case-insensitive - somebody from Jena correct me if I am wrong) - while
> porting our pure Perl RDQL::Parser [5] to C/XS code [6][7] we
> used the '-i' lex flag to generate a case-insensitive lexer (which
> also implements some extensions to RDQL such as contexts/4th
> components, a LIKE operator and a primitive form of OR on URIs and
> literals in triple-patterns)

It was Andy's "SELECT ?select ..." example that broke things,
since I did have case-insensitive keywords.  That tokenises as
<selectKeyword> " " "?" <selectKeyword>

which fails to match the grammar since after "?" it must be a legal variable name,
not a <selectKeyword> token.

I would propose not allowing variable / identifier names to be RDQL
keywords, a restriction similar to that of most general programming
languages.
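A small C sketch of the proposed rule, assuming case-insensitive keywords and an illustrative keyword list (the ones mentioned in this thread plus SOURCE and AND): the name after "?" is checked as a variable name, and a name that matches a keyword in any case is rejected with a clear error rather than surfacing as an unexpected keyword token. The function names and the variable-name character set are assumptions.

```c
#include <ctype.h>
#include <stddef.h>

/* Assumed RDQL keyword list, matched without regard to case. */
static const char *const rdql_keywords[] = {
  "SELECT", "SOURCE", "WHERE", "AND", "USING", "FOR"
};

/* ASCII case-insensitive string equality. */
static int equals_ignore_case(const char *a, const char *b)
{
  while (*a && *b) {
    if (toupper((unsigned char)*a) != toupper((unsigned char)*b))
      return 0;
    a++;
    b++;
  }
  return *a == *b;
}

int is_keyword(const char *word)
{
  size_t i;
  for (i = 0; i < sizeof rdql_keywords / sizeof rdql_keywords[0]; i++)
    if (equals_ignore_case(word, rdql_keywords[i]))
      return 1;
  return 0;
}

/* Check the name following '?' under the proposed rule: letters, digits
 * and '_', not starting with a digit, and not a keyword in any case.
 * Returns 1 if acceptable, 0 otherwise. */
int valid_variable_name(const char *name)
{
  const char *p;
  if (name[0] == '\0' || isdigit((unsigned char)name[0]))
    return 0;
  for (p = name; *p; p++)
    if (!isalnum((unsigned char)*p) && *p != '_')
      return 0;
  return !is_keyword(name);
}
```

The alternative would be a lexer state after "?" in which keywords are not recognised; that would accept "SELECT ?select" instead of rejecting it.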

> >
> >  * Literal languages, datatypes - new "lit"@lang and  
> > "lit"@lang^^datatype
> see latest Jena2 CVS for that
> >
> >  * Pattern literals seem difficult to recognise without context
> We also had some difficulties while designing our lexer, especially with
> the hybrid usage of N-Triples-like syntax in the new RDQL lexer [8] to
> flag xml:lang and rdf:dataType patterns (i.e. not using '<' and '>' to
> delimit the URI of the datatype, for example) - ...

You mean "foo"^ex:a
rather than "foo"^<>

There are three options after ^ - either @, ^ or it must be a qname.
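A C sketch of scanning what follows a literal's closing quote, assuming the suffix forms from the TODO item ("lit"@lang, "lit"@lang^^datatype) with the datatype written either as <URI> or as a bare qname such as ex:a. The struct, function name and the character sets accepted for language tags and qnames are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parts of a literal suffix; pointers refer into the query text. */
typedef struct {
  const char *lang;     size_t lang_len;      /* text after '@', or NULL */
  const char *datatype; size_t datatype_len;  /* text after "^^", or NULL */
  int datatype_is_uri;                        /* 1 if written as <...> */
} literal_suffix;

/* Scan the suffix starting at p (just past the closing quote).
 * Returns a pointer past the suffix, or NULL if the datatype is malformed.
 * An empty suffix is fine and returns p unchanged. */
const char *scan_literal_suffix(const char *p, literal_suffix *out)
{
  literal_suffix empty = { NULL, 0, NULL, 0, 0 };
  *out = empty;

  if (*p == '@') {                          /* "lit"@en-GB */
    const char *start = ++p;
    while (isalnum((unsigned char)*p) || *p == '-')
      p++;
    out->lang = start;
    out->lang_len = (size_t)(p - start);
  }
  if (p[0] == '^' && p[1] == '^') {
    p += 2;
    if (*p == '<') {                        /* "lit"^^<http://...> */
      const char *end = strchr(p + 1, '>');
      if (end == NULL)
        return NULL;
      out->datatype = p + 1;
      out->datatype_len = (size_t)(end - (p + 1));
      out->datatype_is_uri = 1;
      p = end + 1;
    } else {                                /* "lit"^^prefix:local */
      const char *start = p;
      while (isalnum((unsigned char)*p) || *p == '_' || *p == '-' || *p == ':')
        p++;
      if (p == start || memchr(start, ':', (size_t)(p - start)) == NULL)
        return NULL;                        /* bare datatype must be a qname */
      out->datatype = start;
      out->datatype_len = (size_t)(p - start);
    }
  }
  return p;
}
```

Once the lexer knows it has just finished a string literal, '@' and "^^" are unambiguous; the bare-qname branch is the part that needs care.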

> ...  and the regular expression
> pattern syntax, which allows regular expressions to be delimited by
> arbitrary characters as well as by the plain slash.

It seems to me that the lexer has to accept a wide variety of things
when it is expecting a pattern literal as the next token and cannot
recognise a pattern literal without that context.  It would be good to
reduce these problems somewhat.
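One way to contain this, sketched in C: the lexer calls a dedicated pattern scanner only when the parser has just consumed the match operator (e.g. `=~`). The syntax assumed here is Perl-like, either `/regex/flags` or `m` followed by an arbitrary non-alphanumeric, non-space delimiter as in `m!regex!i`, with backslash escaping the delimiter; this is an assumption for illustration, not the Jena grammar verbatim.

```c
#include <ctype.h>
#include <stddef.h>

typedef struct {
  const char *body;  size_t body_len;   /* regex text between delimiters */
  const char *flags; size_t flags_len;  /* trailing modifier letters */
} pattern_literal;

/* Scan a pattern literal at p; only called in the "after =~" context.
 * Returns a pointer past the literal, or NULL if p is not a pattern. */
const char *scan_pattern_literal(const char *p, pattern_literal *out)
{
  char delim;
  const char *start;

  if (*p == '/') {
    delim = '/';
  } else if (*p == 'm' && p[1] != '\0' && !isalnum((unsigned char)p[1]) &&
             !isspace((unsigned char)p[1])) {
    delim = *++p;                 /* m!...!, m#...# and so on */
  } else {
    return NULL;
  }
  start = ++p;
  while (*p && *p != delim) {
    if (*p == '\\' && p[1] != '\0')
      p++;                        /* skip the escaped character */
    p++;
  }
  if (*p != delim)
    return NULL;                  /* unterminated pattern */
  out->body = start;
  out->body_len = (size_t)(p - start);
  start = ++p;
  while (isalpha((unsigned char)*p))
    p++;
  out->flags = start;
  out->flags_len = (size_t)(p - start);
  return p;
}
```

Outside that context, text like `m:x` reads equally well as a qname with prefix `m`, which is why this scanner should not run unconditionally.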

> >
> >  * Qnames and URIs - in particular what is <a:b>
> >    if the prefix a isn't defined till later
> I think Sesame tried the N3 (other) way - it would be handy to have
> prefixes defined first, so that qnames can be substituted while parsing.

Yes, or at least to allow them to be interpreted in a way that is
equivalent to that.
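Since RDQL declares prefixes in a trailing USING clause, one way to get results equivalent to "define before use" is to record qnames unresolved while parsing and expand them once the whole query, USING included, has been read. A minimal C sketch; all names and the fixed 256-byte output buffers are illustrative assumptions.

```c
#include <stdio.h>
#include <string.h>

typedef struct { const char *prefix; const char *uri; } using_decl;

/* Expand "prefix:local" into out using the USING declarations.
 * Returns 0 on success, -1 if the prefix is undeclared or out is too small. */
int expand_qname(const char *qname, const using_decl *decls, size_t n,
                 char *out, size_t out_size)
{
  const char *colon = strchr(qname, ':');
  size_t plen, i;
  if (colon == NULL)
    return -1;
  plen = (size_t)(colon - qname);
  for (i = 0; i < n; i++) {
    if (strlen(decls[i].prefix) == plen &&
        strncmp(decls[i].prefix, qname, plen) == 0) {
      int w = snprintf(out, out_size, "%s%s", decls[i].uri, colon + 1);
      return (w < 0 || (size_t)w >= out_size) ? -1 : 0;
    }
  }
  return -1;
}

/* Second pass: resolve every qname collected while parsing. */
int resolve_pending(const char *const *pending, size_t n_pending,
                    const using_decl *decls, size_t n_decls,
                    char out[][256])
{
  size_t i;
  for (i = 0; i < n_pending; i++)
    if (expand_qname(pending[i], decls, n_decls, out[i], 256) != 0)
      return -1;  /* report the first unresolvable qname */
  return 0;
}
```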

> >
> >  * base URIs? Lots of <relativeURI> seen. @base?
> good one

There are more.  I would suggest that, following how CSS does this,
there should be a way to set the content encoding of the document.
(What is the default? ASCII?)  As I recall, CSS uses an @charset rule
at the very start of the style sheet.
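For the <relativeURI> case, a deliberately simplified C sketch of resolving a reference against a base set by something like the @base directive suggested above: references with a scheme pass through, fragment-only references replace the base's fragment, and anything else is appended after the base's last '/'. Full RFC 3986 resolution (dot segments, queries, network-path references) is not handled; the function names are hypothetical.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Does ref start with a URI scheme such as "http:"? */
static int has_scheme(const char *ref)
{
  const char *p = ref;
  if (!isalpha((unsigned char)*p))
    return 0;
  while (isalnum((unsigned char)*p) || *p == '+' || *p == '-' || *p == '.')
    p++;
  return *p == ':';
}

/* Simplified resolution of ref against base into out (not full RFC 3986).
 * Returns 0 on success, -1 if out is too small. */
int resolve_against_base(const char *base, const char *ref,
                         char *out, size_t out_size)
{
  size_t keep;
  int w;
  if (has_scheme(ref)) {
    w = snprintf(out, out_size, "%s", ref);
  } else if (ref[0] == '#') {
    const char *hash = strchr(base, '#');
    keep = hash ? (size_t)(hash - base) : strlen(base);
    w = snprintf(out, out_size, "%.*s%s", (int)keep, base, ref);
  } else {
    const char *slash = strrchr(base, '/');
    keep = slash ? (size_t)(slash - base) + 1 : 0;
    w = snprintf(out, out_size, "%.*s%s", (int)keep, base, ref);
  }
  return (w < 0 || (size_t)w >= out_size) ? -1 : 0;
}
```

Note that <a:b> passes the scheme test here and so is treated as absolute, which is the same <a:b> ambiguity raised earlier.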

> I have some more:
> What about using some alternative character to '?' to identify
> variables? We found that the character '?' conflicts with / is reserved
> by the SQL standard and is treated specially by JDBC/ODBC interfaces - the
> '$' (dollar) sign might be a good alternative :-)

? works for me
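To make the JDBC/ODBC point concrete: those interfaces treat '?' in a statement as a parameter marker. A simplified C sketch of that kind of scan (count '?' outside single- or double-quoted strings; real drivers differ in detail, and the function is hypothetical) shows an RDQL query with '?' variables being read as a list of parameters, while '$' variables are left alone.

```c
#include <stddef.h>

/* Count '?' characters outside quoted strings, roughly as a driver
 * would when looking for parameter markers (simplified). */
size_t count_parameter_markers(const char *sql)
{
  size_t count = 0;
  char quote = 0;                /* current quote character, or 0 */
  for (; *sql; sql++) {
    if (quote) {
      if (*sql == quote)
        quote = 0;               /* end of quoted string */
    } else if (*sql == '\'' || *sql == '"') {
      quote = *sql;              /* start of quoted string */
    } else if (*sql == '?') {
      count++;
    }
  }
  return count;
}
```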

> Another related one is context/provenance - which could be used as a 4th
> component of the triple-patterns, or expressed using braces like N3 - any idea?

Not at present

> what about a pure/hybrid n-triples++ (with bArcs and 4th component I  
> mean) syntax for triple-patterns? :-)

No thanks.  bArcs means it isn't RDF, and a 4th triple component
guarantees that.

> cheers
> Alberto
> [1] doc/RDQL/rdql_grammar.html
> [2]
> [3]
> [4]
> [5] rdfstore/lib/RDQL/
> [6] rdfstore/rdql.l
> [7] rdfstore/rdql.y
> [8] src/com/hp/hpl/jena/rdql/parser/rdql.jjt

Received on Tuesday, 9 September 2003 11:10:47 UTC