RE: RDF query (RDQL) work for Redland

Dave, Alberto,

Dave wrote:
> Now that Jena2 has shipped we can bug Andy again about that :)
> The new RDQL area is at http://jena.sourceforge.net/RDQL/index.html
> but it's not clear what has changed.

The grammar is at http://jena.sourceforge.net/RDQL/rdql_grammar.html and is
automatically generated by jjdoc, so it is exactly the RDQL/Jena grammar -
I haven't altered it in any way.

Changes have been gradual but nothing incompatible has been introduced that
I am aware of.  The significant changes are proper qnames and optional
commas.  


Dave wrote:
> I mean there are ambiguous forms:
>    <mailto:dave>
> which is a legal URI and a legal qname.
>    <ex:a>
>    ex:a
> are both of these qname (prefix:localname) forms allowed?

This is a legacy issue - I still allow the <ex:a> form only because of that.
The resolution rule for <ex:a> is to see if the prefix is defined as a prefix
(even if it is "http:"!); if so, substitute, else leave it alone.  If it is a
qname like ex:a, then there must be a substitution, else the query is wrong.
I would encourage query application writers to use the ex:a form when they
mean that.  <ex:a> and ex:a are different terms in the grammar, not the same
term with optional <>.
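
A minimal sketch of the resolution rule described above, written in Python
for illustration only; it is not Jena's code.  The function name
resolve_term and the shape of the prefix table (a plain dict from prefix to
namespace URI) are assumptions made for this example.

```python
def resolve_term(term, prefixes):
    """Resolve a '<...>' term or a bare 'prefix:local' qname to a URI string.

    prefixes is a hypothetical dict mapping prefix -> namespace URI.
    """
    bracketed = term.startswith("<") and term.endswith(">")
    body = term[1:-1] if bracketed else term
    prefix, sep, local = body.partition(":")

    # In both forms, a defined prefix is substituted.
    if sep and prefix in prefixes:
        return prefixes[prefix] + local

    # Legacy <...> form: an undefined prefix means it is already a URI.
    if bracketed:
        return body

    # Bare qname form: there must be a substitution, else the query is wrong.
    raise ValueError("undefined prefix in qname: " + term)
```

With only ex defined, <mailto:dave> is left alone as a URI, <ex:a> and ex:a
both expand, and a bare qname with an undefined prefix is rejected.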


Dave wrote:
> > >  * Add the default prefixes (rdf, rdfs, owl, ... ?)
Alberto replied:
> > 
> > yes good one - most software already does that into the application -
> > it would be handy to have them defaulted directly by the parsing
> > software

The current list I use is rdf, rdfs, owl and xsd.  However these are merely
defaults for convenience and do not stop you writing your own declarations,
with the same or different URIs, to write fully portable queries.  When Jena
prints parsed queries back out again, it ensures that it does not rely on
the default prefixes.  They are only a shorthand.
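
One way to picture this behaviour, as a hedged Python sketch rather than a
description of how Jena implements it: explicit declarations are layered
over the defaults, and printing emits a declaration for every prefix the
query uses so the output never relies on the defaults.  The function names
are hypothetical; the namespace URIs are the standard W3C ones.

```python
# Default prefixes: a convenience only (standard W3C namespace URIs).
DEFAULT_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def effective_prefixes(declared):
    """Return the prefix table for a query: defaults, overridden by the
    query's own declarations (same or different URIs)."""
    merged = dict(DEFAULT_PREFIXES)
    merged.update(declared)
    return merged


def prefixes_to_print(declared, used):
    """Return explicit declarations for every prefix the query uses, so the
    printed query is portable and does not depend on any defaults."""
    merged = effective_prefixes(declared)
    return {prefix: merged[prefix] for prefix in sorted(used)}
```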


Dave wrote:
> It was Andy's "SELECT ?select ..." example that broke things
> since I did have case-insensitive keywords.  That tokenises as
> <selectKeyword> " " "?" <selectKeyword>
> 
> which fails to match the grammar since after "?" it must be a
> legal variable name, not a <selectKeyword> token.
> 
> I would propose not allowing variable / identifier names to be RDQL
> keywords, similar restrictions to most general programming languages.

I am neutral on whether variables can be keywords or not.

I would observe that I didn't find it that hard, even with
context-insensitive lexing, because variables are marked by ?.  I just have a
parser rule:

void Identifier() :
{}
{
  ( <IDENTIFIER> 
    // And all keywords
    | <SELECT> | <SOURCE> | <FROM> | <WHERE>
    | <SUCHTHAT> | <PREFIXES> | <FOR>
    | <STR_EQ> | <STR_NE> )
  { jjtThis.set(token.image) ; }
}

i.e. an identifier (no keywords) or a keyword.  The result of the rule is
the exact text so matched, including the text of the keyword.  Presumably
the same approach works for yacc/lex systems, getting the matched text for
the token found.
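
The same idea outside JavaCC can be sketched as follows.  This is an
illustrative Python toy, not Rasqal or Jena code: the lexer stays
context-insensitive and still emits keyword tokens (case-insensitively), and
the rule for a variable accepts either an identifier or a keyword after "?"
and keeps the matched text.  The keyword set, token names and function names
are assumptions, and the toy lexer handles only words and "?".

```python
import re

# Illustrative subset of keywords, matched case-insensitively.
KEYWORDS = {"SELECT", "FROM", "WHERE", "USING", "FOR"}

TOKEN_RE = re.compile(r"\s*(\?|[A-Za-z_][A-Za-z0-9_]*)")


def tokenize(text):
    """Context-insensitive lexer: returns (kind, matched_text) pairs."""
    text = text.rstrip()
    tokens = []
    pos = 0
    while pos < len(text):
        match = TOKEN_RE.match(text, pos)
        if match is None:
            raise ValueError("unexpected input at offset %d" % pos)
        word = match.group(1)
        if word == "?":
            kind = "QMARK"
        elif word.upper() in KEYWORDS:
            kind = "KEYWORD"
        else:
            kind = "IDENT"
        tokens.append((kind, word))
        pos = match.end()
    return tokens


def variable_names(tokens):
    """Parser-side rule: after '?', accept an identifier OR a keyword and
    take the exact matched text as the variable name."""
    names = []
    for i, (kind, _) in enumerate(tokens):
        if kind != "QMARK":
            continue
        if i + 1 >= len(tokens) or tokens[i + 1][0] not in ("IDENT", "KEYWORD"):
            raise ValueError("expected a variable name after '?'")
        names.append(tokens[i + 1][1])
    return names
```

Here "SELECT ?select" still lexes as keyword, "?", keyword, but the variable
rule recovers the name "select" from the second keyword token's text.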

Alberto wrote:
> > ...  and the regular expression  
> > pattern syntax which allows to delimit regular-expressions with  
> > arbitrary characters together with simple slash.

Dave replied:
> It seems to me that the lexer has to accept a wide variety of things
> when it is expecting a pattern literal as the next token and cannot
> recognise a pattern literal without that context.  It would be good to
> reduce these problems somewhat.

The syntax here is Perl regular expressions with one addition.  Because / is
a common character in URIs, and a regex can be used to test a URI, I allowed
regexes to be delimited with things like !! directly, not just the m!! form.
The semantics are exactly as in Perl (as implemented by ORO in my case).

For regular expressions, a wide range of characters can appear, so you have
to switch off keyword tokenization while reading one; there is no way round
it.  I don't think the issues here are any different from parsing quoted
strings.
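
To illustrate the comparison with quoted strings, here is a hedged Python
sketch of scanning a pattern literal once the parser knows one is expected.
It is not the Jena or Rasqal lexer.  Assumptions made for the example: an
optional leading m, a delimiter that is any single character other than a
letter, digit, whitespace or backslash, the same character closing the
pattern, backslash escaping the next character, and any trailing letters
being modifier flags.  The name scan_pattern is hypothetical.

```python
def scan_pattern(text, pos=0):
    """Scan a pattern literal starting at text[pos].

    Returns (pattern, flags, end) where end is the offset just past the
    literal.  Works like scanning a quoted string: everything up to the
    closing delimiter is taken verbatim, with no keyword tokenization.
    """
    i = pos
    if text.startswith("m", i):      # optional Perl-style m prefix
        i += 1
    if i >= len(text):
        raise ValueError("expected a pattern literal")

    delim = text[i]
    if delim.isalnum() or delim.isspace() or delim == "\\":
        raise ValueError("invalid pattern delimiter: %r" % delim)
    i += 1

    start = i
    while i < len(text) and text[i] != delim:
        # A backslash escapes the following character, delimiter included.
        i += 2 if text[i] == "\\" else 1
    if i >= len(text):
        raise ValueError("unterminated pattern literal")
    pattern = text[start:i]
    i += 1                           # skip the closing delimiter

    flags_start = i
    while i < len(text) and text[i].isalpha():
        i += 1                       # trailing letters taken as flags
    return pattern, text[flags_start:i], i
```

With this, a URI test such as !http://example.org/! needs no escaping of the
slashes, which is the motivation for allowing delimiters other than /.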

Regular expressions are so useful for applications which deal with textual
descriptions of things that it seems well worth making the effort in the
parser to have a convenient syntax.

	Andy

> -----Original Message-----
> From: Dave Beckett [mailto:dave.beckett@bristol.ac.uk] 
> Sent: 9 September 2003 15:41
> To: Alberto Reggiori
> Cc: 'www-rdf-rules@w3.org'; Andy Seaborne
> Subject: Re: RDF query (RDQL) work for Redland
> 
> 
> On Wed, 27 Aug 2003 01:35:14 +0200
> Alberto Reggiori <alberto@asemantics.com> wrote:
> 
> > On Monday, August 25, 2003, at 12:40  PM, Dave Beckett wrote:
> > 
> > >
> > > I've been playing with providing support for the Squish/RDQL style
> > > querying in Redland and now the W3C's lists are back, I'll report
> > > what I've got so far.
> > 
> > hello Dave
> > 
> > nice work! :-)
> 
> Thanks
> 
> > > I took the RDQL definition from the Jena RDQL[1] and used that
> > > grammar plus the examples from Jena and @semantics' tutorials to
> > > write lex & yacc versions in C for parsing it. The current state is
> > > that it passes most of the RDQL test suite in Jena bar a few oddities
> > > that need to be worked out (case sensitivity of tokens, difficulties
> > > in identifying pattern literals).
> > 
> > I think the latest RDQL Jena2 grammar updated by Andy is the one to
> > look at [1] (with regular expression support, optional commas and
> > xml:lang and rdf:dataType support on literals) - but as a start old
> > Jena 1.x grammar should be enough to test most of current running
> > software ...
> 
> Yes, the Jena1 version was what I started with.  I've added
> the optional commas but not the other parts.
> 
> > ... - Andy has been working on a more up-to-date RDQL spec which
> > should be out soon (Andy: anything to say about that?). It should be
> > basically what you can see on the Jena2 CVS plus some other fixes (I
> > think!). Hopefully that document will be the common RDQL reference
> > which implementors can look at and extend it if necessary.
> 
> Now that Jena2 has shipped we can bug Andy again about that :)
> The new RDQL area is at http://jena.sourceforge.net/RDQL/index.html
> but it's not clear what has changed.
> 
> > in relation to the RDF query tests work Andy and I converted some of
> > the Jena2 RDQL tests to n-triples [2] (which should move to a specific
> > sourceforge repository sooner or later)  - the queries/ dir contains
> > the native RDQL syntax examples which can be used for your parser
> > regression tests (misc examples with constraints, regular expressions,
> > xml:lang and rdf:dataType)
> > 
> > > My current issues are on the TODO page:
> > >   http://www.redland.opensource.ac.uk/rasqal/TODO.html
> > > and include the problems I've found so far and mentioned above.
> > >
> > > It's this list of problems/incompatibilities that are probably
> > > of most interest to the www-rdf-rules group.
> > >
> > >  * base QNames are now allowed
> > 
> > do you mean in the <prefix:localname> form?
> 
> I mean there are ambiguous forms:
>    <mailto:dave>
> which is a legal URI and a legal qname.
>    <ex:a>
>    ex:a
> are both of these qname (prefix:localname) forms allowed?
> 
> > >  * Add the default prefixes (rdf, rdfs, owl, ... ?)
> > 
> > yes good one - most software already does that into the application -
> > it would be handy to have them defaulted directly by the parsing
> > software
> 
> This list, if it exists, has to be very well known and short.
> 
> > >  * Extensions: multiple LIMIT and OFFSET
> > 
> > what do you mean? can you elaborate more on this?
> 
> I've seen that 3Store handles LIMIT (limiting number of results)
> and returning results from a certain OFFSET.  I assume they
> match some (My)SQL terminology.
> 
> > >  * Optionals?
> > 
> > yes - useful all the time I think :)
> > 
> > sometime ago I posted to this list some ideas about possible syntax for
> > optionals [3] - what are your ideas about it? do you feel more about
> > like optionals triple-patterns or optional/may-bind variables?
> 
> No opinion.
> 
> > I also noticed that Damian Steer has been recently investigating the
> > possibility to have optionals for his extended-SquishQL syntax [4]
> > 
> > >
> > >  * Are keywords case sensitive? Jena RDQL has an example with SELECT
> > >    ?select WHERE ... but @semantics' RDQL tutorial has an example with
> > >    USING dcq for ... not FOR
> > 
> > in our implementation we always considered them as case *insensitive*
> > due that the RDQL seems not specifying that (or at least Jena seems
> > case-insensitive - somebody from Jena correct me if I am wrong) - while
> > porting our pure perl RDQL::Parser [5] to C/XS code [6][7] we actually
> > used the '-i' lex flag to generate a case insensitive lexer (which
> > actually implements some extensions to RDQL such as contexts/4th
> > components, LIKE operator and some primitive form of OR on URIs and
> > literals in triple-patterns)
> 
> It was Andy's "SELECT ?select ..." example that broke things
> since I did have case-insensitive keywords.  That tokenises as
> <selectKeyword> " " "?" <selectKeyword>
> 
> which fails to match the grammar since after "?" it must be a
> legal variable name, not a <selectKeyword> token.
> 
> I would propose not allowing variable / identifier names to be RDQL
> keywords, similar restrictions to most general programming languages.
> 
> > >
> > >  * Literal languages, datatypes - new "lit"@lang and  
> > > "lit"@lang^^datatype
> > 
> > see latest Jena2 CVS for that
> > 
> > >
> > >  * Pattern literals seem difficult to recognise without context
> > 
> > we also had some difficulties while designing our lexer especially with
> > the hybrid usage of n-triples like syntax in the new RDQL lexer [8] to
> > flag xml:lang and rdf:dataType patterns (i.e. not using '<' and '>' to
> > group the URI of the datatype for example) - ...
> 
> You mean "foo"^ex:a
> rather than "foo"^<http://example.org/a>
> 
> There are three options after ^ - either @, ^ or it must be a qname.
> 
> 
> > ...  and the regular expression  
> > pattern syntax which allows to delimit regular-expressions with  
> > arbitrary characters together with simple slash.
> 
> It seems to me that the lexer has to accept a wide variety of things
> when it is expecting a pattern literal as the next token and cannot
> recognise a pattern literal without that context.  It would be good to
> reduce these problems somewhat.
> 
> > >
> > >  * Qnames and URIs - in particular what is <a:b>
> > >    if the prefix a isn't defined till later
> > 
> > I think Sesame tried the N3 (other) way - it would be handy to have
> > them defined before to default substitute them while parsing.
> 
> Yes, or at least if it was allowed to interpret them in a way that
> was equivalent to that.
> 
> > >
> > >  * base URIs? Lots of <relativeURI> seen. @base?
> > 
> > good one
> 
> There are more.  I would suggest that after how CSS does this, there
> should be a way to set the content encoding of the document. (What is
> the default? ASCII?)  As I recall, it uses @encoding as the first or
> early line of the document.
> 
> > I have some more:
> > 
> > what about using some alternative character to '?' to identify
> > variables? we found that the character '?' is conflicting/reserved by
> > the SQL standard and treated specially by JDBC/ODBC interfaces - the
> > '$' (dollar) sign might be a good alternative :-)
> 
> ? works for me
> 
> > another related is context/provenance - which could be used as 4th  
> > component of the triple-patterns or using braces like N3 - any idea?
> 
> Not at present
> 
> > what about a pure/hybrid n-triples++ (with bArcs and 4th component I
> > mean) syntax for triple-patterns? :-)
> 
> No thanks.  bArcs means it isn't RDF, and a 4th triple component
> guarantees that.
> 
> > cheers
> > 
> > Alberto
> > 
> > [1] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/jena/jena2/doc/RDQL/rdql_grammar.html
> > [2] http://swordfish.rdfweb.org/rdfquery/tests/tests/rdql-tests-2003-04-10/
> > [3] http://lists.w3.org/Archives/Public/www-rdf-rules/2003Apr/0030.html
> > [4] http://rdfweb.org/people/damian/esquish/
> > [5] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/rdfstore/rdfstore/lib/RDQL/Parser.pm
> > [6] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/rdfstore/rdfstore/rdql.l
> > [7] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/*checkout*/rdfstore/rdfstore/rdql.y
> > [8] http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/~checkout~/jena/jena2/src/com/hp/hpl/jena/rdql/parser/rdql.jjt
> 
> 
> 

Received on Wednesday, 10 September 2003 06:02:15 UTC