- From: Thompson, Bryan B. <BRYAN.B.THOMPSON@saic.com>
- Date: Thu, 17 Mar 2005 05:22:01 -0500
- To: "Seaborne, Andy" <andy.seaborne@hp.com>, "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>
- Cc: "Eric Prud'hommeaux" <eric@w3.org>, public-rdf-dawg@w3.org
???
> Keywords are case insensitive (except "a").
-bryan
-----Original Message-----
From: Seaborne, Andy
To: Thompson, Bryan B.
Cc: Eric Prud'hommeaux; public-rdf-dawg@w3.org
Sent: 3/17/2005 5:20 AM
Subject: Re: Feedback on Editor's Draft.
Thompson, Bryan B. wrote:
> Howard,
>
> Your comments are quite to the point. The problem is very much
> related to whitespace handling, which leads nicely into two
> underspecified aspects of the grammar:
>
> 1. whitespace handling is not fully disclosed. I believe that there
> is a tacit assumption that whitespace is absorbed between tokens in
> the "parser" section and is significant within tokens in the "lexer"
> section.
It is explained before the grammar.
>
> Other W3C specifications that specify grammars, e.g., the XML
> grammar, do not have as much appearance of being a stripped down
> grammar from some specific tool. If you look at the XML grammar, you
> will see that it makes explicit statements concerning whitespace in
> all productions.
And XQuery takes a different approach again.
http://www.w3.org/TR/xquery/#whitespace-rules
and uses comments in the EBNF to say where whitespace is not ignored.
All the tools I know (ANTLR included : Token.SKIP) have ways to act in
this mode.
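
As a rough illustration of this mode (a minimal Python sketch, not
anything taken from the SPARQL draft or from ANTLR): whitespace between
tokens is matched and then discarded, while whitespace inside a token
(here a quoted string) is kept. The token names and patterns below are
invented for the example and are much simpler than the real grammar.

```python
import re

# Hypothetical token set for illustration only; NOT the SPARQL lexical rules.
TOKEN_SPEC = [
    ("WS",     r"\s+"),       # matched, then dropped (the "skip" mode)
    ("STRING", r'"[^"]*"'),   # whitespace inside the quotes is significant
    ("QNAME",  r"[A-Za-z][A-Za-z0-9]*:[A-Za-z][A-Za-z0-9]*"),
    ("WORD",   r"[A-Za-z][A-Za-z0-9]*"),
    ("PUNCT",  r"[{}.]"),
]
# Alternatives are tried in order, so QNAME is attempted before WORD.
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Return (kind, lexeme) pairs, skipping whitespace between tokens."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

Under these assumptions, tokenize("foo:select") yields a single QNAME
token, and "foo : select" is rejected because the QNAME pattern allows
no whitespace around the colon.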
> One way to say this is that it is entirely expressed at the lexer
> level. Another way to look at it is that it is less linked to the
> assumptions of a specific parser generator technology.
It is not linked to a parser generator technology as Eric's work has
shown.
Otherwise I would just put in the javacc grammar I use for testing.
>
> 2. case sensitivity is not fully disclosed. I have assumed that
> keywords are case-insensitive based on various examples in the
> specification, but the lexical rules do not show this and the
> introduction to the grammar does not spell it out. Is there
> anything else that is case insensitive? E.g., are prefix names
> case sensitive?
Keywords are case insensitive (except "a").
I'll see that the text for this is visible.
Andy
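
A small Python sketch of that rule, for illustration only: keywords
are compared without regard to case, while "a" (the shorthand for
rdf:type) is matched exactly. The keyword list and the function name
classify are assumptions made for the example; the draft defines the
real keyword set.

```python
# Hypothetical, incomplete keyword list; the draft defines the real set.
KEYWORDS = {"select", "where", "union", "prefix", "base"}


def classify(word):
    """Classify a bare word as the 'a' shorthand, a keyword, or a name."""
    if word == "a":
        return "A"            # case sensitive: only lowercase "a" qualifies
    if word.lower() in KEYWORDS:
        return "KEYWORD"      # case insensitive: SELECT, Select, select
    return "NAME"
```

With this sketch, classify("SELECT") and classify("select") both give
"KEYWORD", classify("a") gives "A", and classify("A") gives "NAME".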
>
> Thanks,
>
> -bryan
>
> -----Original Message-----
> From: Howard Katz
> To: Thompson, Bryan B.; Seaborne, Andy; public-rdf-dawg-request@w3.org
> Cc: Eric Prud'hommeaux; public-rdf-dawg@w3.org
> Sent: 3/16/2005 8:01 PM
> Subject: RE: Feedback on Editor's Draft.
>
> Bryan,
>
> It probably doesn't help you much, but I had problems with qnames in
> antlr as well in early versions of my XQuery query engine. I too
> hoisted QNAME into the parser trying to solve lexer difficulties, but
> if I recall correctly, that then allowed users to enter spaces between
> the prefix, colon, and localPart! I eventually gave up (for other
> reasons as well) and moved to javacc. I'm happier now (at least my
> analyst tells me I should be).
>
> You got me curious and I went looking for antlr/QNAME productions.
> I've been away from antlr so long that the following production from
> the xquery.g file in eXist just looks like gobbledygook to me now. If
> it's useful, more power to you:
>
> qName returns [String name]
> {
>     name= null;
>     String name2;
> }
> :
>     ( ncnameOrKeyword COLON ncnameOrKeyword )
>     => name=nc1:ncnameOrKeyword COLON name2=ncnameOrKeyword
>     {
>         name= name + ':' + name2;
>         #qName.copyLexInfo(#nc1);
>     }
>     |
>     name=ncnameOrKeyword
>     ;
>
> Howard
>
>
> > -----Original Message-----
> > From: public-rdf-dawg-request@w3.org
> > [mailto:public-rdf-dawg-request@w3.org] On Behalf Of Thompson, Bryan B.
> > Sent: Wednesday, March 16, 2005 4:08 PM
> > To: Seaborne, Andy; public-rdf-dawg-request@w3.org; Thompson, Bryan B.
> > Cc: Eric Prud'hommeaux; public-rdf-dawg@w3.org
> > Subject: RE: Feedback on Editor's Draft.
> >
> >
> >
> > Andy,
> >
> > With reference to the QNAME lexical production, the issue revolves
> > around ambiguity after the ":" in a QNAME. There is ambiguity
> > between NCNAME1 (in the 17Feb05 working draft production) and
> > pretty much all of the other lexical tokens, e.g., "select",
> > "union", etc. This is because the ANTLR-generated parser / lexer is
> > unable to differentiate between the end of the QNAME and a QNAME
> > that continues to absorb characters.
> >
> > For example:
> >
> > foo:select
> >
> > could be a QNAME ("foo:") and the keyword "select", or a single
> > QNAME ("foo:select"). We need the parser context in order to
> > differentiate between these cases. It can't be done in the lexer
> > alone (or without the use of lexical state, which is pretty much
> > the same thing).
> >
> > I liked the old flex/lex model for managing lexical state from
> > the parser. ANTLR handles this ... differently. E.g., with
> > multiplexed token streams and with syntactic predicates for limited
> > lookahead.
> >
> > I have actually hoisted the QNAME production into the parser in
> > order to get the additional context required to make the parser
> > decisions. I am currently trying to figure out if I accept ":" as a
> > legal QNAME in the same fashion or if I need to change it around to
> > use lexical state (by one mechanism or another).
> >
> > If there is any non-implementation specific lesson here, it is that
> > there are lexer / parser interactions in the SPARQL grammar. It is
> > my guess that supporting Turtle (when I migrate to the editor's
> > draft) will identify other such interactions.
> >
> > With respect to test cases, I hope to produce some more, but that
> > has not been my focus at the moment.
> >
> > Thanks,
> >
> > -bryan
> >
> > -----Original Message-----
> > From: public-rdf-dawg-request@w3.org
> > To: Thompson, Bryan B.
> > Cc: 'Eric Prud'hommeaux '; 'public-rdf-dawg@w3.org '
> > Sent: 3/16/2005 1:35 PM
> > Subject: Re: Feedback on Editor's Draft.
> >
> >
> > Thompson, Bryan B. wrote:
> > > Per Andy's request, I started on migration of the parser
> > > implementation to the Editor's Draft of SPARQL. I spent the
> > > morning on this and I have summarized some questions below that
> > > showed up during that time. However, I think that I am going to
> > > back off and continue with the last working draft as the basis for
> > > my continuing efforts since I am more interested in exploring
> > > SPARQL semantics, since migrating to the new grammar is probably
> > > best done by a re-write (if I was really going to vet the grammar
> > > in the Editor's Draft), and since I don't want to have to re-vet
> > > the grammar multiple times as the draft is edited.
> >
> > The changes to the grammar should now be limited to anything coming
> > out of the sorting discussions. I hope you will continue to provide
> > review and feedback - early working group feedback is very helpful.
> >
> > > Finally, from the perspective of semantics, most syntax changes
> > > (e.g., the turtle syntax) are not a big deal and it feels like a
> > > lot of effort to track a moving document.
> > >
> > > That said, I would be happy to do a migration to the Editor's
> > > draft once it gets into a "feature freeze" state and before it is
> > > released to last call. At that time I should be able to provide
> > > feedback not only on the grammar, but also on the semantics.
> > >
> > > Some questions on Editor's Draft.
> > >
> > > * Production [3] specifies <SparqlParserBase>, which is not a
> > >   defined lexical production.
> >
> > Fixed - a side effect of running cpp over the grammar with
> > -DBASE=... :-) which makes sure UNSAID does not creep back in.
> >
> > >
> > > * Production [56] (Q_URIRef) appears to have a whitespace
> > >   character in the [^> ] expression so that a whitespace character
> > >   is not permitted within the production. However this is not
> > >   clear on visual inspection of the production.
> >
> > ^ is the "not" character - that expression means "not space or >".
> > Spaces cannot appear in URIs.
> >
> > >
> > > * Production 57 (QNAME_NS) permits ":" as a valid QNAME_NS since
> > >   the NCNAME_PREFIX is optional in the grammar. Is this an error?
> > >   If not, it makes the PrefixDecl production ambiguous.
> >
> > Simplified to just the first rule.
> >
> > >
> > > * Production 58 (QNAME) creates an ambiguity in the grammar since
> > >   QNAME permits "<QNAME_NS> :" without any trailing context. This
> > >   ambiguity can be resolved in several ways. For example, by
> > >   making the "( NCNAME1 | NCNAME2 )" production non-optional for
> > >   QNAME.
> >
> > I think this is an ANTLR-ism. Tokenizing in the usual flex/javacc
> > way with greedy consumption of input does not have this problem as
> > far as I know. I have made a change that should remove it anyway.
> > [*] and see below.
> >
> > Aside: as you are using ANTLR, you can either do syntactic or
> > semantic lookahead but then you may wish to make more wholesale
> > changes to the token rules and reduce the number of token
> > productions anyway.
> >
> > >
> > > * Production 58 (QNAME) would allow ":foo" as a QName. This is
> > >   NOT a legal XML QName. If the intention is to permit such
> > >   constructions, then the use of "QName" may prove confusing to
> > >   implementors.
> >
> > ":foo" is legal, as are "foo:" and ":". Yes, they are not XML
> > QNames. But they are so widely referred to as qnames in the
> > semantic web community, it would also be confusing to invent a new
> > term.
> >
> > >
> > > * Production 51 (QName) This production causes conflicts in the
> > >   grammar. I modified the production to "(NCNAME_PREFIX)? COLON
> > >   ( NCNAME1 | NCNAME2 )", which requires something after the COLON
> > >   and which I believe supports the uses of QName in the grammar.
> >
> > [*] This is related to the above.
> >
> > I modified QNAME (not the grammar rule QName) along the lines
> > suggested.
> >
> > I defined token NCNAME as (NCNAME1 | NCNAME2) and used that
> > throughout.
> >
> > Aside: NCNAME1 and NCNAME2 are with and without leading "_" because
> > only one kind is legal for prefixes, but both are legal as local
> > names. qnames can't start with _ because that looks like a blank
> > node. There are other fun and games to exclude trailing dots in
> > qnames, as a WG decision.
> >
> > >
> > > * Productions 59 (BNODE) and 60 (BNODE_LABEL) are identical.
> > >   Note that production 59 (BNODE) is not used and should
> > >   presumably be dropped.
> >
> > Removed BNODE - I had changed the name and didn't remove the
> > definition in the formatting system.
> >
> > >
> > > Thanks,
> > >
> > > -bryan
> > >
> >
> > Thanks for the feedback. I'll need to go back and check but with
> > the changes I described, the grammar passes the syntax tests I have.
> >
> > Bryan (and anyone else) - do you have any syntax test cases? If
> > so, I'd be happy to collect them all together, or you can add them
> > to the DAWG test suite.
> >
> > Andy
> >
> >
>
Received on Thursday, 17 March 2005 10:22:25 UTC