RE: Feedback on Editor's Draft.

Bryan,

It probably doesn't help you much, but I had problems with qnames in antlr
as well in early versions of my XQuery query engine. I too hoisted QNAME
into the parser trying to solve lexer difficulties, but if I recall
correctly, that then allowed users to enter spaces between the prefix,
colon, and localPart! I eventually gave up (for other reasons as well) and
moved to javacc. I'm happier now (at least my analyst tells me I
should be).

You got me curious and I went looking for antlr/QNAME productions. I've been
away from antlr so long that the following xquery.g file from eXist just
looks like gobbledygook to me now. If it's useful, more power to you:

qName returns [String name]
{
	name= null;
	String name2;
}
:
	( ncnameOrKeyword COLON ncnameOrKeyword )
	=> name=nc1:ncnameOrKeyword COLON name2=ncnameOrKeyword
	{
		name= name + ':' + name2;
		#qName.copyLexInfo(#nc1);
	}
	|
	name=ncnameOrKeyword
	;
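The decision the qName rule above makes can be sketched outside ANTLR. This is a minimal Python sketch, not eXist's code: the toy lexer, the `Token` class, and `parse_qname` are all hypothetical. It mirrors the syntactic predicate (if the next three tokens are name, colon, name, combine them; otherwise take a single name). It also shows why a parser-level rule accepts "foo : bar" once the lexer has thrown whitespace away, and one assumed remedy: comparing token offsets so that the three tokens must be adjacent.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Token:
    kind: str   # hypothetical token kinds: "NAME" or "COLON"
    text: str
    start: int  # offset of the first character in the input
    end: int    # offset just past the last character


def tokenize(src: str) -> List[Token]:
    """Toy lexer: emits NAME and COLON tokens and silently skips whitespace."""
    tokens: List[Token] = []
    i = 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c == ":":
            tokens.append(Token("COLON", c, i, i + 1))
            i += 1
        elif c.isalpha() or c == "_":
            j = i + 1
            while j < len(src) and (src[j].isalnum() or src[j] in "_-"):
                j += 1
            tokens.append(Token("NAME", src[i:j], i, j))
            i = j
        else:
            raise ValueError(f"unexpected character {c!r} at offset {i}")
    return tokens


def parse_qname(tokens: List[Token], pos: int,
                require_adjacent: bool = True) -> Tuple[str, int]:
    """Return (qname text, next token position).

    Mirrors ( name COLON name ) => name COLON name | name.
    With require_adjacent=False it behaves like a plain parser rule
    and accepts whitespace around the colon.
    """
    la = tokens[pos:pos + 3]
    if [t.kind for t in la] == ["NAME", "COLON", "NAME"]:
        prefix, colon, local = la
        adjacent = prefix.end == colon.start and colon.end == local.start
        if require_adjacent and not adjacent:
            raise ValueError("whitespace is not allowed inside a qname")
        return prefix.text + ":" + local.text, pos + 3
    if la and la[0].kind == "NAME":
        return la[0].text, pos + 1
    raise ValueError("expected a name")
```

With `require_adjacent=False`, `parse_qname(tokenize("foo : bar"), 0, require_adjacent=False)` returns `("foo:bar", 3)`, which is the space-tolerant behaviour described above; with the default it raises `ValueError`.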

Howard


 > -----Original Message-----
 > From: public-rdf-dawg-request@w3.org
 > [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Thompson, Bryan B.
 > Sent: Wednesday, March 16, 2005 4:08 PM
 > To: 'Seaborne, Andy '; 'public-rdf-dawg-request@w3.org '; Thompson,
 > Bryan B.
 > Cc: ''Eric Prud'hommeaux ' '; ''public-rdf-dawg@w3.org ' '
 > Subject: RE: Feedback on Editor's Draft.
 >
 >
 >
 > Andy,
 >
 > With reference to the QNAME lexical production, the issue revolves
 > around ambiguity after the ":" in a QNAME.  There is ambiguity
 > between NCNAME1 (in the 17Feb05 working draft production) and pretty
 > much all of the other lexical tokens, e.g., "select", "union", etc.
 > This is because the ANTLR-generated parser / lexer is unable to
 > differentiate between the end of the QNAME and a QNAME that continues
 > to absorb characters.
 >
 > For example:
 >
 >  foo:select
 >
 > could be a QNAME ("foo:") followed by the keyword "select", or a single
 > QNAME ("foo:select").  We need the parser context in order to
 > differentiate between these cases.  It can't be done in the lexer
 > alone (or without the use of lexical state, which is pretty much
 > the same thing).
 >
 > I liked the old flex/lex model for managing lexical state from
 > the parser.  ANTLR handles this ... differently.  E.g., with
 > multiplexed token streams and with syntactic predicates for limited
 > lookahead.
 >
 > I have actually hoisted the QNAME production into the parser in order
 > to get the additional context required to make the parser decisions.
 > I am currently trying to figure out whether to accept ":" as a legal
 > QNAME in the same fashion or whether I need to change it around to use
 > lexical state (by one mechanism or another).
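One way to read "use lexical state" is in terms of flex-style start conditions. The following minimal Python sketch is hypothetical throughout (the keyword set, the token kinds, and `stateful_tokenize` are illustrative, and it is neither the flex nor the ANTLR API). A name immediately followed by ":" is tagged as a prefix by peeking one character ahead; after the colon the lexer switches into a LOCAL state in which the next name is always a local part, so "select" in "foo:select" is never reported as a keyword.

```python
import re
from typing import List, Tuple

# Hypothetical keyword set and ASCII-only name pattern, for illustration only.
KEYWORDS = {"select", "union", "where"}
NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")


def stateful_tokenize(src: str) -> List[Tuple[str, str]]:
    """Two lexical states, in the spirit of flex start conditions.

    DEFAULT: names may be keywords; a name directly followed by ":" is a prefix.
    LOCAL:   entered right after ":"; the next name is a local part, never a keyword.
    """
    state, pos, out = "DEFAULT", 0, []
    while pos < len(src):
        c = src[pos]
        if c.isspace():
            state = "DEFAULT"  # whitespace ends any qname in progress
            pos += 1
        elif c == ":":
            out.append(("COLON", c))
            state = "LOCAL"
            pos += 1
        else:
            m = NAME.match(src, pos)
            if m is None:
                raise ValueError(f"unexpected character {c!r} at offset {pos}")
            word = m.group(0)
            if state == "LOCAL":
                kind = "LOCAL_NAME"
            elif m.end() < len(src) and src[m.end()] == ":":
                kind = "PREFIX"
            elif word in KEYWORDS:
                kind = "KEYWORD"
            else:
                kind = "NAME"
            out.append((kind, word))
            state = "DEFAULT"
            pos = m.end()
    return out


print(stateful_tokenize("foo:select"))   # [('PREFIX', 'foo'), ('COLON', ':'), ('LOCAL_NAME', 'select')]
print(stateful_tokenize("foo: select"))  # [('PREFIX', 'foo'), ('COLON', ':'), ('KEYWORD', 'select')]
```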
 >
 > If there is any implementation-independent lesson here, it is that
 > there are lexer/parser interactions in the SPARQL grammar.  My guess
 > is that supporting Turtle (when I migrate to the Editor's Draft) will
 > expose other such interactions.
 >
 > With respect to test cases, I hope to produce some more, but that has
 > not been my focus at the moment.
 >
 > Thanks,
 >
 > -bryan
 >
 > -----Original Message-----
 > From: public-rdf-dawg-request@w3.org
 > To: Thompson, Bryan B.
 > Cc: 'Eric Prud'hommeaux '; 'public-rdf-dawg@w3.org '
 > Sent: 3/16/2005 1:35 PM
 > Subject: Re: Feedback on Editor's Draft.
 >
 >
 > Thompson, Bryan B. wrote:
 > > Per Andy's request, I started migrating the parser implementation
 > > to the Editor's Draft of SPARQL.  I spent the morning on this and
 > > have summarized below some questions that came up during that time.
 > > However, I think that I am going to back off and continue with the
 > > last working draft as the basis for my continuing efforts: I am more
 > > interested in exploring SPARQL semantics, migrating to the new
 > > grammar is probably best done by a rewrite (if I were really going
 > > to vet the grammar in the Editor's Draft), and I don't want to have
 > > to re-vet the grammar multiple times as the draft is edited.
 >
 > The changes to the grammar should now be limited to anything coming out
 > of the sorting discussions.  I hope you will continue to provide review
 > and feedback; early working group feedback is very helpful.
 >
 > > Finally, from the perspective of semantics, most syntax changes
 > > (e.g., the Turtle syntax) are not a big deal, and it feels like a lot
 > > of effort to track a moving document.
 > >
 > > That said, I would be happy to do a migration to the Editor's draft
 > > once it gets into a "feature freeze" state and before it is released
 > > to last call.  At that time I should be able to provide feedback not
 > > only on the grammar, but also on the semantics.
 > >
 > > Some questions on the Editor's Draft:
 > >
 > > - Production [3] specifies <SparqlParserBase>, which is not a defined
 > >   lexical production.
 >
 > Fixed.  It was a side effect of running cpp over the grammar with
 > -DBASE=... :-), which makes sure UNSAID does not creep back in.
 >
 > >
 > > - Production [56] (Q_URIRef) appears to have a whitespace character in
 > >   the [^> ] expression so that a whitespace character is not permitted
 > >   within the production.  However, this is not clear on visual
 > >   inspection of the production.
 >
 > ^ is the "not" character; that expression means "not space or >".
 > Spaces cannot appear in URIs.
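To make the character class concrete, here is a minimal Python sketch. It assumes the Q_URIRef token can be transcribed as the regular expression `<[^> ]*>`, following the reading of production [56] given above; the `match_uriref` helper is hypothetical. In this transcription the class excludes only ">" and the space character itself, so a tab inside the angle brackets would still match.

```python
import re

# Assumed transcription of production [56]: "<", then any run of characters
# other than ">" and the space character, then ">".
Q_URIREF = re.compile(r"<[^> ]*>")


def match_uriref(text: str):
    """Return the Q_URIRef token at the start of text, or None if there is none."""
    m = Q_URIREF.match(text)
    return m.group(0) if m else None


print(match_uriref("<http://example.org/a>"))    # <http://example.org/a>
print(match_uriref("<http://example.org/a b>"))  # None: the space stops the match
```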
 >
 > >
 > > - Production 57 (QNAME_NS) permits ":" as a valid QNAME_NS since the
 > >   NCNAME_PREFIX is optional in the grammar.  Is this an error?  If
 > >   not, it makes the PrefixDecl production ambiguous.
 >
 > Simplified to just the first rule.
 >
 > >
 > > - Production 58 (QNAME) creates an ambiguity in the grammar since QNAME
 > >   permits "<QNAME_NS> :" without any trailing context.  This ambiguity
 > >   can be resolved in several ways, for example by making the
 > >   "( NCNAME1 | NCNAME2 )" production non-optional for QNAME.
 >
 > I think this is an ANTLR-ism.  Tokenizing in the usual flex/javacc way,
 > with greedy consumption of input, does not have this problem as far as
 > I know.  I have made a change that should remove it anyway; see [*]
 > below.
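The greedy-tokenizing argument can be illustrated with a minimal longest-match (maximal munch) lexer in Python. The token patterns are simplified, ASCII-only assumptions, not the draft's actual rules, and `longest_match_tokenize` is a hypothetical name. At each position every pattern is tried and the longest match wins, with ties going to the pattern listed first. Under that policy "foo:select" has only one reading, a single QNAME, while "foo: select" lexes as a prefix followed by the keyword.

```python
import re
from typing import List, Tuple

# Simplified, assumed token patterns (the real NCName character classes are larger).
PREFIX = r"[A-Za-z][A-Za-z0-9_-]*"
LOCAL = r"[A-Za-z_][A-Za-z0-9_-]*"
TOKEN_PATTERNS = [
    ("SELECT", re.compile(r"select")),
    ("QNAME", re.compile(rf"(?:{PREFIX})?:{LOCAL}")),
    ("QNAME_NS", re.compile(rf"(?:{PREFIX})?:")),
    ("WS", re.compile(r"\s+")),
]


def longest_match_tokenize(src: str) -> List[Tuple[str, str]]:
    """Tokenize src, always taking the longest match at the current position."""
    pos, out = 0, []
    while pos < len(src):
        best_kind, best_end = None, pos
        for kind, pattern in TOKEN_PATTERNS:
            m = pattern.match(src, pos)
            # Strictly longer wins, so on a tie the earlier pattern is kept.
            if m and m.end() > best_end:
                best_kind, best_end = kind, m.end()
        if best_kind is None:
            raise ValueError(f"no token matches at offset {pos}")
        if best_kind != "WS":
            out.append((best_kind, src[pos:best_end]))
        pos = best_end
    return out


print(longest_match_tokenize("foo:select"))   # [('QNAME', 'foo:select')]
print(longest_match_tokenize("foo: select"))  # [('QNAME_NS', 'foo:'), ('SELECT', 'select')]
```

The design choice that removes the ambiguity is the longest-match rule itself: the lexer never has to ask the parser whether "select" is a keyword, because a QNAME that keeps absorbing characters always beats a shorter QNAME_NS.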
 >
 > Aside: as you are using ANTLR, you can use either syntactic or semantic
 > lookahead, but then you may wish to make more wholesale changes to the
 > token rules and reduce the number of token productions anyway.
 >
 > >
 > > - Production 58 (QNAME) would allow ":foo" as a QName.  This is NOT a
 > >   legal XML QName.  If the intention is to permit such constructions,
 > >   then the use of "QName" may prove confusing to implementors.
 >
 > ":foo" is legal, as are "foo:" and ":".  Yes, they are not XML QNames,
 > but they are so widely referred to as qnames in the semantic web
 > community that it would also be confusing to invent a new term.
 >
 > >
 > > - Production 51 (QName): this production causes conflicts in the
 > >   grammar.  I modified the production to
 > >   "(NCNAME_PREFIX)? COLON ( NCNAME1 | NCNAME2 )", which requires
 > >   something after the COLON and which I believe supports the uses of
 > >   QName in the grammar.
 >
 > [*] This is related to the above.
 >
 > I modified QNAME (not the grammar rule QName) along the lines suggested.
 >
 > I defined token NCNAME as (NCNAME1 | NCNAME2) and used that throughout.
 >
 > Aside: NCNAME1 and NCNAME2 are the forms with and without a leading "_",
 > because only one kind is legal for prefixes, but both are legal as local
 > names.  A qname can't start with "_" because that looks like a blank
 > node.  There are other fun and games to exclude trailing dots in qnames,
 > per a WG decision.
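The token shapes described in this reply can be approximated with regular expressions. The following Python sketch is an approximation under several assumptions, not the draft's grammar: it uses ASCII-only stand-ins for the NCName character classes, forbids a leading "_" only in the prefix (so "_:b" stays free for blank nodes), requires a local part after the colon as in the modified QNAME (on the assumption that a bare "foo:" is left to the separate QNAME_NS token), and assumes the trailing-dot exclusion applies to both the prefix and the local part. `is_qname` is a hypothetical helper.

```python
import re

# ASCII-only stand-ins (assumption) for the XML NCName character classes.
NAME_CHAR = r"A-Za-z0-9_\-."  # characters allowed after the first one
LAST_CHAR = r"A-Za-z0-9_\-"   # same set minus ".": no trailing dot (assumed for both parts)

# NCNAME_PREFIX: must not start with "_", must not end with ".".
NCNAME_PREFIX = rf"[A-Za-z](?:[{NAME_CHAR}]*[{LAST_CHAR}])?"
# NCNAME (NCNAME1 | NCNAME2): may start with "_", must not end with ".".
NCNAME = rf"[A-Za-z_](?:[{NAME_CHAR}]*[{LAST_CHAR}])?"
# Modified QNAME: optional prefix, colon, then a mandatory local part.
QNAME = re.compile(rf"(?:{NCNAME_PREFIX})?:{NCNAME}")


def is_qname(text: str) -> bool:
    """True if the whole of text matches the assumed QNAME token."""
    return QNAME.fullmatch(text) is not None


print(is_qname("foo:_bar"))  # True: a local part may start with "_"
print(is_qname("_:bar"))     # False: a prefix may not start with "_"
print(is_qname("foo:bar."))  # False: trailing dot
print(is_qname("foo:"))      # False: local part required in this token
```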
 >
 > >
 > > - Productions 59 (BNODE) and 60 (BNODE_LABEL) are identical.  Note
 > >   that production 59 (BNODE) is not used and should presumably be
 > >   dropped.
 >
 > Removed BNODE; I had changed the name and didn't remove the old
 > definition in the formatting system.
 >
 > >
 > > Thanks,
 > >
 > > -bryan
 > >
 >
 > Thanks for the feedback.  I'll need to go back and check, but with the
 > changes I described, the grammar passes the syntax tests I have.
 >
 > Bryan (and anyone else): do you have any syntax test cases?  If so, I'd
 > be happy to collect them all together, or you can add them to the DAWG
 > test suite.
 >
 > 	Andy
 >
 >

Received on Thursday, 17 March 2005 01:01:28 UTC