- From: Howard Katz <howardk@fatdog.com>
- Date: Wed, 16 Mar 2005 17:01:08 -0800
- To: "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>, "'Seaborne, Andy '" <andy.seaborne@hp.com>, <public-rdf-dawg-request@w3.org>
- Cc: "''Eric Prud'hommeaux ' '" <eric@w3.org>, <public-rdf-dawg@w3.org>
Bryan,
It probably doesn't help you much, but I had problems with qnames in antlr
as well in early versions of my XQuery query engine. I too hoisted QNAME
into the parser trying to solve lexer difficulties, but if I recall
correctly, that then allowed users to enter spaces between the prefix,
colon, and localPart! I eventually gave up (for other reasons as well) and
moved to javacc. I'm happier now (at least my analyst tells me I
should be).
You got me curious and I went looking for antlr/QNAME productions. I've been
away from antlr so long that the following xquery.g file from eXist just
looks like gobbledygook to me now. If it's useful, more power to you:
qName returns [String name]
{
    name = null;
    String name2;
}
    :
    ( ncnameOrKeyword COLON ncnameOrKeyword )
        => name=nc1:ncnameOrKeyword COLON name2=ncnameOrKeyword
        {
            name = name + ':' + name2;
            #qName.copyLexInfo(#nc1);
        }
    |
    name=ncnameOrKeyword
    ;
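The eXist rule uses an ANTLR syntactic predicate: it looks ahead for `ncnameOrKeyword COLON ncnameOrKeyword` and, if that matches, builds the name from three tokens; otherwise it falls back to a single name. Below is a minimal Python sketch of the same decision over a pre-tokenized stream. It is not eXist's code: `Token`, `tokenize`, `q_name` and `require_adjacent` are hypothetical names, and the lexer is a toy ASCII-only one. It also illustrates the whitespace problem described above: once the lexer has discarded whitespace, the parser only sees three tokens, so it has to compare character offsets to reject `foo : bar`.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Token:
    # Hypothetical token record: kind is "NCNAME" or "COLON".
    kind: str
    text: str
    start: int  # offset of the first character
    end: int    # offset one past the last character


def tokenize(src: str) -> List[Token]:
    """Toy lexer: NCNAMEs (letters, digits, '_', '-', '.') and ':'; whitespace is skipped."""
    tokens, i = [], 0
    while i < len(src):
        c = src[i]
        if c.isspace():
            i += 1
        elif c == ":":
            tokens.append(Token("COLON", c, i, i + 1))
            i += 1
        elif c.isalpha() or c == "_":
            j = i + 1
            while j < len(src) and (src[j].isalnum() or src[j] in "_-."):
                j += 1
            tokens.append(Token("NCNAME", src[i:j], i, j))
            i = j
        else:
            raise ValueError(f"unexpected character {c!r} at offset {i}")
    return tokens


def q_name(tokens: List[Token], pos: int,
           require_adjacent: bool = True) -> Tuple[Optional[str], int]:
    """Mirror of the predicate: try NCNAME COLON NCNAME first, else a single NCNAME."""
    three = tokens[pos:pos + 3]
    if [t.kind for t in three] == ["NCNAME", "COLON", "NCNAME"]:
        # The three tokens only form one qname if no whitespace separated them.
        adjacent = three[0].end == three[1].start and three[1].end == three[2].start
        if adjacent or not require_adjacent:
            return three[0].text + ":" + three[2].text, pos + 3
    # Second alternative: a single NCNAME (any following ':' is left to the caller).
    if pos < len(tokens) and tokens[pos].kind == "NCNAME":
        return tokens[pos].text, pos + 1
    return None, pos
```

With `require_adjacent=False` the sketch reproduces the problem: `foo : bar` is accepted as the qname `foo:bar`.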
Howard
> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Thompson, Bryan B.
> Sent: Wednesday, March 16, 2005 4:08 PM
> To: 'Seaborne, Andy '; 'public-rdf-dawg-request@w3.org '; Thompson,
> Bryan B.
> Cc: ''Eric Prud'hommeaux ' '; ''public-rdf-dawg@w3.org ' '
> Subject: RE: Feedback on Editor's Draft.
>
>
>
> Andy,
>
> With reference to the QNAME lexical production, the issue revolves
> around ambiguity after the ":" in a QNAME. There is ambiguity
> between NCNAME1 (in the 17Feb05 working draft production) and pretty
> much all of the other lexical tokens, e.g., "select", "union", etc.
> This is because the ANTLR-generated parser / lexer is unable to
> differentiate between the end of the QNAME and a QNAME that continues
> to absorb characters.
>
> For example:
>
> foo:select
>
> could be a QNAME ("foo:") followed by the keyword "select", or a single
> QNAME ("foo:select"). We need the parser context in order to
> differentiate between these cases. It can't be done in the lexer
> alone (or without the use of lexical state, which is pretty much
> the same thing).
>
> I liked the old flex/lex model for managing lexical state from
> the parser. ANTLR handles this ... differently. E.g., with
> multiplexed token streams and with syntactic predicates for limited
> lookahead.
>
> I have actually hoisted the QNAME production into the parser in order
> to get the additional context required to make the parser decisions.
> I am currently trying to figure out if I accept ":" as a legal QNAME
> in the same fashion or if I need to change it around to use lexical
> state (by one mechanism or another).
>
> If there is any non-implementation specific lesson here, it is that
> there are lexer / parser interactions in the SPARQL grammar. It is
> my guess that supporting Turtle (when I migrate to the editor's draft)
> will identify other such interactions.
>
> With respect to test cases, I hope to produce some more, but that has
> not been my focus at the moment.
>
> Thanks,
>
> -bryan
>
> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> To: Thompson, Bryan B.
> Cc: 'Eric Prud'hommeaux '; 'public-rdf-dawg@w3.org '
> Sent: 3/16/2005 1:35 PM
> Subject: Re: Feedback on Editor's Draft.
>
>
> Thompson, Bryan B. wrote:
> > Per Andy's request, I started on migration of the parser implementation
> > to the Editor's Draft of SPARQL. I spent the morning on this and I have
> > summarized some questions below that showed up during that time. However,
> > I think that I am going to back off and continue with the last working
> > draft as the basis for my continuing efforts: I am more interested
> > in exploring SPARQL semantics, migrating to the new grammar is
> > probably best done by a rewrite (if I were really going to vet the
> > grammar in the Editor's Draft), and I don't want to have to re-vet
> > the grammar multiple times as the draft is edited.
>
> The changes to the grammar should now be limited to anything coming out
> of the sorting discussions. I hope you will continue to provide review and
> feedback - early working group feedback is very helpful.
>
> > Finally, from the perspective of semantics, most syntax changes (e.g.,
> > the Turtle syntax) are not a big deal and it feels like a lot of effort
> > to track a moving document.
> >
> > That said, I would be happy to do a migration to the Editor's draft
> > once it gets into a "feature freeze" state and before it is released
> > to last call. At that time I should be able to provide feedback not
> > only on the grammar, but also on the semantics.
> >
> > Some questions on Editor's Draft.
> >
> > - Production [3] specifies <SparqlParserBase>, which is not a defined
> > lexical production.
>
> Fixed - a side effect of running cpp over the grammar with -DBASE=...
> :-) which makes sure UNSAID does not creep back in.
>
> >
> > - Production [56] (Q_URIRef) appears to have a whitespace character in
> > the [^> ] expression so that a whitespace character is not permitted
> > within the production. However this is not clear on visual
> > inspection of the production.
>
> ^ is the "not" character - that expression means "not space or >". Spaces
> cannot appear in URIs.
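Read as a regular expression, the character class Andy describes can be checked directly. A small Python sketch, assuming the production has the shape `'<' [^> ]* '>'` (the full text of production [56] is not quoted in this thread); `Q_URIREF` and `is_q_uriref` are hypothetical names:

```python
import re

# Assumed shape of Q_URIRef: '<', then any characters except '>' and space, then '>'.
# "[^> ]" is a negated character class: it matches any single character that is
# neither '>' nor a space.
Q_URIREF = re.compile(r"<[^> ]*>")


def is_q_uriref(text: str) -> bool:
    return Q_URIREF.fullmatch(text) is not None
```

Note that, as written in this sketch, only the space character is excluded; a tab or newline inside the angle brackets would still match.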
>
> >
> > - Production 57 (QNAME_NS) permits ":" as a valid QNAME_NS since the
> > NCNAME_PREFIX is optional in the grammar. Is this an error? If
> > not, it makes the PrefixDecl production ambiguous.
>
> Simplified to just the first rule.
>
> >
> > - Production 58 (QNAME) creates an ambiguity in the grammar since QNAME
> > permits "<QNAME_NS> :" without any trailing context. This ambiguity
> > can be resolved in several ways. For example, by making the "(
> > NCNAME1 | NCNAME2 )" production non-optional for QNAME.
>
> I think this is an ANTLR-ism. Tokenizing in the usual flex/javacc way with
> greedy consumption of input does not have this problem as far as I know.
> I have made a change that should remove it anyway. [*] and see below.
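Greedy consumption can be illustrated with a toy maximal-munch lexer: at each position it tries every token pattern and keeps the longest match, so `foo:select` comes out as one QNAME, while `foo: select` (with a space) comes out as QNAME_NS followed by the keyword. A Python sketch; the token names follow the thread (QNAME_NS, QNAME), but the ASCII-only patterns, the two-keyword list and the function name `tokenize` are simplifying assumptions, not the draft's productions:

```python
import re
from typing import List, Tuple

# Simplified, ASCII-only stand-ins for the draft's token rules (assumptions).
NAME = r"[A-Za-z][A-Za-z0-9_\-]*"
TOKEN_RULES = [
    ("SELECT", re.compile(r"select", re.IGNORECASE)),
    ("UNION", re.compile(r"union", re.IGNORECASE)),
    ("QNAME", re.compile(rf"(?:{NAME})?:{NAME}")),
    ("QNAME_NS", re.compile(rf"(?:{NAME})?:")),
    ("WS", re.compile(r"\s+")),
]


def tokenize(src: str) -> List[Tuple[str, str]]:
    """Maximal munch: at each position keep the longest match; ties go to the earlier rule."""
    out, pos = [], 0
    while pos < len(src):
        best_kind, best_text = None, ""
        for kind, pattern in TOKEN_RULES:
            m = pattern.match(src, pos)
            if m and len(m.group()) > len(best_text):
                best_kind, best_text = kind, m.group()
        if best_kind is None:
            raise ValueError(f"no token matches at offset {pos}")
        if best_kind != "WS":  # whitespace separates tokens but is not emitted
            out.append((best_kind, best_text))
        pos += len(best_text)
    return out
```

Because the longest match wins, the lexer never has to guess where a qname ends: any name characters directly after the colon belong to the qname, and a keyword can only follow after a token boundary such as whitespace.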
>
> Aside: as you are using ANTLR, you can either do syntactic or semantic
> lookahead, but then you may wish to make more wholesale changes to the
> token rules and reduce the number of token productions anyway.
>
> >
> > - Production 58 (QNAME) would allow ":foo" as a QName. This is NOT a
> > legal XML QName. If the intention is to permit such constructions,
> > then the use of "QName" may prove confusing to implementors.
>
> ":foo" is legal, as are "foo:" and ":". Yes, they are not XML QNames. But
> they are so widely referred to as qnames in the semantic web community
> that it would also be confusing to invent a new term.
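The difference can be made concrete with two recognizers: one for the prefixed-name forms described here (prefix optional, local part optional) and one for the Namespaces in XML QName (a local part is always required, and a colon is only allowed after a non-empty prefix). A Python sketch; the ASCII-only NCName pattern is an assumption (real NCNames allow many more characters), and it ignores the leading-underscore and trailing-dot restrictions discussed elsewhere in this thread:

```python
import re

NCNAME = r"[A-Za-z_][A-Za-z0-9_.\-]*"  # simplified ASCII NCName (assumption)

# Draft SPARQL qname forms as described in the thread: "foo:bar", ":bar", "foo:", ":".
SPARQL_QNAME = re.compile(rf"(?:{NCNAME})?:(?:{NCNAME})?")
# XML QName: PrefixedName (Prefix ':' LocalPart) or UnprefixedName (LocalPart).
XML_QNAME = re.compile(rf"(?:{NCNAME}:)?{NCNAME}")


def is_sparql_qname(s: str) -> bool:
    return SPARQL_QNAME.fullmatch(s) is not None


def is_xml_qname(s: str) -> bool:
    return XML_QNAME.fullmatch(s) is not None
```

In this sketch `:foo`, `foo:` and `:` are accepted only by the first recognizer, while a bare `foo` is an XML QName but not a SPARQL-style qname, since the latter always contains the colon.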
>
> >
> > - Production 51 (QName) This production causes conflicts in the
> > grammar. I modified the production to "(NCNAME_PREFIX)? COLON (
> > NCNAME1 | NCNAME2 )", which requires something after the COLON and
> > which I believe supports the uses of QName in the grammar.
>
> [*] This is related to the above.
>
> I modified QNAME (not the grammar rule QName) along the lines suggested.
>
> I defined token NCNAME as (NCNAME1 | NCNAME2) and used that throughout.
>
> Aside: NCNAME1 and NCNAME2 are with and without leading "_" because only
> one kind is legal for prefixes, but both are legal as local names. qnames
> can't start with _ because that looks like a blank node. There are other
> fun and games to exclude trailing dots in qnames, per a WG decision.
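The constraints in this aside can be sketched as follows, under simplifying assumptions: names use only ASCII letters, digits, `_`, `-` and `.`; a prefix must not start with `_` (so `_:b1` stays unambiguous as a blank node label); a local name may start with `_`; and neither may end with `.`. `PREFIX`, `LOCAL`, `BNODE_LABEL` and `classify` are hypothetical names, and the patterns are not the draft's exact NCNAME1/NCNAME2 productions:

```python
import re

# Simplified ASCII versions of the rules (assumptions, not the draft's productions).
_TAIL = r"(?:[A-Za-z0-9_.\-]*[A-Za-z0-9_\-])?"  # may contain '.', but not end with one
PREFIX = re.compile(r"[A-Za-z]" + _TAIL)         # no leading '_'
LOCAL = re.compile(r"[A-Za-z_]" + _TAIL)         # leading '_' allowed
BNODE_LABEL = re.compile(r"_:[A-Za-z_]" + _TAIL)


def classify(term: str) -> str:
    """Classify a term as 'bnode', 'qname', or 'invalid' under the simplified rules."""
    if BNODE_LABEL.fullmatch(term):
        return "bnode"
    prefix, colon, local = term.partition(":")
    if not colon:
        return "invalid"
    # Both parts are optional, so "foo:", ":bar" and ":" are all accepted.
    prefix_ok = prefix == "" or PREFIX.fullmatch(prefix) is not None
    local_ok = local == "" or LOCAL.fullmatch(local) is not None
    return "qname" if prefix_ok and local_ok else "invalid"
```

For example, `_:b1` is a blank node, `foo:_bar` is a qname, and both `_x:bar` (leading underscore in the prefix) and `foo:bar.` (trailing dot) are rejected.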
>
> >
> > - Productions 59 (BNODE) and 60 (BNODE_LABEL) are identical. Note
> > that production 59 (BNODE) is not used and should presumably be
> > dropped.
>
> Removed BNODE - I had changed the name and didn't remove the definition
> in the formatting system.
>
> >
> > Thanks,
> >
> > -bryan
> >
>
> Thanks for the feedback. I'll need to go back and check, but with the
> changes I described, the grammar passes my syntax tests.
>
> Bryan (and anyone else) - do you have any syntax test cases? If so, I'd
> be happy to collect them all together, or you can add them to the DAWG
> test suite.
>
> Andy
>
>
Received on Thursday, 17 March 2005 01:01:28 UTC