- From: Howard Katz <howardk@fatdog.com>
- Date: Wed, 16 Mar 2005 17:01:08 -0800
- To: "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>, "'Seaborne, Andy '" <andy.seaborne@hp.com>, <public-rdf-dawg-request@w3.org>
- Cc: "''Eric Prud'hommeaux ' '" <eric@w3.org>, <public-rdf-dawg@w3.org>
Bryan,

It probably doesn't help you much, but I had problems with qnames in antlr as well in early versions of my XQuery query engine. I too hoisted QNAME into the parser trying to solve lexer difficulties, but if I recall correctly, that then allowed users to enter spaces between the prefix, colon, and localPart! I eventually gave up (for other reasons as well) and moved to javacc. I'm happier now (at least my analyst tells me I should be).

You got me curious and I went looking for antlr/QNAME productions. I've been away from antlr so long that the following rule from eXist's xquery.g file just looks like gobbledygook to me now. If it's useful, more power to you:

    qName returns [String name]
    { name= null; String name2; }
    :
        ( ncnameOrKeyword COLON ncnameOrKeyword )
        => name=nc1:ncnameOrKeyword COLON name2=ncnameOrKeyword
        {
            name= name + ':' + name2;
            #qName.copyLexInfo(#nc1);
        }
        |
        name=ncnameOrKeyword
    ;

Howard

> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Thompson, Bryan B.
> Sent: Wednesday, March 16, 2005 4:08 PM
> To: 'Seaborne, Andy '; 'public-rdf-dawg-request@w3.org '; Thompson, Bryan B.
> Cc: ''Eric Prud'hommeaux ' '; ''public-rdf-dawg@w3.org ' '
> Subject: RE: Feedback on Editor's Draft.
>
> Andy,
>
> With reference to the QNAME lexical production, the issue revolves
> around ambiguity after the ":" in a QNAME. There is ambiguity
> between NCNAME1 (in the 17Feb05 working draft production) and pretty
> much all of the other lexical tokens, e.g., "select", "union", etc.
> This is because the ANTLR-generated parser / lexer is unable to
> differentiate between the end of the QNAME and a QNAME that continues
> to absorb characters.
>
> For example:
>
> foo:select
>
> could be a QNAME ("foo:") and the keyword "select", or a single
> QNAME ("foo:select"). We need the parser context in order to
> differentiate between these cases.
> It can't be done in the lexer
> alone (or without the use of lexical state, which is pretty much
> the same thing).
>
> I liked the old flex/lex model for managing lexical state from
> the parser. ANTLR handles this ... differently. E.g., with
> multiplexed token streams and with syntactic predicates for limited
> lookahead.
>
> I have actually hoisted the QNAME production into the parser in order
> to get the additional context required to make the parser decisions.
> I am currently trying to figure out if I accept ":" as a legal QNAME
> in the same fashion or if I need to change it around to use lexical
> state (by one mechanism or another).
>
> If there is any non-implementation specific lesson here, it is that
> there are lexer / parser interactions in the SPARQL grammar. It is
> my guess that supporting Turtle (when I migrate to the editor's draft)
> will identify other such interactions.
>
> With respect to test cases, I hope to produce some more, but that has
> not been my focus at the moment.
>
> Thanks,
>
> -bryan
>
> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> To: Thompson, Bryan B.
> Cc: 'Eric Prud'hommeaux '; 'public-rdf-dawg@w3.org '
> Sent: 3/16/2005 1:35 PM
> Subject: Re: Feedback on Editor's Draft.
>
> Thompson, Bryan B. wrote:
> > Per Andy's request, I started on migration of the parser implementation
> > to the Editor's Draft of SPARQL. I spent the morning on this and I have
> > summarized some questions below that showed up during that time. However,
> > I think that I am going to back off and continue with the last working
> > draft as the basis for my continuing efforts since I am more interested
> > in exploring SPARQL semantics, since migrating to the new grammar is
> > probably best done by a re-write (if I was really going to vet the
> > grammar in the Editor's Draft), and since I don't want to have to re-vet
> > the grammar multiple times as the draft is edited.
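To make Bryan's `foo:select` example concrete, here is a minimal Python sketch of greedy, longest-match tokenizing (the flex/javacc style that Andy contrasts with ANTLR in his reply). The token patterns and keyword set are hypothetical ASCII-only simplifications made up for this illustration; they are not the draft's actual QNAME, NCNAME1 or NCNAME2 productions.

```python
import re

# Hypothetical, simplified token patterns (ASCII only) for illustration;
# they are NOT the draft's QNAME / NCNAME1 / NCNAME2 productions.
KEYWORDS = {"select", "union", "where"}  # illustrative subset
TOKEN_SPEC = [
    ("WS", r"[ \t\r\n]+"),
    # optional prefix, ':', optional local name, all lexed as ONE token,
    # so whitespace can never appear inside a QNAME
    ("QNAME", r"(?:[A-Za-z][A-Za-z0-9_]*)?:(?:[A-Za-z_][A-Za-z0-9_]*)?"),
    ("NAME", r"[A-Za-z_][A-Za-z0-9_]*"),
]
TOKEN_RES = [(kind, re.compile(pattern)) for kind, pattern in TOKEN_SPEC]


def tokenize(text):
    """Greedy (maximal-munch) tokenizer: at each position keep the longest match.

    Ties go to the earlier entry in TOKEN_SPEC. Whitespace is dropped.
    """
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_match = None, None
        for kind, regex in TOKEN_RES:
            m = regex.match(text, pos)
            if m and m.end() > pos and (best_match is None or m.end() > best_match.end()):
                best_kind, best_match = kind, m
        if best_match is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        lexeme = best_match.group()
        # Keywords are matched case-insensitively here (an assumption of this sketch).
        if best_kind == "NAME" and lexeme.lower() in KEYWORDS:
            best_kind = "KEYWORD"
        if best_kind != "WS":
            tokens.append((best_kind, lexeme))
        pos = best_match.end()
    return tokens


print(tokenize("foo:select"))    # [('QNAME', 'foo:select')]
print(tokenize("foo: select"))   # [('QNAME', 'foo:'), ('KEYWORD', 'select')]
print(tokenize("foo : select"))  # [('NAME', 'foo'), ('QNAME', ':'), ('KEYWORD', 'select')]
```

With longest match, `foo:select` always comes out as one QNAME and `foo: select` as a prefix-only QNAME followed by the keyword, so the lexer needs no parser context. Because a QNAME is a single token, whitespace can never separate the prefix, colon and local part, which is the problem Howard hit after hoisting QNAME into the parser; a space before the colon yields a separate NAME and ":" instead.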
>
> The changes to the grammar should now be limited to anything coming out
> of the sorting discussions. I hope you will continue to provide review and
> feedback - early working group feedback is very helpful.
>
> > Finally, from the
> > perspective of semantics, most syntax changes (e.g., the turtle syntax)
> > are not a big deal and it feels like a lot of effort to track a moving
> > document.
> >
> > That said, I would be happy to do a migration to the Editor's draft
> > once it gets into a "feature freeze" state and before it is released
> > to last call. At that time I should be able to provide feedback not
> > only on the grammar, but also on the semantics.
> >
> > Some questions on the Editor's Draft.
> >
> > - Production [3] specifies <SparqlParserBase>, which is not a defined
> > lexical production.
>
> Fixed - a side effect of running cpp over the grammar with -DBASE=...
> :-) which makes sure UNSAID does not creep back in.
>
> > - Production [56] (Q_URIRef) appears to have a whitespace character in
> > the [^> ] expression so that a whitespace character is not permitted
> > within the production. However this is not clear on visual
> > inspection of the production.
>
> ^ is the "not" character - that expression means "not space or >".
> Spaces cannot appear in URIs.
>
> > - Production 57 (QNAME_NS) permits ":" as a valid QNAME_NS since the
> > NCNAME_PREFIX is optional in the grammar. Is this an error? If
> > not, it makes the PrefixDecl production ambiguous.
>
> Simplified to just the first rule.
>
> > - Production 58 (QNAME) creates an ambiguity in the grammar since QNAME
> > permits "<QNAME_NS> :" without any trailing context. This ambiguity
> > can be resolved in several ways. For example, by making the "(
> > NCNAME1 | NCNAME2 )" production non-optional for QNAME.
>
> I think this is an ANTLR-ism. Tokenizing in the usual flex/javacc way
> with greedy consumption of input does not have this problem as far as I know.
> I have made a change that should remove it anyway. [*] and see below.
>
> Aside: as you are using ANTLR, you can do either syntactic or semantic
> lookahead, but then you may wish to make more wholesale changes to the
> token rules and reduce the number of token productions anyway.
>
> > - Production 58 (QNAME) would allow ":foo" as a QName. This is NOT a
> > legal XML QName. If the intention is to permit such constructions,
> > then the use of "QName" may prove confusing to implementors.
>
> ":foo" is legal, as are "foo:" and ":". Yes, they are not XML QNames. But
> they are so widely referred to as qnames in the semantic web community
> that it would also be confusing to invent a new term.
>
> > - Production 51 (QName) This production causes conflicts in the
> > grammar. I modified the production to "(NCNAME_PREFIX)? COLON (
> > NCNAME1 | NCNAME2 )", which requires something after the COLON and
> > which I believe supports the uses of QName in the grammar.
>
> [*] This is related to the above.
>
> I modified QNAME (not the grammar rule QName) along the lines suggested.
>
> I defined token NCNAME as (NCNAME1 | NCNAME2) and used that throughout.
>
> Aside: NCNAME1 and NCNAME2 are with and without leading "_" because only
> one kind is legal for prefixes, but both are local names. qnames can't
> start with _ because that looks like a blank node. Other fun and games to
> exclude trailing dots in qnames, as per a WG decision.
>
> > - Productions 59 (BNODE) and 60 (BNODE_LABEL) are identical. Note
> > that production 59 (BNODE) is not used and should presumably be
> > dropped.
>
> Removed BNODE - I had changed the name and didn't remove the definition
> in the formatting system.
>
> > Thanks,
> >
> > -bryan
>
> Thanks for the feedback. I'll need to go back and check, but with the
> changes I described, the grammar passes the syntax tests I have.
>
> Bryan (and anyone else) - do you have any syntax test cases?
> If so, I'd be happy to collect them all together, or you can add them to
> the DAWG test suite.
>
> Andy
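The name rules Andy describes above (prefixes may not start with "_" while local names may; ":foo", "foo:" and ":" are all accepted; "_:" introduces a blank node; trailing dots are excluded) can be sketched as a small Python classifier. The regular expressions are hypothetical ASCII-only approximations written for this illustration, not the draft's NCNAME1/NCNAME2/NCNAME_PREFIX productions, and allowing dots inside a name is an assumption of the sketch.

```python
import re

# Hypothetical ASCII-only approximations of the rules Andy describes; the
# draft grammar's real productions use full Unicode character ranges.
# Assumption: names may contain '.', but may not END with one.
PREFIX_NAME = r"[A-Za-z](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?"  # no leading '_'
LOCAL_NAME = r"[A-Za-z_](?:[A-Za-z0-9_.-]*[A-Za-z0-9_-])?"  # leading '_' allowed

# prefix and local part are both optional, so ":foo", "foo:" and ":" match
QNAME_RE = re.compile(rf"(?P<prefix>{PREFIX_NAME})?:(?P<local>{LOCAL_NAME})?")
BNODE_RE = re.compile(rf"_:(?P<label>{LOCAL_NAME})")


def classify(token):
    """Return ('bnode', label), ('qname', prefix, local), or None if neither."""
    m = BNODE_RE.fullmatch(token)
    if m:
        return ("bnode", m.group("label"))
    m = QNAME_RE.fullmatch(token)
    if m:
        # a missing optional group is None; report it as an empty string
        return ("qname", m.group("prefix") or "", m.group("local") or "")
    return None


print(classify(":foo"))      # ('qname', '', 'foo')
print(classify("_:b1"))      # ('bnode', 'b1')
print(classify("foo:bar."))  # None
```

Keeping two separate name patterns mirrors Andy's reason for having two NCNAME variants: "_x:y" is rejected because a prefix starting with "_" would be confused with a blank node label, while "ex:_y" is still a legal qname.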
Received on Thursday, 17 March 2005 01:01:28 UTC