- From: Thompson, Bryan B. <BRYAN.B.THOMPSON@saic.com>
- Date: Thu, 17 Mar 2005 03:39:30 -0500
- To: Howard Katz <howardk@fatdog.com>, "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>, "Seaborne, Andy" <andy.seaborne@hp.com>, <public-rdf-dawg-request@w3.org>
- Cc: Eric Prud'hommeaux <eric@w3.org>, <public-rdf-dawg@w3.org>

Howard,

Your comments are quite to the point. The problem is very much related to whitespace handling, which leads nicely into two underspecified aspects of the grammar:

1. Whitespace handling is not fully disclosed. I believe that there is a tacit assumption that whitespace is absorbed between tokens in the "parser" section and is significant within tokens in the "lexer" section. Other W3C specifications that specify grammars, e.g., the XML grammar, do not have as much appearance of being a stripped down grammar from some specific tool. If you look at the XML grammar, you will see that it makes explicit statements concerning whitespace in all productions. One way to say this is that it is entirely expressed at the lexer level. Another way to look at it is that it is less linked to the assumptions of a specific parser generator technology.

2. Case sensitivity is not fully disclosed. I have assumed that keywords are case-insensitive based on various examples in the specification, but the lexical rules do not show this and the introduction to the grammar does not spell it out. Is there anything else that is case insensitive? E.g., are prefix names case sensitive?

Thanks,

-bryan

-----Original Message-----
From: Howard Katz
To: Thompson, Bryan B.; 'Seaborne, Andy '; public-rdf-dawg-request@w3.org
Cc: ''Eric Prud'hommeaux ' '; public-rdf-dawg@w3.org
Sent: 3/16/2005 8:01 PM
Subject: RE: Feedback on Editor's Draft.

Bryan,

It probably doesn't help you much, but I had problems with qnames in antlr as well in early versions of my XQuery query engine. I too hoisted QNAME into the parser trying to solve lexer difficulties, but if I recall correctly, that then allowed users to enter spaces between the prefix, colon, and localPart! I eventually gave up (for other reasons as well) and moved to javacc. I'm happier now (at least my analyst tells me I should be).

You got me curious and I went looking for antlr/QNAME productions.
I've been away from antlr so long that the following xquery.g file from eXist just looks like gobbledygook to me now. If it's useful, more power to you:

    qName returns [String name]
    {
        name= null;
        String name2;
    }
    :
        ( ncnameOrKeyword COLON ncnameOrKeyword )
        => name=nc1:ncnameOrKeyword COLON name2=ncnameOrKeyword
        {
            name= name + ':' + name2;
            #qName.copyLexInfo(#nc1);
        }
        |
        name=ncnameOrKeyword
    ;

Howard

> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Thompson, Bryan B.
> Sent: Wednesday, March 16, 2005 4:08 PM
> To: 'Seaborne, Andy '; 'public-rdf-dawg-request@w3.org '; Thompson, Bryan B.
> Cc: ''Eric Prud'hommeaux ' '; ''public-rdf-dawg@w3.org ' '
> Subject: RE: Feedback on Editor's Draft.
>
> Andy,
>
> With reference to the QNAME lexical production, the issue revolves around ambiguity after the ":" in a QNAME. There is ambiguity between NCNAME1 (in the 17Feb05 working draft production) and pretty much all of the other lexical tokens, e.g., "select", "union", etc. This is because the ANTLR-generated parser / lexer is unable to differentiate between the end of the QNAME and a QNAME that continues to absorb characters.
>
> For example:
>
>     foo:select
>
> could be a QNAME ("foo:") and the keyword "select", or a single QNAME ("foo:select"). We need the parser context in order to differentiate between these cases. It can't be done in the lexer alone (or without the use of lexical state, which is pretty much the same thing).
>
> I liked the old flex/lex model for managing lexical state from the parser. ANTLR handles this ... differently. E.g., with multiplexed token streams and with syntactic predicates for limited lookahead.
>
> I have actually hoisted the QNAME production into the parser in order to get the additional context required to make the parser decisions.
> I am currently trying to figure out if I accept ":" as a legal QNAME in the same fashion or if I need to change it around to use lexical state (by one mechanism or another).
>
> If there is any non-implementation specific lesson here, it is that there are lexer / parser interactions in the SPARQL grammar. It is my guess that supporting Turtle (when I migrate to the editor's draft) will identify other such interactions.
>
> With respect to test cases, I hope to produce some more, but that has not been my focus at the moment.
>
> Thanks,
>
> -bryan
>
> -----Original Message-----
> From: public-rdf-dawg-request@w3.org
> To: Thompson, Bryan B.
> Cc: 'Eric Prud'hommeaux '; 'public-rdf-dawg@w3.org '
> Sent: 3/16/2005 1:35 PM
> Subject: Re: Feedback on Editor's Draft.
>
> Thompson, Bryan B. wrote:
> > Per Andy's request, I started on migration of the parser implementation to the Editor's Draft of SPARQL. I spent the morning on this and I have summarized some questions below that showed up during that time. However, I think that I am going to back off and continue with the last working draft as the basis for my continuing efforts since I am more interested in exploring SPARQL semantics, since migrating to the new grammar is probably best done by a re-write (if I was really going to vet the grammar in the Editor's Draft), and since I don't want to have to re-vet the grammar multiple times as the draft is edited.
>
> The changes to the grammar should now be limited to anything coming out of the sorting discussions. I hope you will continue to provide review and feedback - early working group feedback is very helpful.
>
> > Finally, from the perspective of semantics, most syntax changes (e.g., the turtle syntax) are not a big deal and it feels like a lot of effort to track a moving document.
> >
> > That said, I would be happy to do a migration to the Editor's draft once it gets into a "feature freeze" state and before it is released to last call. At that time I should be able to provide feedback not only on the grammar, but also on the semantics.
> >
> > Some questions on Editor's Draft.
> >
> > ? Production [3] specifies <SparqlParserBase>, which is not a defined lexical production.
>
> Fixed - a side effect of running cpp over the grammar with -DBASE=... :-) which makes sure UNSAID does not creep back in.
>
> > ? Production [56] (Q_URIRef) appears to have a whitespace character in the [^> ] expression so that a whitespace character is not permitted within the production. However this is not clear on visual inspection of the production.
>
> ^ is the "not" character - that expression means "not space or >". Spaces cannot appear in URIs.
>
> > ? Production 57 (QNAME_NS) permits ":" as a valid QNAME_NS since the NCNAME_PREFIX is optional in the grammar. Is this an error? If not, it makes the PrefixDecl production ambiguous.
>
> Simplified to just the first rule.
>
> > ? Production 58 (QNAME) creates an ambiguity in the grammar since QNAME permits "<QNAME_NS> :" without any trailing context. This ambiguity can be resolved in several ways. For example, by making the "( NCNAME1 | NCNAME2 )" production non-optional for QNAME.
>
> I think this is an ANTLR-ism. Tokenizing in the usual flex/javacc way with greedy consumption of input does not have this problem as far as I know. I have made a change that should remove it anyway. [*] and see below.
>
> Aside: as you are using ANTLR, you can either do syntactic or semantic lookahead but then you may wish to make more wholesale changes to the token rules and reduce the number of token productions anyway.
>
> > ? Production 58 (QNAME) would allow ":foo" as a QName. This is NOT a legal XML QName.
> > If the intention is to permit such constructions, then the use of "QName" may prove confusing to implementors.
>
> ":foo" is legal, as are "foo:" and ":". Yes, they are not XML QNames. But they are so widely referred to as qnames in the semantic web community, it would also be confusing to invent a new term.
>
> > ? Production 51 (QName) This production causes conflicts in the grammar. I modified the production to "(NCNAME_PREFIX)? COLON ( NCNAME1 | NCNAME2 )", which requires something after the COLON and which I believe supports the uses of QName in the grammar.
>
> [*] This is related to the above.
>
> I modified QNAME (not the grammar rule QName) along the lines suggested.
>
> I defined token NCNAME as (NCNAME1 | NCNAME2) and used that throughout.
>
> Aside: NCNAME1 and NCNAME2 are with and without leading "_" because only one kind is legal for prefixes, but both are local names. qnames can't start with _ because that looks like a blank node. Other fun and games to exclude trailing dots in qnames as WG decision.
>
> > ? Productions 59 (BNODE) and 60 (BNODE_LABEL) are identical. Note that production 59 (BNODE) is not used and should presumably be dropped.
>
> Removed BNODE - I had changed the name and didn't remove the definition in the formatting system.
>
> > Thanks,
> >
> > -bryan
>
> Thanks for the feedback. I'll need to go back and check but with the changes I described, the grammar passes the syntax tests I have.
>
> Bryan (and anyone else) - do you have any syntax test cases? If so, I'd be happy to collect them all together, or you can add them to the DAWG test suite.
>
> Andy
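
Illustration of the "foo:select" case discussed in this thread. This is only a minimal Python sketch, not code from any of the implementations mentioned, and the token rules are simplified assumptions rather than the productions in the working draft (the names NCNAME, TOKEN_RULES and tokenize are hypothetical). It shows the greedy, longest-match tokenizing that is described above as the usual flex/javacc behaviour: with no whitespace, "foo:select" comes out as a single QNAME, and the keyword is only recognized once whitespace separates it from the prefix.

```python
import re

# Simplified, assumed token rules; NOT the SPARQL grammar productions.
NCNAME = r"[A-Za-z][A-Za-z0-9_\-]*"
TOKEN_RULES = [
    # Prefix and local part are both optional, so ":" alone is a QNAME.
    ("QNAME", re.compile(rf"(?:{NCNAME})?:(?:{NCNAME})?")),
    # Keywords are assumed to be case-insensitive.
    ("KEYWORD", re.compile(r"select|union|where", re.IGNORECASE)),
    ("NCNAME", re.compile(NCNAME)),
    ("WS", re.compile(r"\s+")),
]


def tokenize(text):
    """Longest-match tokenizer: at each position, try every rule and keep
    the longest match; on a tie the earlier rule in TOKEN_RULES wins."""
    tokens = []
    pos = 0
    while pos < len(text):
        best_kind, best_end = None, pos
        for kind, pattern in TOKEN_RULES:
            match = pattern.match(text, pos)
            if match and match.end() > best_end:
                best_kind, best_end = kind, match.end()
        if best_kind is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if best_kind != "WS":  # whitespace is absorbed between tokens
            tokens.append((best_kind, text[pos:best_end]))
        pos = best_end
    return tokens


print(tokenize("foo:select"))   # one QNAME token
print(tokenize("foo: select"))  # a QNAME followed by a KEYWORD
```

A lexer that instead has to decide locally where the QNAME ends, without longest-match or parser context, is where the ambiguity described above shows up.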
Received on Thursday, 17 March 2005 08:39:37 UTC