- From: Thompson, Bryan B. <BRYAN.B.THOMPSON@saic.com>
- Date: Thu, 17 Mar 2005 05:22:01 -0500
- To: "Seaborne, Andy" <andy.seaborne@hp.com>, "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>
- Cc: "Eric Prud'hommeaux" <eric@w3.org>, public-rdf-dawg@w3.org
???

> Keywords are case insensitive (except "a").

-bryan

-----Original Message-----
From: Seaborne, Andy
To: Thompson, Bryan B.
Cc: 'Eric Prud'hommeaux'; 'public-rdf-dawg@w3.org'
Sent: 3/17/2005 5:20 AM
Subject: Re: Feedback on Editor's Draft.

Thompson, Bryan B. wrote:
> Howard,
>
> Your comments are quite to the point. The problem is very much
> related to whitespace handling, which leads nicely into two
> underspecified aspects of the grammar:
>
> 1. whitespace handling is not fully disclosed. I believe that there is a
>    tacit assumption that whitespace is absorbed between tokens in the
>    "parser" section and is significant within tokens in the "lexer"
>    section.

It is explained before the grammar.

>
>    Other W3C specifications that specify grammars, e.g., the XML grammar,
>    do not have as much appearance of being a stripped down grammar from
>    some specific tool. If you look at the XML grammar, you will see that
>    it makes explicit statements concerning whitespace in all productions.

And XQuery takes a different approach again.
  http://www.w3.org/TR/xquery/#whitespace-rules
and uses comments in the EBNF to say where whitespace is not ignored.

All the tools I know (ANTLR included : Token.SKIP) have ways to act in this
mode.

>    One way to say this is that it is entirely expressed at the lexer
>    level. Another way to look at it is that it is less linked to the
>    assumptions of a specific parser generator technology.

It is not linked to a parser generator technology as Eric's work has shown.
Otherwise I would just put in the javacc grammar I use for testing.

>
> 2. case sensitivity is not fully disclosed. I have assumed that
>    keywords are case-insensitive based on various examples in the
>    specification, but the lexical rules do not show this and the
>    introduction to the grammar does not spell it out. Is there
>    anything else that is case insensitive? E.g., are prefix names
>    case sensitive?

Keywords are case insensitive (except "a").
I'll see that the text for this is visible.

	Andy

>
> Thanks,
>
> -bryan
>
> -----Original Message-----
> From: Howard Katz
> To: Thompson, Bryan B.; 'Seaborne, Andy'; public-rdf-dawg-request@w3.org
> Cc: 'Eric Prud'hommeaux'; public-rdf-dawg@w3.org
> Sent: 3/16/2005 8:01 PM
> Subject: RE: Feedback on Editor's Draft.
>
> Bryan,
>
> It probably doesn't help you much, but I had problems with qnames in antlr
> as well in early versions of my XQuery query engine. I too hoisted QNAME
> into the parser trying to solve lexer difficulties, but if I recall
> correctly, that then allowed users to enter spaces between the prefix,
> colon, and localPart! I eventually gave up (for other reasons as well) and
> moved to javacc. I'm happier now (at least my analyst tells me I
> should be).
>
> You got me curious and I went looking for antlr/QNAME productions. I've been
> away from antlr so long that the following xquery.g file from eXist just
> looks like gobbledygook to me now. If it's useful, more power to you:
>
> qName returns [String name]
> {
>     name= null;
>     String name2;
> }
> :
>     ( ncnameOrKeyword COLON ncnameOrKeyword )
>     => name=nc1:ncnameOrKeyword COLON name2=ncnameOrKeyword
>     {
>         name= name + ':' + name2;
>         #qName.copyLexInfo(#nc1);
>     }
>     |
>     name=ncnameOrKeyword
>     ;
>
> Howard
>
>
> > -----Original Message-----
> > From: public-rdf-dawg-request@w3.org
> > [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Thompson, Bryan B.
> > Sent: Wednesday, March 16, 2005 4:08 PM
> > To: 'Seaborne, Andy'; 'public-rdf-dawg-request@w3.org'; Thompson,
> > Bryan B.
> > Cc: 'Eric Prud'hommeaux'; 'public-rdf-dawg@w3.org'
> > Subject: RE: Feedback on Editor's Draft.
> >
> >
> >
> > Andy,
> >
> > With reference to the QNAME lexical production, the issue revolves
> > around ambiguity after the ":" in a QNAME.
> > There is ambiguity
> > between NCNAME1 (in the 17Feb05 working draft production) and pretty
> > much all of the other lexical tokens, e.g., "select", "union", etc.
> > This is because the ANTLR-generated parser / lexer is unable to
> > differentiate between the end of the QNAME and a QNAME that continues
> > to absorb characters.
> >
> > For example:
> >
> >     foo:select
> >
> > could be a QNAME ("foo:") and the keyword "select", or a single
> > QNAME ("foo:select"). We need the parser context in order to
> > differentiate between these cases. It can't be done in the lexer
> > alone (or without the use of lexical state, which is pretty much
> > the same thing).
> >
> > I liked the old flex/lex model for managing lexical state from
> > the parser. ANTLR handles this ... differently. E.g., with
> > multiplexed token streams and with syntactic predicates for limited
> > lookahead.
> >
> > I have actually hoisted the QNAME production into the parser in order
> > to get the additional context required to make the parser decisions.
> > I am currently trying to figure out if I accept ":" as a legal QNAME
> > in the same fashion or if I need to change it around to use lexical
> > state (by one mechanism or another).
> >
> > If there is any non-implementation specific lesson here, it is that
> > there are lexer / parser interactions in the SPARQL grammar. It is
> > my guess that supporting Turtle (when I migrate to the editor's draft)
> > will identify other such interactions.
> >
> > With respect to test cases, I hope to produce some more, but that has
> > not been my focus at the moment.
> >
> > Thanks,
> >
> > -bryan
> >
> > -----Original Message-----
> > From: public-rdf-dawg-request@w3.org
> > To: Thompson, Bryan B.
> > Cc: 'Eric Prud'hommeaux'; 'public-rdf-dawg@w3.org'
> > Sent: 3/16/2005 1:35 PM
> > Subject: Re: Feedback on Editor's Draft.
> >
> >
> > Thompson, Bryan B. wrote:
> > > Per Andy's request, I started on migration of the parser implementation
> > > to the Editor's Draft of SPARQL. I spent the morning on this and I have
> > > summarized some questions below that showed up during that time. However,
> > > I think that I am going to back off and continue with the last working
> > > draft as the basis for my continuing efforts since I am more interested
> > > in exploring SPARQL semantics, since migrating to the new grammar is
> > > probably best done by a re-write (if I was really going to vet the
> > > grammar in the Editor's Draft), and since I don't want to have to re-vet
> > > the grammar multiple times as the draft is edited.
> >
> > The changes to the grammar should now be limited to anything coming out
> > of the sorting discussions. I hope you will continue to provide review
> > and feedback - early working group feedback is very helpful.
> >
> > > Finally, from the
> > > perspective of semantics, most syntax changes (e.g., the turtle syntax)
> > > are not a big deal and it feels like a lot of effort to track a moving
> > > document.
> > >
> > > That said, I would be happy to do a migration to the Editor's draft
> > > once it gets into a "feature freeze" state and before it is released
> > > to last call. At that time I should be able to provide feedback not
> > > only on the grammar, but also on the semantics.
> > >
> > > Some questions on Editor's Draft.
> > >
> > > - Production [3] specifies <SparqlParserBase>, which is not a defined
> > >   lexical production.
> >
> > Fixed - a side effect of running cpp over the grammar with -DBASE=...
> > :-) which makes sure UNSAID does not creep back in.
> >
> > >
> > > - Production [56] (Q_URIRef) appears to have a whitespace character in
> > >   the [^> ] expression so that a whitespace character is not permitted
> > >   within the production. However this is not clear on visual
> > >   inspection of the production.
> >
> > ^ is the "not" character - that expression means "not space or >".
> > Spaces can not appear in URIs.
> >
> > >
> > > - Production 57 (QNAME_NS) permits ":" as a valid QNAME_NS since the
> > >   NCNAME_PREFIX is optional in the grammar. Is this an error? If
> > >   not, it makes the PrefixDecl production ambiguous.
> >
> > Simplified to just the first rule.
> >
> > >
> > > - Production 58 (QNAME) creates an ambiguity in the grammar since QNAME
> > >   permits "<QNAME_NS> :" without any trailing context. This ambiguity
> > >   can be resolved in several ways. For example, by making the
> > >   "( NCNAME1 | NCNAME2 )" production non-optional for QNAME.
> >
> > I think this is an ANTLR-ism. Tokenizing in the usual flex/javacc way
> > with greedy consumption of input does not have this problem as far as I
> > know. I have made a change that should remove it anyway. [*] and see
> > below.
> >
> > Aside: as you are using ANTLR, you can either do syntactic or semantic
> > lookahead but then you may wish to make more wholesale changes to the
> > token rules and reduce the number of token productions anyway.
> >
> > >
> > > - Production 58 (QNAME) would allow ":foo" as a QName. This is NOT a
> > >   legal XML QName. If the intention is to permit such constructions,
> > >   then the use of "QName" may prove confusing to implementors.
> >
> > ":foo" is legal as is "foo:" and ":". Yes, they are not XML QNames. But
> > they are so widely referred to as qnames in the semantic web community,
> > it would also be confusing to invent a new term.
> >
> > >
> > > - Production 51 (QName) This production causes conflicts in the
> > >   grammar. I modified the production to "(NCNAME_PREFIX)? COLON
> > >   ( NCNAME1 | NCNAME2 )", which requires something after the COLON and
> > >   which I believe supports the uses of QName in the grammar.
> >
> > [*] This is related to the above.
> >
> > I modified QNAME (not the grammar rule QName) along the lines suggested.
> >
> > I defined token NCNAME as (NCNAME1 | NCNAME2) and used that throughout.
> >
> > Aside: NCNAME1 and NCNAME2 are with and without leading "_" because only
> > one kind is legal for prefixes, but both are local names. qnames can't
> > start with _ because that looks like a blank node. Other fun and games
> > to exclude trailing dots in qnames as WG decision.
> >
> > >
> > > - Productions 59 (BNODE) and 60 (BNODE_LABEL) are identical. Note
> > >   that production 59 (BNODE) is not used and should presumably be
> > >   dropped.
> >
> > Removed BNODE - I had changed the name and didn't remove the definition
> > in the formatting system.
> >
> > >
> > > Thanks,
> > >
> > > -bryan
> > >
> >
> > Thanks for the feedback. I'll need to go back and check but with the
> > changes I described, the grammar passes the syntax tests I have.
> >
> > Bryan (and anyone else) - do you have any syntax test cases? If so, I'd
> > be happy to collect them all together, or you can add them to the DAWG
> > test suite.
> >
> > 	Andy
Received on Thursday, 17 March 2005 10:22:25 UTC
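
The thread's "foo:select" example and the remark that greedy, flex/javacc-style tokenizing avoids the ambiguity can be illustrated with a rough sketch. Everything in it is an assumption made for illustration: the token names (WS, QNAME, WORD), the keyword set, and the simplified character classes are hypothetical and are not taken from the SPARQL grammar or from any implementation mentioned in the thread. It also folds keywords to lower case while leaving "a" case sensitive, per the statement that keywords are case insensitive (except "a").

```python
import re

# Hypothetical, heavily simplified token rules (NOT the SPARQL productions).
KEYWORDS = {"select", "where", "union", "prefix"}

TOKEN_RE = re.compile(
    r"""
    (?P<WS>\s+)
  | (?P<QNAME>(?:[A-Za-z][A-Za-z0-9]*)?:(?:[A-Za-z_][A-Za-z0-9_]*)?)
  | (?P<WORD>[A-Za-z][A-Za-z0-9]*)
    """,
    re.VERBOSE,
)


def tokenize(text):
    """Greedy tokenizer: each rule consumes as many characters as it can."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        pos = m.end()
        kind, value = m.lastgroup, m.group()
        if kind == "WS":
            continue  # whitespace is dropped between tokens, never inside one
        if kind == "WORD":
            if value == "a":
                kind = "A"  # the one case-sensitive keyword
            elif value.lower() in KEYWORDS:
                kind, value = "KEYWORD", value.lower()
            else:
                kind = "NAME"
        tokens.append((kind, value))
    return tokens


if __name__ == "__main__":
    for sample in ("foo:select", "foo: select", ":", ":foo", "SeLeCt", "a", "A"):
        print(repr(sample), "->", tokenize(sample))
```

With these rules "foo:select" comes out as one QNAME because the local part is consumed greedily, while "foo: select" yields the QNAME "foo:" followed by the keyword; the whitespace is what ends the token. A lexer that cannot commit to the longest match needs parser context or lexical state to make the same decision, which is the interaction described in the thread.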