RE: Feedback on Editor's Draft. from Thompson, Bryan B. on 2005-03-17 (public-rdf-dawg@w3.org from January to March 2005)

From: Thompson, Bryan B. <BRYAN.B.THOMPSON@saic.com>
Date: Thu, 17 Mar 2005 03:39:30 -0500
To: 'Howard Katz ' <howardk@fatdog.com>, "Thompson, Bryan B." <BRYAN.B.THOMPSON@saic.com>, "''Seaborne, Andy ' '" <andy.seaborne@hp.com>, "'public-rdf-dawg-request@w3.org '" <public-rdf-dawg-request@w3.org>
Cc: '''Eric Prud'hommeaux ' ' ' <eric@w3.org>, "'public-rdf-dawg@w3.org '" <public-rdf-dawg@w3.org>
Message-Id: <D24D16A6707B0A4B9EF084299CE99B3912CB4709@mcl-its-exs02.mail.saic.com>
Howard,

Your comments are quite to the point.  The problem is very much
related to whitespace handling, which leads nicely into two
underspecified aspects of the grammar:

1. whitespace handling is not fully disclosed.  I believe that there is a
   tacit assumption that whitespace is absorbed between tokens in the
   "parser" section and is significant within tokens in the "lexer"
   section.

   Other W3C specifications that specify grammars, e.g., the XML grammar,
   read less like a grammar that was stripped down from some specific
   tool.  If you look at the XML grammar, you will see that it makes
   explicit statements concerning whitespace in all productions.  One way
   to say this is that it is entirely expressed at the lexer level.
   Another way to look at it is that it is less tied to the assumptions
   of a specific parser generator technology.

2. case sensitivity is not fully disclosed.  I have assumed that
   keywords are case-insensitive based on various examples in the
   specification, but the lexical rules do not show this and the
   introduction to the grammar does not spell it out.  Is there
   anything else that is case insensitive?  E.g., are prefix names
   case sensitive?
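
To make the two points concrete, here is a minimal Python sketch of a
tokenizer that bakes in the assumptions I have been making.  It is not
taken from the draft: the keyword set, the simplified QNAME pattern, and
the choice to keep prefix names case-sensitive are all assumptions made
for illustration.

```python
import re

# Hypothetical keyword subset; the draft defines more keywords than these.
KEYWORDS = {"select", "where", "union", "prefix"}

# One alternative per token kind, tried in order at each position.
# The QNAME pattern is a simplification, not the draft's production.
TOKEN_RE = re.compile(r"""
    (?P<WS>\s+)
  | (?P<QNAME>[A-Za-z][\w-]*:[A-Za-z_][\w-]*)
  | (?P<VAR>\?[A-Za-z_]\w*)
  | (?P<NAME>[A-Za-z_]\w*)
  | (?P<PUNCT>[{}.*])
""", re.VERBOSE)


def tokenize(text):
    """Return a list of (kind, value) pairs for the given query text."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        kind, value = m.lastgroup, m.group()
        pos = m.end()
        if kind == "WS":
            continue  # absorbed between tokens, never seen by the parser
        if kind == "NAME" and value.lower() in KEYWORDS:
            kind, value = "KEYWORD", value.upper()  # assumed: keywords fold case
        tokens.append((kind, value))  # assumed: everything else keeps its case
    return tokens
```

With these rules "SeLeCt" and "select" yield the same KEYWORD token,
"foo:Bar" and "FOO:Bar" stay distinct, and "foo: Bar" is rejected because
whitespace is significant inside the QNAME token.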

Thanks,

-bryan
   
-----Original Message-----
From: Howard Katz
To: Thompson, Bryan B.; 'Seaborne, Andy '; public-rdf-dawg-request@w3.org
Cc: ''Eric Prud'hommeaux ' '; public-rdf-dawg@w3.org
Sent: 3/16/2005 8:01 PM
Subject: RE: Feedback on Editor's Draft.

Bryan,

It probably doesn't help you much, but I had problems with qnames in
antlr as well in early versions of my XQuery query engine. I too hoisted
QNAME into the parser trying to solve lexer difficulties, but if I recall
correctly, that then allowed users to enter spaces between the prefix,
colon, and localPart! I eventually gave up (for other reasons as well)
and moved to javacc. I'm happier now (at least my analyst tells me I
should be).
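
A small Python sketch of that side effect, for illustration only. It
assumes a lexer that emits NCNAME and COLON as separate tokens and skips
whitespace between them, plus a parser-level rule that assembles the
qname; the names lex and parse_qname are hypothetical and not from any
of the tools mentioned.

```python
import re

# Whitespace is skipped before every token; NCNAME is simplified.
TOKEN_RE = re.compile(r"\s*(?:(?P<NCNAME>[A-Za-z_][\w-]*)|(?P<COLON>:))")


def lex(text):
    """Split text into (kind, value) tokens, discarding whitespace."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot tokenize at offset {pos}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def parse_qname(tokens):
    """Parser-level rule: qname := NCNAME COLON NCNAME."""
    kinds = [kind for kind, _ in tokens]
    if kinds != ["NCNAME", "COLON", "NCNAME"]:
        raise ValueError("not a qname")
    return tokens[0][1] + ":" + tokens[2][1]
```

Because the whitespace is gone before the parser rule runs,
parse_qname(lex("foo : bar")) is accepted exactly like
parse_qname(lex("foo:bar")).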

You got me curious and I went looking for antlr/QNAME productions. I've
been away from antlr so long that the following xquery.g file from eXist
just looks like gobbledygook to me now. If it's useful, more power to you:

qName returns [String name]
{
	name= null;
	String name2;
}
:
	( ncnameOrKeyword COLON ncnameOrKeyword )
	=> name=nc1:ncnameOrKeyword COLON name2=ncnameOrKeyword
	{
		name= name + ':' + name2;
		#qName.copyLexInfo(#nc1);
	}
	|
	name=ncnameOrKeyword
	;
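
As far as I can read it, the "( ... ) =>" part is a syntactic predicate:
look ahead for name COLON name without consuming anything, and only take
the first alternative if that lookahead succeeds. A rough Python
rendering of that reading follows. The token tuples and the assumption
that ncnameOrKeyword accepts either an NCNAME or a keyword token are
mine, not eXist's.

```python
def q_name(tokens, pos=0):
    """Return (name, next_pos) for a qname starting at tokens[pos].

    tokens is a list of (kind, value) pairs; kinds used here are the
    hypothetical "NCNAME", "KEYWORD" and "COLON".
    """
    def is_name(i):
        return i < len(tokens) and tokens[i][0] in ("NCNAME", "KEYWORD")

    def is_colon(i):
        return i < len(tokens) and tokens[i][0] == "COLON"

    # Syntactic predicate: peek at three tokens before committing.
    if is_name(pos) and is_colon(pos + 1) and is_name(pos + 2):
        return tokens[pos][1] + ":" + tokens[pos + 2][1], pos + 3
    # Fallback alternative: a bare name with no prefix.
    if is_name(pos):
        return tokens[pos][1], pos + 1
    raise ValueError(f"expected a name at token {pos}")
```

Given the tokens for "foo:select" this returns the single name
"foo:select", and given a lone "foo" it falls through to the second
alternative.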

Howard


 > -----Original Message-----
 > From: public-rdf-dawg-request@w3.org
 > [mailto:public-rdf-dawg-request@w3.org]On Behalf Of Thompson, Bryan B.
 > Sent: Wednesday, March 16, 2005 4:08 PM
 > To: 'Seaborne, Andy '; 'public-rdf-dawg-request@w3.org '; Thompson,
 > Bryan B.
 > Cc: ''Eric Prud'hommeaux ' '; ''public-rdf-dawg@w3.org ' '
 > Subject: RE: Feedback on Editor's Draft.
 >
 >
 >
 > Andy,
 >
 > With reference to the QNAME lexical production, the issue revolves
 > around ambiguity after the ":" in a QNAME.  There is ambiguity
 > between NCNAME1 (in the 17Feb05 working draft production) and pretty
 > much all of the other lexical tokens, e.g., "select", "union", etc.
 > This is because the ANTLR-generated parser / lexer is unable to
 > differentiate between the end of the QNAME and a QNAME that continues
 > to absorb characters.
 >
 > For example:
 >
 >  foo:select
 >
 > could be a QNAME ("foo:") and the keyword "select", or a single
 > QNAME ("foo:select").  We need the parser context in order to
 > differentiate between these cases.  It can't be done in the lexer
 > alone (or without the use of lexical state, which is pretty much
 > the same thing).
 >
 > I liked the old flex/lex model for managing lexical state from
 > the parser.  ANTLR handles this ... differently.  E.g., with
 > multiplexed token streams and with syntactic predicates for limited
 > lookahead.
 >
 > I have actually hoisted the QNAME production into the parser in order
 > to get the additional context required to make the parser decisions.
 > I am currently trying to figure out if I accept ":" as a legal QNAME
 > in the same fashion or if I need to change it around to use lexical
 > state (by one mechanism or another).
 >
 > If there is any non-implementation specific lesson here, it is that
 > there are lexer / parser interactions in the SPARQL grammar.  It is
 > my guess that supporting Turtle (when I migrate to the editor's
 > draft) will identify other such interactions.
 >
 > With respect to test cases, I hope to produce some more, but that has
 > not been my focus at the moment.
 >
 > Thanks,
 >
 > -bryan
 >
 > -----Original Message-----
 > From: public-rdf-dawg-request@w3.org
 > To: Thompson, Bryan B.
 > Cc: 'Eric Prud'hommeaux '; 'public-rdf-dawg@w3.org '
 > Sent: 3/16/2005 1:35 PM
 > Subject: Re: Feedback on Editor's Draft.
 >
 >
 > Thompson, Bryan B. wrote:
 > > Per Andy's request, I started on migration of the parser
 > > implementation to the Editor's Draft of SPARQL.  I spent the
 > > morning on this and I have summarized some questions below that
 > > showed up during that time.  However, I think that I am going to
 > > back off and continue with the last working draft as the basis for
 > > my continuing efforts since I am more interested in exploring
 > > SPARQL semantics, since migrating to the new grammar is probably
 > > best done by a re-write (if I was really going to vet the grammar
 > > in the Editor's Draft), and since I don't want to have to re-vet
 > > the grammar multiple times as the draft is edited.
 >
 > The changes to the grammar should now be limited to anything coming
 > out of the sorting discussions.  I hope you will continue to provide
 > review and feedback - early working group feedback is very helpful.
 >
 > > Finally, from the perspective of semantics, most syntax changes
 > > (e.g., the turtle syntax) are not a big deal and it feels like a
 > > lot of effort to track a moving document.
 > >
 > > That said, I would be happy to do a migration to the Editor's draft
 > > once it gets into a "feature freeze" state and before it is
 > > released to last call.  At that time I should be able to provide
 > > feedback not only on the grammar, but also on the semantics.
 > >
 > > Some questions on Editor's Draft.
 > >
 > > ? Production [3] specifies <SparqlParserBase>, which is not a
 > >   defined lexical production.
 >
 > Fixed - a side effect of running cpp over the grammar with
 > -DBASE=... :-) which makes sure UNSAID does not creep back in.
 >
 > >
 > > ? Production [56] (Q_URIRef) appears to have a whitespace character
 > >   in the [^> ] expression so that a whitespace character is not
 > >   permitted within the production.  However this is not clear on
 > >   visual inspection of the production.
 >
 > ^ is the "not" character - that expression means "not space or >".
 > Spaces cannot appear in URIs.
 >
 > >
 > > ? Production 57 (QNAME_NS) permits ":" as a valid QNAME_NS since
 > >   the NCNAME_PREFIX is optional in the grammar.  Is this an error?
 > >   If not, it makes the PrefixDecl production ambiguous.
 >
 > Simplified to just the first rule.
 >
 > >
 > > ? Production 58 (QNAME) creates an ambiguity in the grammar since
 > >   QNAME permits "<QNAME_NS> :" without any trailing context.  This
 > >   ambiguity can be resolved in several ways.  For example, by making
 > >   the "( NCNAME1 | NCNAME2 )" production non-optional for QNAME.
 >
 > I think this is an ANTLR-ism.  Tokenizing in the usual flex/javacc
 > way with greedy consumption of input does not have this problem as
 > far as I know.  I have made a change that should remove it anyway.
 > [*] and see below.
 >
 > Aside: as you are using ANTLR, you can either do syntactic or
 > semantic lookahead but then you may wish to make more wholesale
 > changes to the token rules and reduce the number of token
 > productions anyway.
 >
 > >
 > > ? Production 58 (QNAME) would allow ":foo" as a QName.  This is NOT
 > >   a legal XML QName.  If the intention is to permit such
 > >   constructions, then the use of "QName" may prove confusing to
 > >   implementors.
 >
 > ":foo" is legal, as are "foo:" and ":".  Yes, they are not XML
 > QNames.  But they are so widely referred to as qnames in the
 > semantic web community, it would also be confusing to invent a new
 > term.
 >
 > >
 > > ? Production 51 (QName) This production causes conflicts in the
 > >   grammar.  I modified the production to "(NCNAME_PREFIX)? COLON (
 > >   NCNAME1 | NCNAME2 )", which requires something after the COLON
 > >   and which I believe supports the uses of QName in the grammar.
 >
 > [*] This is related to the above.
 >
 > I modified QNAME (not the grammar rule QName) along the lines
 > suggested.
 >
 > I defined token NCNAME as (NCNAME1 | NCNAME2) and used that
 > throughout.
 >
 > Aside: NCNAME1 and NCNAME2 are with and without leading "_" because
 > only one kind is legal for prefixes, but both are local names.
 > qnames can't start with _ because that looks like a blank node.
 > There are other fun and games to exclude trailing dots in qnames,
 > per a WG decision.
 >
 > >
 > > ? Productions 59 (BNODE) and 60 (BNODE_LABEL) are identical.  Note
 > >   that production 59 (BNODE) is not used and should presumably be
 > >   dropped.
 >
 > Removed BNODE - I had changed the name and didn't remove the
 > definition in the formatting system.
 >
 > >
 > > Thanks,
 > >
 > > -bryan
 > >
 >
 > Thanks for the feedback. I'll need to go back and check but with the
 > changes I described, the grammar passes the syntax tests I have.
 >
 > Bryan (and anyone else) - do you have any syntax test cases?  If so,
 > I'd be happy to collect them all together, or you can add them to the
 > DAWG test suite.
 >
 > 	Andy
 >
 >
Received on Thursday, 17 March 2005 08:39:37 UTC