W3C home > Mailing lists > Public > public-qt-comments@w3.org > August 2002

XPath grammar problems and recommendations

From: Ed.Willink <Ed.Willink@uk.thalesgroup.com>
Date: Thu, 29 Aug 2002 12:06:48 -0400 (EDT)
Message-ID: <B0B581DB7EDFD2119DC00090273A97F501D5BFAC@nts006.rrl.co.uk>
To: "'public-qt-comments@w3.org'" <public-qt-comments@w3.org>

The Aug 16 Xpath 2.0 WD is much improved on the 30-Apr WD.

There are however problems. 

instance of attribute

A problem arises when an instnace of is followed by any of

the or/and/for/quantified/if precedence operators

since there is then an ambiguity between

{... instance of attribute operator} ...


{... instance of attribute} operator ...

Perhaps some <> annotations should specify a resolution for the
above, presumably as the normal left maximisation, since parentheses
can be used to force termination between attribute and operator.

However the ambiguity is unplesasant, and could be better
resolved with parentheses in the grammar: instance of (SequenceType).


The grammar for ItemType is trivially impossible since AtomicType is a QName and
so covers node and comment etc which are distinct ItemType's.

This easily implemented by suppressing the bogus matches, but requires a semantic
statement as to whether node etc are valid AtomicTypes. if so, then the QName production
should be corrected to exclude the invalid names. If not then alternate syntax is needed.

Lexical states

I tried implementing the 30-Apr states and filled in the nissing holes, and
then just deleted all the states and fixed the shift reduce conflicts. Just fixing
the conflicts worked more easily for the 16-Aug version. I have no idea whether
my grammar corresponds to the state machines. I suspect it doesn't quite:
"validate" gave me no trouble and I discovered the above problems. CUP gave
a very large number of states, so I doubt that the critical parts really
corresponded to those specified.

It must be wrong that a stanadrd high tech implementation approach is uncheckable
against a specification that struggles to support lower tech approaches. Clearly
the attempt to describe how an LL parser might solve the
problems has introduced a large amount of complicated and potentially erroneous
information. The EBNF with the <> annotations is sufficient to define the grammar,
anything more is dangerous.

I appreciate the desire to avoid prescribing an LALR parser, but they're old
technology and available for Java. By the time you start pushing states
it really gets quite hard to avoid one. Trimming the EBNF to just the necessary,
perhaps with an observation that this is LALR parsable, is sufficient. When someone
has a proven LL approach it could be added, but I suspect that the LL approach is
most easily derived by reverse engineering the LALR states.

It seems ironic that Java which has a solid LALR grammar has undermined the
20 years of yacc/bison with inferior tools such as JavaCC.

The simple CUP and JLex grammars are attached, cluttered by some extra support for
a more readable XSL interface, howeever the xpath_xxx productions are clean.

		Ed Willink

E.D.Willink,                 Email: mailto:Ed.Willink@uk.thalesgroup.com
Thales Research Ltd,                  Tel:   +44 118 923 8278 (direct)
Worton Drive,                          or    +44 118 986 8601 (ext 8278)
Worton Grange Industrial Estate,      Fax:   +44 118 923 8399
Reading,   RG2 0SB
ENGLAND          http://www.computing.surrey.ac.uk/personal/pg/E.Willink
(formerly Racal Research and Thomson-CSF)

As the originator, I grant the addressee permission to divulge
this email and any files transmitted with it to relevant third parties.

This email and any files transmitted with it are intended solely for the use of
the individual or entity to whom they are addressed and may not be divulged to
any third party without the express permission of the originator.  Any views
expressed in this message are those of the individual sender, except where the
sender specifically states them to be the views of Thales Research Ltd.

Received on Friday, 30 August 2002 04:30:59 UTC

This archive was generated by hypermail 2.4.0 : Friday, 17 January 2020 16:56:43 UTC