[Bug 1390] New: [XQuery] suggested alternate wording for A.2 and some A.1.1

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1390

           Summary: [XQuery] suggested alternate wording for A.2 and some
                    A.1.1
           Product: XPath / XQuery / XSLT
           Version: Last Call drafts
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: XQuery
        AssignedTo: chamberl@almaden.ibm.com
        ReportedBy: jmdyck@ibiblio.org
         QAContact: public-qt-comments@w3.org


Here is my suggested alternate wording (referred to in several of my recent
comments) for the content of section A.2, and the three 'restrictive'
grammar-notes of A.1.1.

--- It incorporates suggestions from my recent comments.
--- It resolves some of the open-ended comments.
--- It expresses the three 'restrictive' grammar-notes in a more uniform way.
--- It doesn't need the "longest match" rule.
--- It doesn't need to define "terminal" (much less "delimiting" and
    "non-delimiting") or introduce long lists of symbols.
--- It defines the language without invoking parsing, but also includes remarks
    about how the language design affects parser construction.
--- It is generally more precise and concise (except for the syntax trees!), and
    yet (I think) it's also more readable, because it proceeds in a more logical
    fashion.

----------------------------------------

It needs this production added to A.1 EBNF:
    [154] Filler ::= ( #x20 | #x9 | #xD | #xA | Comment )+ /* ws: explicit */
(Alternatively,
    [154] Filler ::= ( S | Comment )+  /* ws: explicit */
might work.)


A.2 Whitespace Characters and Filler

    A 'whitespace character' is any of the characters referenced in the
    right-hand-side of [...#NT-S].

    In terms of this grammar, there are two mechanisms by which whitespace
    characters can appear in queries: explicit and implicit.

    Explicit: In the EBNF, there are places where whitespace characters are
    specifically allowed, either via references to the symbols S or Char, or
    via the notation [^abc] (which implicitly involves Char). (Note that
    comments are *not* allowed via this mechanism.)

    For instance, whitespace characters are specified explicitly in the
    productions for direct constructors, in order to be more consistent with
    the corresponding constructs in the XML grammar.

    Implicit: Whitespace characters and comments (collectively known as
    "filler") can be used in most expressions even though not explicitly
    allowed by the EBNF.  Specifically, for each production that is not
    marked 'ws: explicit', filler is allowed between any two symbols that
    the production directly derives.

    For example,
        [51] MultiplicativeExpr ::=
                        UnionExpr ( ("*" | "div" | "idiv" | "mod") UnionExpr )*
    directly derives infinitely many finite sequences of symbols, one of
    which is
        UnionExpr "*" UnionExpr "div" UnionExpr
    Filler is allowed in the 4 'gaps' between these 5 symbols. In effect,
    the derivation step would result in:
        UnionExpr Filler? "*" Filler? UnionExpr Filler? "div" Filler? UnionExpr

    Note that this interpolation of filler only applies to one derivation step,
    the one directly involving the production in question; subsequent derivation
    steps will use other productions, which might have a 'ws: explicit' marking
    (and thus would *not* interpolate filler between symbols).

    Filler is also allowed at the start and end of a module.
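
    The interpolation described above can be sketched in a few lines of
    Python. This is only an illustration, not part of the proposed wording:
    the function name interpolate_filler, the ws_explicit flag, and the
    representation of a derivation step as a list of strings are all
    assumptions made for the example.

```python
from typing import List

FILLER = "Filler?"  # the optional filler symbol used in the text above


def interpolate_filler(symbols: List[str], ws_explicit: bool = False) -> List[str]:
    """Return one derivation step with optional filler placed in every gap.

    `symbols` is the sequence directly derived by a single production.
    If the production is marked 'ws: explicit', nothing is interpolated.
    """
    if ws_explicit or len(symbols) < 2:
        return list(symbols)
    result = [symbols[0]]
    for sym in symbols[1:]:
        result.append(FILLER)  # one gap between each adjacent pair
        result.append(sym)
    return result


step = ['UnionExpr', '"*"', 'UnionExpr', '"div"', 'UnionExpr']
print(" ".join(interpolate_filler(step)))
# UnionExpr Filler? "*" Filler? UnionExpr Filler? "div" Filler? UnionExpr
```

    The function is applied to one derivation step at a time, which mirrors
    the note that each later step consults its own production's marking.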

A.3 Culls

    The preceding two sections define a language containing all syntactically
    legal queries. However, it also includes some syntax trees / queries which
    we decree to be illegal.  We do this in order to either eliminate ambiguity
    or make parsing easier.  This section specifies those illegal queries.

    Note that when we say a syntax tree is illegal, this doesn't necessarily
    mean that the resulting query (i.e., sequence of characters) is illegal.
    It's possible that there is a legal syntax tree resulting in the same query,
    in which case the query is legal. There's an example of this later.

    Definition: A 'keyword' is a symbol that appears in the EBNF as a quoted
    string, such that the characters inside the quotes conform to the syntax
    of an NCName (e.g., "while", "preceding-sibling").
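
    As a rough illustration of this definition, the following Python sketch
    tests whether an EBNF symbol is a keyword. It makes a loud simplifying
    assumption: NCName is approximated by an ASCII-only pattern, whereas the
    real NCName production admits many more characters. The names
    NCNAME_ASCII and is_keyword are invented for the example.

```python
import re

# Assumption: ASCII-only approximation of NCName (a letter or underscore,
# then letters, digits, '.', '-', or '_'). The real production is broader.
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def is_keyword(symbol: str) -> bool:
    """True if `symbol` is a double-quoted string whose content looks like an NCName."""
    if len(symbol) < 2 or symbol[0] != '"' or symbol[-1] != '"':
        return False  # not a quoted string at all (e.g. a nonterminal name)
    return NCNAME_ASCII.fullmatch(symbol[1:-1]) is not None


print(is_keyword('"preceding-sibling"'))  # True
print(is_keyword('"*"'))                  # False
```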

    (1)
    In two cases, the filler that is merely 'allowed' in A.2 is required; i.e.,
    it is illegal for the interpolated 'Filler?' to derive the empty string.
    The cases are specified by the presence, in the syntax tree, of certain
    symbols to the left and right of the empty filler:

    (a) a keyword on one side (either side), and a keyword, QName, NCName,
        NumericLiteral, or StringLiteral on the other.

        For instance, consider the abbreviated syntax tree

                                   Expr
                                    |
                           MultiplicativeExpr
                                    |
                  +---------+-------+-------+--------+
                  |         |       |       |        |
              UnionExpr  Filler?  "div"  Filler?  UnionExpr
                  |         |       |       |        |
            NumericLiteral  |       |       |  NumericLiteral
                  |         |       |       |        |
                  ++        |      +++      |        |
                  ||        |      |||      |        |
                  10       [?]     div     [?]       3

        The "div" is a keyword, so in both filler positions (bounded by
        NumericLiteral/keyword and keyword/NumericLiteral respectively), it is
        illegal to have empty filler. This leads to the conclusion that
            10 div 3
            10 div(:comment:)3
        (for instance) are legal queries resulting from this tree, but
            10div 3
            10 div3
            10div3
        are illegal queries.

    (b) a keyword, QName, or NCName on the left, and "-" or "." on the
        right.

        For instance, consider the abbreviated syntax tree

                              Expr
                               |
                         AdditiveExpr
                               |
                +-------+------+------+-------+
                |       |      |      |       |
            MultExpr Filler?  "-"  Filler? MultExpr
                |       |      |      |       |
              QName     |      |      |     QName
                |       |      |      |       |
               +++      |      |      |      +++
               |||      |      |      |      |||
               foo     [?]     -     [?]     foo

        In the first filler position (bounded by QName/"-"), it is illegal
        to have empty filler. In the second position (bounded by "-"/QName),
        there is no such constraint. Thus,
            foo - foo
            foo -foo
            foo(: comment :)- foo
            foo(: comment :)-foo
        are all legal queries resulting from this tree, but
            foo- foo
        is illegal.

        The query
            foo-foo
        is illegal as far as *this* tree is concerned, but it happens to be
        legal via a different syntax tree:

              Expr
               |
             QName
               |
            +++++++
            |||||||
            foo-foo

        because hyphen is a valid name character.
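
    Cases (a) and (b) can be summarized as a single predicate on the symbols
    either side of a filler position. The sketch below is illustrative only;
    it assumes a hypothetical encoding in which each neighbour is given
    either as a kind name ("keyword", "QName", "NCName", "NumericLiteral",
    "StringLiteral") or as a literal punctuation string such as "-".

```python
# Hypothetical encoding of the symbol kinds named in cases (a) and (b).
NAME_LIKE = {"keyword", "QName", "NCName"}
WORD_LIKE = NAME_LIKE | {"NumericLiteral", "StringLiteral"}


def filler_required(left: str, right: str) -> bool:
    """True if empty filler between `left` and `right` makes the tree illegal."""
    # (a) a keyword on either side, and a keyword, QName, NCName,
    #     NumericLiteral, or StringLiteral on the other.
    if (left == "keyword" and right in WORD_LIKE) or \
       (right == "keyword" and left in WORD_LIKE):
        return True
    # (b) a keyword, QName, or NCName on the left, and "-" or "." on the right.
    if left in NAME_LIKE and right in {"-", "."}:
        return True
    return False


print(filler_required("NumericLiteral", "keyword"))  # True  (10div 3 is illegal)
print(filler_required("QName", "-"))                 # True  (foo- foo is illegal)
print(filler_required("-", "QName"))                 # False (foo -foo is legal)
```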

    (2)
    [grammar-note: occurrence-indicators]

    Consider these abbreviated syntax trees:

                                        AdditiveExpr
                                             |
                        +--------------------+----------+-----------+
                        |                               |           |
                MultiplicativeExpr                     "-"  MultiplicativeExpr
                        |                               |           |
                     TreatExpr                          |           |
                        |                               |           |
         +-----------+--+---+-------------+             |           |
         |           |      |             |             |           |
    CastableExpr  "treat"  "as"     SequenceType        |           |
         |           |      |             |             |           |
         |           |      |        +----+----+        |           |
         |           |      |        |         |        |           |
         |           |      |    ItemType OccIndicator  |           |
         |           |      |        |         |        |           |
         4         treat    as     item()      +        -           5


                                        AdditiveExpr
                                             |
                        +--------------------+----+-----------+
                        |                         |           |
                MultiplicativeExpr               "+"  MultiplicativeExpr
                        |                         |           |
                     TreatExpr                    |        UnaryExpr
                        |                         |           |
         +-----------+--+---+-------------+       |       +-------+
         |           |      |             |       |       |       |
    CastableExpr  "treat"  "as"     SequenceType  |      "-"  ValueExpr
         |           |      |             |       |       |       |
         |           |      |             |       |       |       |
         |           |      |             |       |       |       |
         |           |      |         ItemType    |       |       |
         |           |      |             |       |       |       |
         4         treat    as          item()    +       -       5

    This illustrates an ambiguity in the EBNF, which is resolved by making the
    second tree illegal.  Specifically, a syntax tree is illegal if an ItemType
    is followed by a "+", "*", or "?" that is *not* an OccurrenceIndicator.
    (The presence or absence of filler makes no difference to the illegality.)

    (Thus, a parser, having recognized an ItemType, and seeing a "+", "*", or
    "?", can be certain that the latter is an OccurrenceIndicator.)

    Note that the elimination of such trees does not create a semantic hole in
    the language: one can easily construct a legal query that is semantically
    equivalent to one of these illegal trees, simply by parenthesizing the
    expression that most closely contains the ItemType. (The illegal trees can
    only occur when the ItemType is within an expression.)
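
    The consequence for a parser can be sketched as follows. This is a
    minimal illustration over a pre-tokenized query, not a real XQuery
    parser; the token list and the name parse_occurrence_indicator are
    assumptions made for the example.

```python
from typing import List, Optional, Tuple

OCCURRENCE_INDICATORS = {"?", "*", "+"}


def parse_occurrence_indicator(tokens: List[str], pos: int) -> Tuple[Optional[str], int]:
    """Call right after an ItemType that ends just before tokens[pos].

    Under the rule above, a following "+", "*", or "?" is always taken as
    the OccurrenceIndicator, so no lookahead or backtracking is needed.
    Returns (indicator or None, position of the next unconsumed token).
    """
    if pos < len(tokens) and tokens[pos] in OCCURRENCE_INDICATORS:
        return tokens[pos], pos + 1
    return None, pos


tokens = ["4", "treat", "as", "item()", "+", "-", "5"]
indicator, next_pos = parse_occurrence_indicator(tokens, 4)
print(indicator, tokens[next_pos:])  # + ['-', '5']
```

    With the parenthesized form "(4 treat as item()) + - 5", the token after
    the ItemType is ")", so no indicator is consumed and the "+" is left to
    be read as an additive operator.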

    (3)
    [grammar-note: leading-lone-slash]
    Consider the two abbreviated syntax trees:

              Expr                            Expr
               |                               |
            PathExpr                  MultiplicativeExpr
               |                               |
         +-----+----+                  +-------+-------+
         |          |                  |       |       |
        "/"  RelativePathExpr      UnionExpr  "*"  UnionExpr
         |          |                  |       |       |
         |       StepExpr           PathExpr   |       |
         |          |                  |       |       |
         |       Wildcard             "/"      |       |
         |          |                  |       |       |
         /          *                  /       *       5

    Although there isn't an ambiguity, some parsers would have trouble
    distinguishing between these two alternatives.  Similarly with:

              Expr                              Expr
               |                                 |
            PathExpr                         UnionExpr
               |                                 |
         +-----+----+                  +---------+---------+
         |          |                  |         |         |
        "/"  RelativePathExpr      IntExExpr  "union"  IntExExpr
         |          |                  |         |         |
         |       StepExpr           PathExpr     |         |
         |          |                  |         |         |
         |        QName               "/"        |         |
         |          |                  |         |         |
         /        union                /       union      $x

    Therefore, in each pair, the tree on the right is deemed illegal.
    Specifically, a syntax tree is illegal if it contains:
        -- a PathExpr that only derives "/"
        followed by:
        -- "*" or a keyword.

    (Thus, a parser, seeing (in an expression context) a slash followed by a
    star or what-could-be-a-keyword, can be certain that the slash is not a
    complete PathExpr, but rather the start of a PathExpr.)

    Again, there is no semantic hole: one can easily construct a legal query
    that is semantically equivalent to one of these illegal trees, simply by
    putting the "lone slash" in parentheses.
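
    A parser's decision under this rule might look like the sketch below.
    It is illustrative only: SAMPLE_KEYWORDS holds just a few keywords that
    appear as quoted strings elsewhere in this message and is not a complete
    list, and the function name is invented for the example.

```python
from typing import Optional

# Assumption: a small, incomplete sample of keywords, for illustration only.
SAMPLE_KEYWORDS = {"div", "idiv", "mod", "union", "treat", "as", "if", "then"}


def slash_starts_longer_path(next_token: Optional[str]) -> bool:
    """Decide how to read a "/" seen in an expression context.

    True:  the slash begins a longer PathExpr ("/" RelativePathExpr).
    False: this rule does not apply; the slash may be a complete PathExpr.
    `next_token` is the token after the slash, or None at end of input.
    """
    return next_token == "*" or next_token in SAMPLE_KEYWORDS


print(slash_starts_longer_path("*"))      # True:  "/ *" reads "*" as a Wildcard step
print(slash_starts_longer_path("union"))  # True:  "/ union" reads "union" as a QName step
print(slash_starts_longer_path(")"))      # False: "( / ) * 5" keeps the lone slash
```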

    (4)
    [grammar-note: reserved-function-names]

    Consider these abbreviated syntax trees:

                 Expr                              Expr
                  |                                 |
                IfExpr                         FunctionCall
                  |                                 |
      +----+----+-----+-----+--....     +-----+-----+-+--------+
      |    |    |     |     |           |     |       |        |
    "if"  "("  Expr  ")"  "then"      QName  "("  ExprSingle  ")"
      |    |    |     |     |           |     |       |        |
     if    (   foo    )    then         if    (      foo       )

    Although there isn't an ambiguity, some parsers would be unable to
    distinguish between these two alternatives.  Therefore, the right-hand tree
    is deemed illegal.  Specifically, a syntax tree is illegal if it contains a
    FunctionCall whose QName:
    -- does not have a prefix, and
    -- has a local-part that is one of the following NCNames:
           attribute
           comment
           etc

    (Thus, a parser, seeing (in an expression context) one of those words
    followed by an open parenthesis, can be certain that it is not the start of
    a FunctionCall.)
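
    The two conditions on the QName can be sketched as a predicate. This is
    an illustration under stated assumptions: the set below contains only
    the two names spelled out above, since the list is left open with "etc",
    and the function name and the reserved parameter are invented for the
    example.

```python
from typing import AbstractSet

# Only the two names given explicitly in the list above; the full list is
# elided there ("etc"), so this set is deliberately incomplete.
RESERVED_FUNCTION_NAMES = frozenset({"attribute", "comment"})


def may_be_function_call_name(
    qname: str, reserved: AbstractSet[str] = RESERVED_FUNCTION_NAMES
) -> bool:
    """True if `qname` followed by "(" may start a FunctionCall."""
    _prefix, sep, _local = qname.partition(":")
    if sep == ":":
        return True  # a prefixed QName is never excluded by this rule
    return qname not in reserved  # unprefixed: excluded if the name is reserved


print(may_be_function_call_name("comment"))        # False
print(may_be_function_call_name("local:comment"))  # True
print(may_be_function_call_name("foo"))            # True
```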

    [Way to construct semantically equivalent query?]

Received on Wednesday, 11 May 2005 07:45:48 UTC