[XQuery] A.2.1 White Space Rules

XQuery 1.0: An XML Query Language
W3C Working Draft 12 November 2003

A.2.1 White Space Rules

You should probably define what you mean by "white space", since it
isn't defined in the EBNF.

"White space is tolerated before the first token and after the last
token."
    The word "tolerated" is odd here. Perhaps change to "allowed".

    At the end of the sentence, append "of a module".

"White space is optional between terminals, except a few cases where
white space is needed"
    You must specify which cases.

    Also, after "except", I think you need to insert "for" or "in".

"to disambiguate the token."
    This is a misuse of the term "disambiguate" in its technical sense.
    Ambiguity (or the lack thereof) is a property of a grammar.
    Change "token" to "grammar".

------------------------------------

"Special white space notation is specified with the EBNF productions,
when it is different from the default rules,"
    It's not clear what the default rules are. I think they start at
    "White space is optional". I believe it would increase clarity if
    you created a "Whitespace: default" subsection, and put the
    "default rules" under it.

"'ws: significant' means that white space is significant as value
content."
    As I understand it, this has nothing to do with specifying where
    white space is allowed: you could replace every "ws: significant"
    with "ws: explicit" and the set of legal queries would be the same.
    Specifying what is "significant as value content" is out of place
    here. (For instance, whether boundary whitespace is significant is
    controlled by the xmlspace declaration.)

I don't think the white space rules are precise enough to tell me
whether white space is allowed/disallowed/required between two terminals
that are derived from productions with different ws annotations (e.g.,
one "ws: explicit", one default).

------------------------------------

"For XQuery,"
    Delete.

"White space is not freely allowed in the non-computed Constructor
productions, but is specified explicitly in the grammar ..."
    Change "non-computed" to "direct".

    This sentence is unnecessary, since the corresponding productions
    have the appropriate "ws" annotations. (It's a holdover from the
    days before "ws" annotations.)

"The lexical states where white space must have explicit specification
are as follows: ..."
    If you're talking about the states that have an explicit transition
    on white space (or on a symbol that can derive a whitespace
    character), then:

    --- The use of "must" is inappropriate, since it's not the
        implementor's job to ensure that you specify these transitions.

    --- Why is PROCESSING_INSTRUCTION included in the list?

    --- Why is EXPR_COMMENT excluded?

    If you're talking about something else, does it affect the
    interpretation of A.2.2?

    In either case, the sentence should probably be moved to A.2.2, or
    else deleted.

------------------------------------

"For other usage of white space,"
    Other than what? Aren't all uses of white space covered by either
    "the default rules", "ws: explicit", or "ws: significant"?

"one or more white space characters are required to separate 'words'."
    What constitutes "words"?

"Zero or more white space characters may optionally be used"
    Given "zero or more", "optionally" is redundant.

"around punctuation and non-word symbols."
    What constitutes "punctuation"? Or "non-word symbols"?

In sum, this paragraph is vague and unhelpful, and could probably be
construed to conflict with other requirements. Either delete it or make
it more precise and better related to the rest of the section.
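For illustration only, here is a minimal sketch of one way the "words"
rule could be made precise. It assumes a simplified ASCII stand-in for
XML's NameChar (the set NAME_CHARS and the functions ws_required and
render are hypothetical, not taken from the spec): white space is
required between two adjacent tokens exactly when the last character of
the first and the first character of the second could both continue a
name or number, so that writing them together would change the
tokenization (e.g. "a - b" versus the single name "a-b"). It is a
simplification and does not capture every XQuery case.

```python
import string

# Assumed, simplified stand-in for XML's NameChar: ASCII letters, digits,
# and the punctuation characters that may appear inside a name.
NAME_CHARS = set(string.ascii_letters + string.digits + "_-.")


def ws_required(left: str, right: str) -> bool:
    """Hypothetical default rule: white space must separate two adjacent
    tokens when juxtaposing them would let a name or number absorb
    characters from its neighbour, e.g. 'a' '-' 'b' -> 'a-b'."""
    return left[-1] in NAME_CHARS and right[0] in NAME_CHARS


def render(tokens: list[str]) -> str:
    """Join tokens, inserting a single space only where the
    hypothetical rule requires one."""
    out = tokens[0]
    for prev, tok in zip(tokens, tokens[1:]):
        out += (" " if ws_required(prev, tok) else "") + tok
    return out
```

Under this sketch, render(["f", "(", "x", ")"]) gives "f(x)", while
render(["a", "-", "b"]) keeps the spaces and gives "a - b". Whatever
definition the spec adopts, stating it at roughly this level of
precision would answer the questions above.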

------------------------------------

Presumably, white space is disallowed anywhere that this section doesn't
say it *is* allowed. (If that's the case, it would probably be good to
mention it.) In particular, it would appear that white space is
disallowed *within* terminals (or at least, those that are derived from
a production without a "ws" annotation). Normally, this is sensible, but
it has some odd (and probably unintended) consequences:

    --- Because SchemaGlobalTypeName is a terminal, constructs such as
            type( schedule )
        or
            type (schedule)
        are illegal.

    --- Because Pragma and MUExtension are terminals, the spaces that
        appear around "pragma" and "extension" in the spec's examples
        of these constructs are illegal.

    --- All four "ws: explicit" annotations in the "Named Terminals"
        section are redundant.
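To make the first point concrete: if SchemaGlobalTypeName is recognized
as a single terminal, a lexer would typically match it with one pattern
that has no internal white space. The sketch below assumes such a
pattern with a simplified ASCII QName; the names QNAME,
SCHEMA_GLOBAL_TYPE_NAME, RELAXED and matches_terminal are hypothetical
and do not reproduce the spec's own definitions.

```python
import re

# Assumed, simplified QName: optional prefix, then a local name (ASCII only).
QNAME = r"(?:[A-Za-z_][A-Za-z0-9_.\-]*:)?[A-Za-z_][A-Za-z0-9_.\-]*"

# Hypothetical single-terminal pattern for SchemaGlobalTypeName:
# "type" "(" QName ")" with no white space permitted inside the terminal.
SCHEMA_GLOBAL_TYPE_NAME = re.compile(r"type\(" + QNAME + r"\)")

# If white space were allowed between the pieces, as it is between
# separate terminals, the pattern would need \s* at each join.
RELAXED = re.compile(r"type\s*\(\s*" + QNAME + r"\s*\)")


def matches_terminal(text: str) -> bool:
    """True if the whole string is one SchemaGlobalTypeName terminal
    under the strict (no internal white space) pattern."""
    return SCHEMA_GLOBAL_TYPE_NAME.fullmatch(text) is not None
```

With the strict pattern, "type(schedule)" is accepted but both
"type( schedule )" and "type (schedule)" are rejected, which is the
consequence described in the first bullet; only the relaxed pattern
accepts them.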

There are various ways to deal with these cases, but I think the root
of the problem is defining the allowed locations for white space in
terms of "terminals" (or "tokens"), which I think is unnecessary.

------------------------------------------------------------------------

The spec is inconsistent: "white space" or "whitespace"?

------------------------------------------------------------------------

Also, the following sections:
    2.6.5 Pragmas
    2.6.6 Must-Understand Extensions
    3.1.6 XQuery Comments
    A.1.1 Grammar Notes (grammar-note: comments)
    E Glossary (must understand)
all refer to "ignorable whitespace", but this term is never defined.

-Michael Dyck

Received on Wednesday, 11 February 2004 05:53:51 UTC