editorial appendix A stuff

Other things I noticed while working on A-635-03...

I think it's all editorial, so maybe fixing it is more churn than
it's worth at this stage, but I thought I'd point it out anyway.


The productions for PragmaContents and CommentContents should be marked
/* ws: explicit */.

A.1.3 Grammar Notes

It's unclear to me why this section (and the /* gn: id */ annotations)
still exist. E.g., why not move all the text under "grammar-note: comments"
up to 2.6 Comments? What do we gain by keeping them separate?

As for "grammar-note: parens", it's just saying "look-ahead is required".
The reader might get the impression that this is the *only* place where
lookahead is required, which it certainly isn't (and never was?).

A.2 Lexical structure

Para 4 ("Some productions...") and para 5 (XPath's "A host language may
choose..." or XQuery's "It is implementation-defined...") seem to be saying
much the same thing. Moreover, they both appear to repeat what we say
in A.1.2 (Extra-grammatical Constraints) under "Constraint: xml-version".

A.2.2 Terminal Delimitation

Para 2 says:
     Terminal symbols that are not used exclusively in /* ws: explicit */
     productions are of two kinds: delimiting and non-delimiting.
but the relative clause "that are not used ..." is bogus. It implies that
terminals that *are* used exclusively in ws:explicit productions don't have
to be categorized as delimiting or non-delimiting. However, some of them do.
E.g., the terminal symbol "(#" only appears in the Pragma production, which
is ws:explicit, but you need to know that "(#" is delimiting (or at least,
that it isn't non-delimiting) to determine whether the second space is
required in:
     $x union (#foo#){}
(Note that the list of delimiting terminals does include "(#".)
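To make the "(#" point concrete, here's a minimal Python sketch of just the
first separator rule (T and U both non-delimiting). The classification
tables are a tiny hypothetical subset, not the spec's lists; the point is
only that if "(#" were exempt from classification, as the relative clause
implies, the rule couldn't be evaluated for the second space.

```python
# Tiny hypothetical classification tables (not the spec's lists).
DELIMITING = {"$", "(#", "#)", "{", "}"}
NON_DELIMITING = {"NCName", "union"}


def classify(terminal, delimiting, non_delimiting):
    """Return the terminal's category, or fail if it has none."""
    if terminal in delimiting:
        return "delimiting"
    if terminal in non_delimiting:
        return "non-delimiting"
    raise ValueError(f"{terminal!r} is neither delimiting nor non-delimiting")


def both_non_delimiting(t, u, delimiting, non_delimiting):
    """First separator rule only: T and U are both non-delimiting."""
    return (classify(t, delimiting, non_delimiting) == "non-delimiting"
            and classify(u, delimiting, non_delimiting) == "non-delimiting")


# $x union (#foo#){}
# First space, between NCName "x" and "union": required.
print(both_non_delimiting("NCName", "union", DELIMITING, NON_DELIMITING))  # True
# Second space, between "union" and "(#": not required by this rule.
print(both_non_delimiting("union", "(#", DELIMITING, NON_DELIMITING))  # False

# Exempting "(#" from classification leaves the question unanswerable.
try:
    both_non_delimiting("union", "(#", DELIMITING - {"(#"}, NON_DELIMITING)
except ValueError as err:
    print(err)  # '(#' is neither delimiting nor non-delimiting
```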

In the definition of 'non-delimiting terminal symbols', rather than listing
all the keywords, we could just say "and all the keywords", given a
suitable definition of 'keyword'. E.g.:
     A *keyword* is any terminal symbol that appears as a quoted string
     in a production in A.1 EBNF and whose characters (without the quotes)
     match the production for NCName.
(This appears to match the spec's existing uses of the word.)
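For what it's worth, the definition is mechanical enough to apply by
script. Here's a rough Python sketch, assuming the EBNF is available as
plain text. It approximates NCName with an ASCII-only pattern (the real
production admits many more characters), and the sample productions are
simplified excerpts for illustration only.

```python
import re

# ASCII-only approximation of NCName (an assumption for brevity; the real
# production also admits many non-ASCII name characters).
NCNAME_ASCII = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")

# A quoted terminal in an EBNF production, in either quote style.
QUOTED = re.compile(r'"([^"]+)"' r"|'([^']+)'")


def quoted_terminals(ebnf):
    """Every quoted terminal in the EBNF text, in order of appearance."""
    return [dq or sq for dq, sq in QUOTED.findall(ebnf)]


def keywords(ebnf):
    """Quoted terminals whose characters (without quotes) match NCName."""
    return {t for t in quoted_terminals(ebnf) if NCNAME_ASCII.fullmatch(t)}


# Simplified excerpts of three productions, for illustration only.
SAMPLE = """
UnionExpr ::= IntersectExceptExpr ( ("union" | "|") IntersectExceptExpr )*
InstanceofExpr ::= TreatExpr ( "instance" "of" SequenceType )?
Pragma ::= "(#" S? EQName (S PragmaContents)? "#)"
"""

print(sorted(keywords(SAMPLE)))  # ['instance', 'of', 'union']
```

Note that "|", "(#" and "#)" are quoted terminals but fall out as
non-keywords, which is the behaviour we'd want.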

Para 5 says:
     [Definition: Whitespace and Comments function as *symbol separators*.
     For the most part, they are not mentioned in the grammar, and may occur
     between any two terminal symbols mentioned in the grammar, except where
     that is forbidden by the /* ws: explicit */ annotation in the EBNF,
     or by the /* xgc: xml-version */ annotation.]
Should the definition stop at the first sentence? (If the defn doesn't say
where they're *required*, why should it say where they're *allowed*?)

(Also, I think the mention of xgc:xml-version is unnecessary, since that
annotation doesn't control the gaps between terminal symbols of the A.1
EBNF.)

The last para, which specifies where symbol separators are required, would
probably benefit from a rewrite, e.g.:
     Symbol separators are required between two successive terminal symbols
     T and U (where T precedes U) when any of the following is true:
         * T and U are both non-delimiting terminal symbols.
         * T is QName or NCName and U is "." or "-".
         * T is a numeric literal and U is ".", or vice versa.
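As a sanity check on the proposed wording, here's a minimal Python sketch
that applies the three bullets literally to pairs of adjacent terminals.
The (kind, text) token representation and the kind names are my own
assumptions for illustration, not anything from the spec, and the sets
below are not the spec's full lists.

```python
# Hypothetical token kinds: each terminal is a (kind, text) pair. These kind
# names and sets are illustrative assumptions, not the spec's actual lists.
NUMERIC_LITERALS = {"IntegerLiteral", "DecimalLiteral", "DoubleLiteral"}
NAMES = {"QName", "NCName"}
NON_DELIMITING = NUMERIC_LITERALS | NAMES | {"URIQualifiedName", "keyword"}


def separator_required(t, u):
    """Apply the three proposed bullets to successive terminals T and U."""
    t_kind, t_text = t
    u_kind, u_text = u
    # Bullet 1: T and U are both non-delimiting terminal symbols.
    if t_kind in NON_DELIMITING and u_kind in NON_DELIMITING:
        return True
    # Bullet 2: T is a QName or NCName and U is "." or "-"
    # (otherwise, e.g., "a-b" would lex as a single NCName).
    if t_kind in NAMES and u_kind == "delimiting" and u_text in (".", "-"):
        return True
    # Bullet 3: a numeric literal next to "." in either order
    # (otherwise "1." or ".5" would lex as a DecimalLiteral).
    if t_kind in NUMERIC_LITERALS and u_kind == "delimiting" and u_text == ".":
        return True
    if t_kind == "delimiting" and t_text == "." and u_kind in NUMERIC_LITERALS:
        return True
    return False


print(separator_required(("keyword", "div"), ("NCName", "x")))           # True
print(separator_required(("keyword", "union"), ("delimiting", "(#")))    # False
print(separator_required(("NCName", "a"), ("delimiting", "-")))          # True
print(separator_required(("IntegerLiteral", "1"), ("delimiting", ".")))  # True
```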

A.2.4.1 Default Whitespace Handling

Para 2 starts:
     [Definition: *Ignorable whitespace* consists of any whitespace
     characters that may occur between terminals, unless these characters
     occur in the context of a production marked with a ws:explicit
     annotation, in which case they can occur only where explicitly
     specified (see A.2.4.2 Explicit Whitespace Handling).]
This seems to be doing more than a definition should.

Para 2 says:
    "Whitespace is allowed between any two terminals."
which conflicts with the more 'nuanced' statement in the quoted defn.

In general, there's duplication between A.2.4.1 and A.2.2.
(E.g., A.2.4.1's "ignorable whitespace" serves the same purpose as
A.2.2's "symbol separators".)



Received on Tuesday, 22 March 2016 21:09:49 UTC