- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 11 May 2005 07:39:45 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1385
Summary: [XQuery] some editorial comments on A.2.2.1 Default
Whitespace Handling
Product: XPath / XQuery / XSLT
Version: Last Call drafts
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: XQuery
AssignedTo: chamberl@almaden.ibm.com
ReportedBy: jmdyck@ibiblio.org
QAContact: public-qt-comments@w3.org
A.2.2.1 Default Whitespace Handling
[See a later comment for suggested alternate wording.]
(making it all explicit)
In http://www.w3.org/2005/04/xquery-issues.html#qt-2004Feb0853-01,
Steven Buxton suggested "that you give up on implicit whitespace rules
in the EBNF, and go with totally explicit whitespace in every
EBNF." Apparently the proposal was accepted. And yet the proposed change
did not occur. What happened?
"[Definition: Whitespace characters are defined by [http:...#NT-S]"
Put "characters" in bold, because you're defining "whitespace characters",
not "whitespace".
Maybe put it in the singular: "A 'whitespace character' is any of the
characters referenced in the right-hand-side of [...#NT-S]."
"when these characters occur outside of a StringLiteral.]"
I think this exception is unnecessary. Consider that there isn't an
exception for QuotAttrValueContent, DirElemContent, etc.
"Ignorable"
Change to lower-case "i".
"Unless otherwise specified ..., Ignorable whitespace may occur between
terminals,"
This is not a definition. The real definition comes later.
It isn't clear how these two phrases relate. That is, given two adjacent
terminals, how does one determine whether whitespace may be inserted between
them, i.e., whether Default or Explicit Whitespace Handling applies?
For example, in the query
    <a>{ "hello" }{ "world" }</a>
consider the two terminals '}' and '{' in the middle. They both come from
(different applications of) the EnclosedExpr production, which is not marked
with 'ws: explicit', and so is subject to Default Whitespace Handling.
However, you presumably don't want to suggest that ignorable whitespace can
be inserted between these two terminals. Instead, what I imagine you have in
mind is that a pair of successive terminals is governed by their nearest
common ancestor in the syntax tree. In the above example, that's a
DirElemConstructor, which symbol/production *is* marked 'ws: explicit', so
ignorable whitespace cannot be inserted. However, as I say, it isn't clear
that this is the intent.
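
To make the "nearest common ancestor" reading concrete, here is a minimal Python sketch. It is only an illustration of the rule I am guessing at, not anything the spec states. The Node class, the ws_explicit flag, and the helper names are all hypothetical, and the tree is abbreviated (intermediate symbols between DirElemConstructor and EnclosedExpr are omitted).

```python
class Node:
    """One symbol in a (simplified) syntax tree."""

    def __init__(self, symbol, ws_explicit=False):
        self.symbol = symbol
        # True if the production is marked 'ws: explicit' (assumed flag).
        self.ws_explicit = ws_explicit
        self.parent = None
        self.children = []

    def add(self, child):
        child.parent = self
        self.children.append(child)
        return child


def ancestors(node):
    """Return the proper ancestors of node, nearest first."""
    chain = []
    current = node.parent
    while current is not None:
        chain.append(current)
        current = current.parent
    return chain


def nearest_common_ancestor(a, b):
    seen = {id(n) for n in ancestors(a)}
    for candidate in ancestors(b):  # nearest first, so the first hit is the nearest
        if id(candidate) in seen:
            return candidate
    return None


def ignorable_ws_allowed_between(a, b):
    """Guessed rule: the nearest common ancestor decides which handling applies."""
    nca = nearest_common_ancestor(a, b)
    return nca is not None and not nca.ws_explicit


# Abbreviated tree for: <a>{ "hello" }{ "world" }</a>
elem = Node("DirElemConstructor", ws_explicit=True)
first = elem.add(Node("EnclosedExpr"))
second = elem.add(Node("EnclosedExpr"))
open_first = first.add(Node("{"))
lit_first = first.add(Node('"hello"'))
close_first = first.add(Node("}"))
open_second = second.add(Node("{"))
lit_second = second.add(Node('"world"'))
close_second = second.add(Node("}"))

# '}' and '{' in the middle are governed by DirElemConstructor (explicit).
print(ignorable_ws_allowed_between(close_first, open_second))
# '{' and "hello" are governed by EnclosedExpr (default handling).
print(ignorable_ws_allowed_between(open_first, lit_first))
```

Under this reading, the answer for the middle '}' and '{' comes from DirElemConstructor, while the answer for '{' and "hello" comes from EnclosedExpr.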
"and is not significant to the parse tree"
Well, that's a bit tricky, since the presence/absence of whitespace can
certainly be significant to the resulting parse tree ('a-b' vs 'a - b').
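
A rough Python sketch of why the whitespace matters here, assuming a longest-match tokenizer and a deliberately simplified notion of a name (letters, digits, '_', '.', '-'; no prefixes or colons). The TOKEN pattern and the tokenize function are hypothetical and far smaller than the real XQuery lexical rules.

```python
import re

# Simplified terminals: a name may contain '-', so longest match swallows it.
TOKEN = re.compile(
    r"\s*(?:(?P<name>[A-Za-z_][A-Za-z0-9_.\-]*)"
    r"|(?P<int>[0-9]+)"
    r"|(?P<op>-))"
)


def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest match per terminal."""
    tokens = []
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append((match.lastgroup, match.group(match.lastgroup)))
        pos = match.end()
    return tokens


print(tokenize("a-b"))    # a single name
print(tokenize("a - b"))  # name, operator, name
```

The two inputs differ only in whitespace, yet they yield different terminal sequences and therefore different parse trees.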
"For readability, whitespace may be used..."
This certainly doesn't belong in a definition.
"All allowable whitespace that is not explicitly specified in the EBNF is
ignorable whitespace, and converse, this term does not apply to whitespace that
is explicitly specified. ]"
Change "converse" to "conversely".
Delete space before right bracket.
You could simplify it by saying
"Ignorable whitespace is any allowable whitespace that is not explicitly
specified in the EBNF."
(Now that's a definition.)
However, the phrase "allowable whitespace" is not defined. (In fact, this is
the only occurrence of the word "allowable" in the whole spec.) You could
delete it; the "not explicitly specified" phrase is doing the real work.
"Whitespace is allowed before the first terminal and after the last terminal of
an expression module."
Change "an expression module" to just "a module".
"Whitespace is optional between delimiting terminals."
Change "optional" to "allowed".
You missed a case: Whitespace is allowed between a delimiting terminal and
a non-delimiting terminal (in either order). It would be simpler to just
say "Whitespace is allowed between any two terminals."
(that whole paragraph)
This paragraph is backwards. It talks about what you can do with ignorable
whitespace, then defines it in terms of allowable whitespace, then defines
where whitespace is allowed. The opposite order seems like it would make
more sense.
"Comments may also act as 'whitespace' to prevent two adjacent terminals from
being recognized as one."
This suggests that that's the only context in which comments may act as
whitespace, which is not what you want.
Should this be mentioned in 2.6?
"foo- foo is a syntax error."
Change "is" to "results in".
"foo-" would be recognized as a QName.
Not necessarily. That is, when the parser raises a syntax error, it doesn't
have to "recognize" anything.
"foo -foo parses the same as foo - foo"
Don't bring parsing into it if you don't have to. Change "parses the same
as" to "is syntactically equivalent to".
"The parser would match..."
These sentences are too implementation-specific.
"also parses the same as"
Ditto previous substitution.
"When used as an operator after the characters of a name, the "-" must be
separated from the name, e.g. by using whitespace or parentheses."
This is odd wording. It's as if you're saying (e.g.):
    When your query is
        foo-foo
    your query must be
        foo -foo
    or
        (foo)-foo
which is self-contradictory. See next point.
"10div 3 results in a syntax error, since the "10" and the "div" would both be
non-delimiting terminals and must be separated by delimiting terminals in order
to be recognized."
This is very odd wording. It's as if the parser must realize that I had
"10" and "div" in mind as distinct terminals, so that it can apply the
terminal-separation rules. The "would be" is a tip-off. Consider this:
"dog" and "cat" 'would be' non-delimiting terminals, but that doesn't mean
that "dogcat" results in a syntax error!
In order to properly apply terminal-separation rules, you need a context in
which (e.g.) "10" and "div" *are* terminals, rather than 'would be'
terminals. And that context is not the query, or the parser, but the
derivation tree (or syntax tree). E.g., it's fine to say something like:
Consider the (abbreviated) syntax tree:
                    Expr
                     |
             MultiplicativeExpr
                     |
            +--------+--------+
            |        |        |
        UnionExpr  "div"  UnionExpr
            |        |        |
     IntegerLiteral  |  IntegerLiteral
            |        |        |
            ++      +++       +
            ||      |||       |
            10      div       3
The symbols IntegerLiteral, "div", and IntegerLiteral are all NDTs,
so the adjacent pairs must be separated by whitespace in the resulting
query.
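
The same idea as a small Python sketch: start from the terminals of the derivation tree (where "10" and "div" really are terminals) and insert whitespace only where two NDTs are adjacent. The classification in is_non_delimiting is an assumed simplification (a terminal counts as non-delimiting if it starts with a letter, digit, or underscore), and both function names are hypothetical.

```python
def is_non_delimiting(terminal):
    """Assumed simplification: names, keywords, and integer literals are NDTs."""
    first = terminal[0]
    return first.isalnum() or first == "_"


def serialize(terminals):
    """Join the terminals of a derivation tree into query text.

    A space is inserted only between two adjacent non-delimiting terminals.
    This sketch ignores the extra rule about "-" following a name.
    """
    pieces = []
    previous = None
    for terminal in terminals:
        if (
            previous is not None
            and is_non_delimiting(previous)
            and is_non_delimiting(terminal)
        ):
            pieces.append(" ")
        pieces.append(terminal)
        previous = terminal
    return "".join(pieces)


# Leaves of the abbreviated tree above, left to right.
print(serialize(["10", "div", "3"]))
# Delimiting terminals already separate the NDTs, so no space is added.
print(serialize(["(", "10", ")", "div", "(", "3", ")"]))
```

Stated this way, the separation rule is a constraint on how a derivation tree may be written out as a query, rather than a statement about what a parser "would" recognize.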
Received on Wednesday, 11 May 2005 07:39:52 UTC