- From: <bugzilla@wiggum.w3.org>
- Date: Wed, 11 May 2005 07:37:39 +0000
- To: public-qt-comments@w3.org
- Cc:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=1384
Summary: [XQuery] some editorial comments on A.2.1 Terminal Types
Product: XPath / XQuery / XSLT
Version: Last Call drafts
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: XQuery
AssignedTo: chamberl@almaden.ibm.com
ReportedBy: jmdyck@ibiblio.org
QAContact: public-qt-comments@w3.org
A.2.1 Terminal Types
[See a later comment for suggested alternate wording that achieves the effect of
this section without defining 'terminal' or making long lists of symbols.]
(intent)
Is it your intention that the characters of every legal query can be
partitioned into a sequence of terminals and intervening whitespace?
If so, you'll need to add the following as terminals:
Char
"(#" and "#)" (or else Pragma)
PITarget
On the other hand, if it's your intention to only include terminals that
could be next to ignorable whitespace, then there are a bunch that could be
removed:
":"
"""
"'"
"<![CDATA["
"]]>"
PredefinedEntityRef
CharRef
"</"
"{{"
"}}"
EscapeQuot
EscapeApos
S
Note also that some of the terminals derive forms containing other
terminals, which could complicate things:
PredefinedEntityRef -> "lt", "gt", ";"
CharRef -> ";"
Comment -> CommentContents
DecimalLiteral -> Digits, "."
DoubleLiteral -> Digits, "."
StringLiteral -> '"', "'"
QName -> NCName, ":"
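One way this could complicate things, as a rough sketch: if both a terminal and the terminals it derives are in the list, the same characters can be partitioned in more than one way. The patterns below are simplified assumptions for three of the symbols above (they are not copied from the grammar), and `partitions` is a hypothetical helper used only for illustration.

```python
import re

# Simplified, assumed patterns for three of the terminals named above.
PATTERNS = {
    "DecimalLiteral": re.compile(r"\.[0-9]+|[0-9]+\.[0-9]*"),
    "Digits": re.compile(r"[0-9]+"),
    '"."': re.compile(r"\."),
}

def partitions(text):
    """Return every way to split text into a sequence of the terminals above."""
    if not text:
        return [[]]
    results = []
    for name, pattern in PATTERNS.items():
        for end in range(1, len(text) + 1):
            prefix = text[:end]
            if pattern.fullmatch(prefix):
                # Keep this prefix as one terminal and partition the rest.
                for rest in partitions(text[end:]):
                    results.append([(name, prefix)] + rest)
    return results

if __name__ == "__main__":
    for p in partitions("1.5"):
        print(p)
```

Under these assumptions, "1.5" can be read as a single DecimalLiteral or as Digits, ".", Digits (among others), so a rule phrased in terms of "adjacent terminals" would need to say which partition it means.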
'terminal'
This section waffles between two senses of 'terminal': a symbol in the
grammar, or a group of characters in a query. (E.g., "The XQuery grammar
defines 153 terminals." vs. "This query contains 1200 terminals.") Maybe
nobody will mind.
(In my comments, I will abbreviate "delimiting terminal" and "non-delimiting
terminal" as "DT" and "NDT" respectively.)
"delimit"
The rest of the spec uses "delimit" to mean "mark the start and end", e.g.:
-- '(:' and ':)' are comment delimiters,
-- braces delimit an enclosed expression, and
-- a string literal can be delimited by apostrophes or quotation marks.
However, in this section, the intended usage appears to be, for example,
that in
x+1
the plus sign "delimits" x and 1. This is odd. It would be plainer to say
that it "separates" them. (Note that in A.2.2.1, the examples use the
phrase "separated by DTs", not "delimited by DTs".)
So I'd recommend changing "delimit/delimited" to "separate/separated".
However, I don't recommend changing "(non-)delimiting terminal" to
"(non-)separating terminal", as both seem like odd phrasing to me. For
instance, in
x+1-y
it would seem reasonable to say that the 1 separates (or delimits, if you
must) the plus and minus. So to decree that the plus is "separating"
whereas the 1 is "non-separating" doesn't make sense. Instead, I think
"adjoinable" and "non-adjoinable" might be better, or "punctuation-like"
and "word-like", or "closed" and "open", or just "class 1" and "class 2".
----
"A DT may delimit adjacent NDTs."
This is not a definition. The real definition is the list.
(list of DTs)
Delete initial comma.
Delete "%%%".
Maybe change """ to '"'.
Not sure why you need both Comment *and* "(:" + ":)".
Put them in ASCII order? Some kind of order would be nice.
"." and "-" are going to cause problems, given that they're valid
NCNameChars. E.g., if 'x' and '1' are two NDTs, and '-' is the DT that
will 'delimit' them, you get x-1, which doesn't work. (It's misrecognized
as a single NCName.)
Expressing the problem in a different way: A.2.2.1 says that whitespace is
only required between two NDTs, but '-' is not an NDT, so whitespace isn't
required between 'x' and '-', which is not what you want.
On the other hand, you can't make "-" an NDT, because then things like
100-x and (blah)-1 would become illegal.
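The x-1 problem can be reproduced with a small sketch of a greedy tokenizer. The patterns here are ASCII-only approximations chosen for illustration (the real NCName and Digits definitions are broader), and `tokenize` is a hypothetical helper, not anything from the spec.

```python
import re

# Assumed, ASCII-only approximations of the relevant symbols.
# Note that '.' and '-' are allowed inside an NCName, but not at its start.
TOKEN = re.compile(
    r"(?P<NCName>[A-Za-z_][A-Za-z0-9._-]*)"
    r"|(?P<Digits>[0-9]+)"
    r"|(?P<Minus>-)"
    r"|(?P<S>\s+)"
)

def tokenize(text):
    """Split text into (symbol, characters) pairs, dropping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "S":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

if __name__ == "__main__":
    print(tokenize("x-1"))    # one NCName: the '-' is swallowed
    print(tokenize("x - 1"))  # NCName, Minus, Digits
    print(tokenize("100-x"))  # Digits, Minus, NCName
```

With these patterns, x-1 comes out as a single NCName, while 100-x splits into three tokens without any whitespace, which is why "-" fits neither class cleanly.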
"NDTs generally start and end with alphabetic characters or digits."
This is almost a definition, but the "generally" makes it too vague.
Again, the real definition is the list.
"Adjacent NDTs must be delimited by a DT."
This doesn't belong in a definition.
(list of NDTs)
Change ValidationMode to just "lax", "strict".
Definitely put them in ASCII order.
----
"delimit adjacent"
Both "definitions" have a phrase along the lines of:
"a DT [may/must] delimit adjacent NDTs"
but this makes no sense -- if the DT is between the NDTs, then the NDTs are
not adjacent! Presumably you mean something like
"two NDTs may not be adjacent"
or
"two adjacent terminals may not both be NDTs"
or
"in every pair of adjacent terminals in a query, at least one of the
terminals must be a DT"
However, a reasonable response (to any of these) would be:
But I have no choice! For instance, in the production for VersionDecl,
it says right there:
"xquery" "version" etc.
So the (NDTs) "xquery" and "version" *have* to be adjacent. I can't just
grab some DT and stick it between them -- I'd get a syntax error!
The answer, I assume, would be:
Ah, but S and Comment are DTs, and you certainly *can* (and in fact,
must) put either or both of those between the "xquery" and "version".
This illustrates a couple of problems:
(1) What you mean here by 'adjacent' may not be what the reader thinks.
(2) S and Comment are the only DTs that people can actually use to separate
two NDTs (without changing the structure of the query), and they are
buried in the list, and not mentioned in the prose.
It might help solve both of these problems if you moved/recast the content
of this section into A.2.2 Whitespace Rules. (As far as I can tell, the only
reason to define these classes of terminals is to be able to define where
whitespace is allowed, and where required, so the move is appropriate.)
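To make the third reading above concrete ("in every pair of adjacent terminals in a query, at least one of the terminals must be a DT"), here is a minimal sketch. The DELIMITING set is a tiny assumed subset of the DT list, every other symbol is treated as an NDT, and `first_adjacent_ndt_pair` is a hypothetical name.

```python
# Assumed subset of the delimiting terminals; the real list is much longer.
DELIMITING = {"S", "Comment", "+", "-"}

def first_adjacent_ndt_pair(symbols):
    """Return the index of the first pair of adjacent NDTs, or None if there is none."""
    for i in range(len(symbols) - 1):
        if symbols[i] not in DELIMITING and symbols[i + 1] not in DELIMITING:
            return i
    return None

if __name__ == "__main__":
    # "xquery" "version" with nothing between them breaks the rule.
    print(first_adjacent_ndt_pair(["xquery", "version"]))
    # With S (or a Comment) in between, the rule is satisfied.
    print(first_adjacent_ndt_pair(["xquery", "S", "version"]))
    print(first_adjacent_ndt_pair(["xquery", "Comment", "version"]))
```

The check only passes for "xquery" "version" once S or Comment is counted as a terminal in the sequence, which is the point about those two being the DTs that matter in practice.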
Received on Wednesday, 11 May 2005 07:37:42 UTC