[Bug 28024] Does "[Unicode] characters" EQUAL "Char production in [XML]"?

https://www.w3.org/Bugs/Public/show_bug.cgi?id=28024

C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |cmsmcq@blackmesatech.com

--- Comment #10 from C. M. Sperberg-McQueen <cmsmcq@blackmesatech.com> ---
[Speaking for myself and not on behalf of anyone else]

Having read this bug report and the ensuing discussion more than once, I remain
puzzled.  The bug report starts by talking about "conflicting definitions", but
I don't see any conflicts identified.

The set of characters identified by the Char production in the XML spec is a
subset of the set of Unicode characters (or it was the last time I looked).  So
it is necessarily true that any sequence of instances of the Char production in
[XML] is (by definition) a sequence of Unicode characters.  The sentence quoted
from XPath 3.1, section 2 Basics, is identifying a property of XPath 3.1
expressions which some people find important:  they are strings of characters. 

  The basic building block of XPath 3.1 is the expression, which is a 
  string of [Unicode] characters; the version of Unicode to be used is 
  implementation-defined.

It is not a definition of "string" or "expression" or "character", and it does
not say that the set of expressions is the same as the set of strings of
Unicode characters, only that every expression is a string of Unicode
characters.  There are plenty of strings of Unicode characters which are not
XPath expressions -- strings including Unicode characters that don't match the
XML Char production among them.  Strings beginning with right-parenthesis or
ending with left-parenthesis are also in that set.  The sentence quoted says
nothing that contradicts any of these straightforward points.
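
The subset relation described above can be sketched in a few lines of Python.
This is only an illustration, assuming the Char production of XML 1.0 (Fifth
Edition); the function names is_xml_char and all_xml_chars are hypothetical
and do not come from any spec or library.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches Char as defined in XML 1.0:
    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    """
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def all_xml_chars(s: str) -> bool:
    """Return True if every character of s matches the Char production."""
    return all(is_xml_char(ord(c)) for c in s)


# Every range above lies inside the Unicode code space, so a string that
# passes this check is necessarily a string of Unicode characters.
print(all_xml_chars("(1 + 2) * 3"))  # True
print(all_xml_chars("a\u0000b"))     # False: U+0000 does not match Char
print(all_xml_chars(")1 + 2("))      # True, yet not an XPath expression
```

The last call shows the one-way nature of the claim: passing the Char check
is necessary for being an XPath expression, not sufficient.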

It is quite true that if a reader assumes that any bolded string in the spec
marks a definition, and infers from the bolding of the word 'expression' that
this is intended as a definition, then the reader is apt to find the text more
than a bit confusing.  The problem in this case, however, is that the
assumption does not hold.

If I knew a good way to make readers stop assuming things that aren't true, I'd
be a happier man.

The specs might indeed be easier to read, in some ways, and for some people, if
they had numbered definitions, or an alphabetical list of definitions
sequestered in a terminology section.  But there is ample anecdotal evidence
that many people find that format off-putting and perhaps a bit confusing; the
preference of many W3C working groups for embedding definitions in the
exposition instead of sequestering them seems to suggest that some of those
people inhabit W3C WGs.  (And I would urge caution before replacing occurrences
of terms with their definitions -- that will only work when the definiens has
the same part of speech as the definiendum, and that is another kind of
formality that apparently strikes many people involved in spec development as
artificial and awkward, and not to be enforced.)

Received on Tuesday, 3 March 2015 19:28:59 UTC