Re: Syntactic variation in QNames from John Cowan on 2006-01-03 (public-xml-core-wg@w3.org from January 2006)

From: John Cowan <cowan@ccil.org>
Date: Tue, 3 Jan 2006 12:30:38 -0500
To: Norman Walsh <Norman.Walsh@Sun.COM>
Cc: public-xml-core-wg@w3.org
Message-ID: <20060103173034.GE26883@ccil.org>

Norman Walsh scripsit:

> Suppose that an author writes the tag name <Montréal> in his document,
> composing the accented "é" with the two code points, an unaccented "e"
> and a combining accent.
> 
> Now suppose that he writes "</Montréal>" using the pre-combined single
> code point "é".
> 
> Is that document well-formed?
> 
> Suppose that he writes the start and end tags using the two code
> points version, but his DTD uses the single code-point version. Is the
> document valid? (Assuming it would be valid except for the suggested
> possible difference.)

The answer is no to both questions, and for the same reason: the names
do not match, according to the definition of "match" in Section 1.2 of
the XML Recommendation:

	Two strings or names being compared MUST be identical. Characters
	with multiple possible representations in ISO/IEC 10646
	(e.g. characters with both precomposed and base+diacritic
	forms) match only if they have the same representation in both
	strings. No case folding is performed.

(To nail it down, the WFC "Element Type Match" in Section 3 requires
that names in start-tags and end-tags match, and the VC "Element Valid"
immediately following requires that the name in the declaration match
the element type.)
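
As a rough illustration of the well-formedness half, here is a minimal
Python sketch. It assumes the standard library's expat-based
xml.etree.ElementTree parser; the names PRECOMPOSED, DECOMPOSED,
names_match and is_well_formed are made up for this example and come
from no specification. It only shows that the two spellings are
different code point sequences and that a parser comparing names
code point for code point rejects the mixed document. It does not
exercise DTD validation.

```python
import unicodedata
import xml.etree.ElementTree as ET

# Hypothetical names for this illustration only.
PRECOMPOSED = "Montr\u00e9al"   # "é" as the single code point U+00E9
DECOMPOSED = "Montre\u0301al"   # "e" (U+0065) + combining acute (U+0301)


def names_match(a: str, b: str) -> bool:
    """'Match' in the Section 1.2 sense: identical code point sequences,
    with no normalization and no case folding."""
    return a == b


def is_well_formed(doc: str) -> bool:
    """Return True if the stdlib (expat-based) parser accepts the document."""
    try:
        ET.fromstring(doc)
        return True
    except ET.ParseError:
        return False


# Start-tag uses base + combining accent, end-tag uses the precomposed form.
mixed = f"<{DECOMPOSED}></{PRECOMPOSED}>"
# Both tags use the same (decomposed) representation.
same = f"<{DECOMPOSED}></{DECOMPOSED}>"

if __name__ == "__main__":
    print(names_match(PRECOMPOSED, DECOMPOSED))
    print(unicodedata.normalize("NFC", DECOMPOSED) == PRECOMPOSED)
    print(is_well_formed(mixed))
    print(is_well_formed(same))
```

The two strings become equal only after normalization (NFC here), which
the definition of "match" does not call for; an author who wants the
tags to agree has to use the same representation in both places.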

-- 
Time alone is real                      John Cowan <cowan@ccil.org>
  the rest imaginary                    http://www.reutershealth.com
like a quaternion       --phma          http://www.ccil.org/~cowan

Received on Tuesday, 3 January 2006 17:30:50 UTC