- From: John Cowan <cowan@ccil.org>
- Date: Tue, 3 Jan 2006 12:30:38 -0500
- To: Norman Walsh <Norman.Walsh@Sun.COM>
- Cc: public-xml-core-wg@w3.org
Norman Walsh scripsit:

> Suppose that an author writes the tag name <Montréal> in his document,
> composing the accented "é" with the two code points, an unaccented "e"
> and a combining accent.
>
> Now suppose that he writes "</Montréal>" using the pre-combined single
> code point "é".
>
> Is that document well-formed?
>
> Suppose that he writes the start and end tags using the two code
> points version, but his DTD uses the single code-point version. Is the
> document valid? (Assuming it would be valid except for the suggested
> possible difference.)

The answer is no to both questions, and for the same reason: the names
do not match, according to the definition of "match" from Section 1.2:

    Two strings or names being compared MUST be identical. Characters
    with multiple possible representations in ISO/IEC 10646 (e.g.
    characters with both precomposed and base+diacritic forms) match
    only if they have the same representation in both strings. No
    case folding is performed.

(To nail it down, the WFC "Element Type Match" in Section 3 requires that
names in start-tags and end-tags match, and the VC "Element Valid"
immediately following requires that the name in the declaration match
the element type.)

-- 
Time alone is real              John Cowan <cowan@ccil.org>
the rest imaginary              http://www.reutershealth.com
like a quaternion  --phma       http://www.ccil.org/~cowan
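
For illustration only, here is a minimal Python sketch of the comparison the definition of "match" describes. The helper names (names_match, is_well_formed) are hypothetical, and the well-formedness check relies on the behaviour of Python's bundled parser rather than on anything stated in the spec text quoted above.

```python
import unicodedata
import xml.etree.ElementTree as ET

# The same visible name, spelled two ways.
DECOMPOSED = "Montre\u0301al"   # "e" followed by U+0301 COMBINING ACUTE ACCENT
PRECOMPOSED = "Montr\u00e9al"   # single code point U+00E9


def names_match(a: str, b: str) -> bool:
    """Hypothetical helper: "match" in the Section 1.2 sense means
    code point for code point identity, with no normalization and
    no case folding."""
    return a == b


def is_well_formed(document: str) -> bool:
    """Hypothetical helper: report whether the parser shipped with
    Python accepts the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return False
    return True


# The two spellings differ in length and are not identical strings.
print(len(DECOMPOSED), len(PRECOMPOSED))
print(names_match(DECOMPOSED, PRECOMPOSED))

# They become identical only if the author normalizes them first,
# which an XML processor does not do when matching names.
print(names_match(unicodedata.normalize("NFC", DECOMPOSED), PRECOMPOSED))

# Start-tag in one spelling, end-tag in the other.
print(is_well_formed(f"<{DECOMPOSED}></{PRECOMPOSED}>"))
```

Normalizing both the document and the DTD to one form (NFC, say) before they reach the parser is the author's job; the processor compares names exactly as written.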
Received on Tuesday, 3 January 2006 17:30:50 UTC