[Bug 2474] [SER] Can fully-normalized be implemented?

http://www.w3.org/Bugs/Public/show_bug.cgi?id=2474

------- Additional Comments From mike@saxonica.com  2005-12-13 13:14 -------
I think this comes down to the definition (or our interpretation of the
definition) of "fully-normalized".

We refer to CharMod, which says this:

Text is fully-normalized if... the text is in a Unicode encoding form, is
include-normalized and none of the constructs comprising the text begin with a
composing character or a character escape representing a composing character;

Text is include-normalized if... the text is Unicode-normalized and does not
contain any character escapes or includes whose expansion would cause the text
to become no longer Unicode-normalized; 

The definition of "includes" is: An include is an instance of a syntactic device
specified in a language to include text at the position of the include,
replacing the include itself. Examples of includes are entity references in XML,
@import rules in CSS and the #include preprocessor statement in C/C++.
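
To illustrate why include-normalized is a stronger condition than
Unicode-normalized, here is a minimal Python sketch. It models an include as
plain string substitution, which is a simplification, and the function names
are made up for illustration; they do not come from CharMod or from the
serialization spec. It assumes NFC as the normalization form.

```python
import unicodedata


def is_nfc(text: str) -> bool:
    """True if the text is already in Unicode Normalization Form C."""
    return unicodedata.normalize("NFC", text) == text


def expand_include(before: str, included: str, after: str) -> str:
    """Hypothetical model of an include: the included text replaces the include."""
    return before + included + after


host_prefix = "cafe"      # text preceding the include; in NFC on its own
included = "\u0301"       # COMBINING ACUTE ACCENT alone; also in NFC on its own
expanded = expand_include(host_prefix, included, "")

# Each piece passes the NFC check, but the expanded text does not.
print(is_nfc(host_prefix), is_nfc(included), is_nfc(expanded))
```

Both pieces are in NFC when checked separately, but after expansion the "e"
and U+0301 would compose to U+00E9, so the expanded text is no longer
Unicode-normalized. That is the case the include-normalized rule excludes.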

Colin seems to be assuming that an XInclude element is an "include" in this
sense. We decided that it was not. XInclude operates at a higher level of the
stack than we do: from our perspective it is an application-level construct, not
a "syntactic device". It's no different from <xsl:include> or <xsd:include>: we
can't be expected to understand the semantics of every element in every XML
vocabulary.

Entity references are "includes" in this sense, but the serializer never
generates them, so they don't affect the outcome.

I think the only difference between "fully-normalized" and NFC, as far as our
serialization spec is concerned, is that with "fully-normalized" the output
cannot start with a composing character or a character escape representing a
composing character.
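
As a rough sketch of that reading (not text from the spec), a check on
serializer output might look like the following Python. Two assumptions are
worth flagging: "composing character" is approximated here as any character
with a non-zero canonical combining class, which is narrower than the CharMod
definition, and "character escape" is taken to mean an XML numeric character
reference. The function names are hypothetical.

```python
import re
import unicodedata

# Assumption: the only character escapes considered are XML numeric
# character references such as &#769; or &#x301;.
_CHAR_REF = re.compile(r"&#(?:x([0-9A-Fa-f]+)|([0-9]+));")


def first_character(text: str) -> str:
    """Return the first character, resolving a leading numeric character reference."""
    match = _CHAR_REF.match(text)
    if match:
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)
    return text[:1]


def looks_fully_normalized(text: str) -> bool:
    """NFC check plus the extra 'must not start with a composing character' rule."""
    if unicodedata.normalize("NFC", text) != text:
        return False
    first = first_character(text)
    # Approximation: treat non-zero combining class as "composing".
    return not (first and unicodedata.combining(first) != 0)
```

Under these assumptions, "\u0301abc" and "&#x301;abc" both pass a plain NFC
check but are rejected by looks_fully_normalized, which is the one case where
the two notions differ for our purposes.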

Michael Kay

Received on Tuesday, 13 December 2005 13:15:16 UTC