Re: Fwd: Question about normalization checking in XML 1.1 from Paul Grosso on 2014-12-29 (public-xml-core-wg@w3.org from December 2014)

From: Paul Grosso <paul@paulgrosso.name>
Date: Mon, 29 Dec 2014 10:42:19 -0600
To: public-xml-core-wg@w3.org
Message-ID: <54A1846B.8070706@paulgrosso.name>

In trying to research this, I added some references and comments below.

On 2014-12-23 19:41, John Cowan wrote:
> I received this email as an editor of XML 1.1.  However, I think the
> response should come from the WG.
>
> ----- Forwarded message from Alexey Neyman <stilor@att.net> -----
>
> Date: Fri, 19 Dec 2014 23:49:33 -0800
> From: Alexey Neyman <stilor@att.net>
> To: tbray@textuality.com, jeanpa@microsoft.com, cmsmcq@w3.org,
> 	elm@east.sun.com, cowan@ccil.org
> Subject: Question about normalization checking in XML 1.1
>
> Hi,
>
> I am trying to understand which portions of a document conforming to
> XML 1.1 are expected to be normalized. In the document entity, the

At http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-document
is the following production defining "document":

[1] document ::= ( prolog element Misc* ) - ( Char* RestrictedChar Char* )


> specification [1] prescribes that the text matching the following
> productions should be normalized: CData, CharData, content, Name,
> Nmtoken.

Normalization checking--including the above statement--is discussed at
http://www.w3.org/TR/2006/REC-xml11-20060816/#sec-normalization-checking

The commenter's mention of "parent elements" below is pointing
out that--while most elements' attribute specifications' AttValue
(production [10]) end up being parsed as a result of parsing
"content" (via content [43] -> element [39] -> STag [40] ->
Attribute [41] -> AttValue [10])--the document element's attribute
specifications cannot be reached via "content" or any of the other
productions mentioned in this Normalization checking statement.
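
A minimal sketch of that reachability argument, in Python. The
derivation paths are written out by hand from the productions in the
XML 1.1 Recommendation for one tiny example document, not produced by
a parser, and the names CHECKED, PATHS, covering_production and report
are made up for this illustration.

```python
# Productions named in the Normalization checking statement of XML 1.1.
CHECKED = {"CData", "CharData", "content", "Name", "Nmtoken"}

# Hand-written derivation paths (outermost production first) for the two
# attribute values in:  <root a="x"><child b="y"/></root>
# These are illustrative assumptions, not the output of a real parser.
PATHS = {
    "root/@a": ["document", "element", "STag", "Attribute", "AttValue"],
    "child/@b": ["document", "element", "content", "element",
                 "EmptyElemTag", "Attribute", "AttValue"],
}


def covering_production(path):
    """Return the outermost listed production on the path, or None."""
    for production in path:
        if production in CHECKED:
            return production
    return None


def report(paths):
    """Map each labelled attribute value to the production covering it."""
    return {label: covering_production(path) for label, path in paths.items()}


if __name__ == "__main__":
    for label, production in report(PATHS).items():
        print(label, "->", production or "not covered")
```

Under these assumptions the child's attribute value is covered by
"content", while the document element's attribute value is covered by
none of the listed productions.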


>
> This seems somewhat inconsistent: as far as I understand, it would
> mean that the attribute values for non-root elements should be
> normalized (because they match the 'content' production for their
> respective parent element), but the attributes for the root element
> may not be normalized (because the root element does not have a parent
> element).

At http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-content
is the following production for "content":

[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*


>
> Likewise, it seems to require that the whole content of the PIs inside
> the root element is to be normalized (both the target and the
> pseudo-attributes - because, again, the whole PI is a part of the
> 'content' production of the enclosing element) - but for PIs at the
> top-level (i.e. those that are part of the 'Misc' production, or those
> inside a document type declaration), only the PI target is expected to
> be normalized (since the target matches the 'Name' production and the
> rest of the PI content is expressed via the 'Char' production).

At http://www.w3.org/TR/2006/REC-xml11-20060816/#NT-Misc
is the following production for "Misc":

[27] Misc ::= Comment | PI | S


Again, while PIs within any element get reached as a result
of the "content" production, PIs within Misc are not reached
via any of the productions mentioned in the Normalization checking
statement.
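
The same point for PIs can be sketched in Python. This is only an
illustration under stated assumptions: NFC (via the standard
unicodedata module) stands in for the Recommendation's stricter notion
of "fully normalized", the derivation paths are written by hand, and
the names PI_DATA, is_checked and unflagged are hypothetical.

```python
import unicodedata

# Productions named in the Normalization checking statement of XML 1.1.
CHECKED = {"CData", "CharData", "content", "Name", "Nmtoken"}

# "e" followed by U+0301 COMBINING ACUTE ACCENT, which is not in NFC.
DECOMPOSED = "caf" + "e\u0301"

# (description, hand-written derivation path, text) for the data part of
# the two PIs in a document shaped like:
#   <?xml version="1.1"?><?top ...?><root><?inner ...?></root>
PI_DATA = [
    ("top-level PI data", ["document", "prolog", "Misc", "PI"], DECOMPOSED),
    ("nested PI data", ["document", "element", "content", "PI"], DECOMPOSED),
]


def is_checked(path):
    """True if any production on the path is one of the listed ones."""
    return any(production in CHECKED for production in path)


def unflagged(items):
    """Describe un-normalized texts that no listed production covers."""
    return [
        description
        for description, path, text in items
        # NFC is an approximation of "fully normalized" for this sketch.
        if unicodedata.normalize("NFC", text) != text and not is_checked(path)
    ]


if __name__ == "__main__":
    print(unflagged(PI_DATA))
```

With these inputs only the top-level PI data is reported, since the
nested PI is covered through "content".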


>
> Am I missing anything? If not, could you please explain the rationale
> for this apparent inconsistency?

Unless I am also missing something, it does look like an inconsistency
to me, and I suspect the inconsistency is accidental.

What do others think (1) is the case and (2) should be the case?

paul



>
> Thanks,
> Alexey.
>
> [1] http://www.w3.org/TR/2006/REC-xml11-20060816/
>
> ----- End forwarded message -----
>
Received on Monday, 29 December 2014 16:42:52 UTC