RE: Latest version of the Infoset from Martin Duerst on 2001-03-29 (w3c-ietf-xmldsig@w3.org from January to March 2001)

From: Martin Duerst <duerst@w3.org>
Date: Thu, 29 Mar 2001 09:29:30 +0900
To: "John Boyer" <JBoyer@PureEdge.com>, "Arnaud Le Hors" <lehors@us.ibm.com>
Cc: "Philippe Le Hegaret" <plh@w3.org>, <www-xml-infoset-comments@w3.org>, <w3c-ietf-xmldsig@w3.org>
Message-Id: <4.2.0.58.J.20010329091723.00c70450@sh.w3.mag.keio.ac.jp>

Hello John,

At 14:30 01/03/28 -0800, John Boyer wrote:

>Hi Arnaud,
>
>Thanks for the quick response.
>
>Is there an email I could read regarding the I18N feedback on 
>CDATA? It sounds pretty bizarre.  There's nothing in the XML 1.0 
>specification to suggest that CDATA sections aren't applicable to all 
>character encodings.  For example, if the document is UTF-8 encoded, and a 
>CDATA section appears, I would assume that its internal contents are 
>interpreted as UTF-8 with respect to reading them in.  This means that 
>anything in the UCS domain can appear in a CDATA section.  The inability 
>to make character references seems to have little to do with the argument 
>being made.

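The problem is not with UTF-8, but with restricted encodings such as
Latin-1. You cannot convert a UTF-8 document to Latin-1 and be guaranteed
to keep your CDATA sections: a character outside Latin-1 can then only be
written as a character reference, and character references are not
recognized inside a CDATA section.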
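A minimal Python sketch of the conversion problem, assuming a hypothetical helper serialize_for_latin1() that writes character data for a Latin-1 encoded document. It keeps a CDATA section only when every character is representable in Latin-1; otherwise it falls back to escaped text with numeric character references, which is exactly the point where the CDATA section is lost.

```python
def serialize_for_latin1(text: str, prefer_cdata: bool = True) -> tuple[bytes, bool]:
    """Hypothetical helper: serialize character data for a Latin-1 document.

    Returns (markup_bytes, kept_cdata).
    """
    if prefer_cdata and "]]>" not in text:
        try:
            # Inside CDATA every character must be encodable directly,
            # because character references are not recognized there.
            return b"<![CDATA[" + text.encode("latin-1") + b"]]>", True
        except UnicodeEncodeError:
            pass  # fall through: the CDATA section cannot be kept
    # Outside CDATA, escape markup characters and write anything
    # Latin-1 cannot represent as a numeric character reference.
    escaped = text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    return escaped.encode("latin-1", errors="xmlcharrefreplace"), False


print(serialize_for_latin1("a < b"))  # (b'<![CDATA[a < b]]>', True)
print(serialize_for_latin1("5 €"))    # (b'5 &#8364;', False)
```

The euro sign (U+20AC) is not in Latin-1, so the second call cannot keep its CDATA section. A tool could instead split the section around such characters, but that still changes the CDATA boundaries the author wrote.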

>It's well known that CDATA is syntax sugar in the sense that one can 
>always find another way to do it without CDATA, but if I were using an XML 
>authoring tool that is based on infoset, I would not want it to remove my 
>CDATA sections because it would be equivalent to removing CDATA sections 
>from XML 1.0 (with respect to my use of that tool).

Please note that you most probably wouldn't want your XML authoring tool
to mess with quite a few other things either, such as whether you use
single or double quotes for attributes.

The Infoset is not a list of everything an authoring tool should
preserve. A good authoring tool will clearly preserve more than the
Infoset. The Infoset is about the information that is really essential
in an XML document, and CDATA sections are not part of that.
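To illustrate that CDATA sections and attribute quote style carry no information beyond the characters themselves, here is a short sketch using Python's xml.etree.ElementTree. ElementTree is a parser's data model, not an Infoset implementation, so treat this only as an analogy: two spellings of the same element yield identical tag, attributes, and text.

```python
import xml.etree.ElementTree as ET


def parsed_info(markup: str) -> tuple[str, dict, str]:
    """Return the tag, attributes, and text content a parser reports."""
    elem = ET.fromstring(markup)
    return elem.tag, dict(elem.attrib), elem.text


# Single quotes plus a CDATA section ...
with_cdata = parsed_info("<note kind='code'><![CDATA[if (a < b && c) {}]]></note>")
# ... versus double quotes plus escaped character data.
escaped = parsed_info('<note kind="code">if (a &lt; b &amp;&amp; c) {}</note>')

print(with_cdata == escaped)  # True
```

Neither the CDATA boundaries nor the quote characters survive parsing, which is why a tool that needs them must keep more than the parsed data model.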

>As for entity reference markers, I was unaware that entities would not be 
>expanded between the start and end marker.

I'm confused. I thought the Infoset expanded them. If not,
having both a start and an end marker wouldn't make sense
at all.
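As a rough illustration of expansion (not of the Infoset's start and end markers, which ElementTree does not report at all), this sketch parses a document that declares an internal entity; the parser hands back the replacement text in place of the reference.

```python
import xml.etree.ElementTree as ET

# An internal general entity declared in the internal DTD subset.
doc = """<!DOCTYPE greeting [
  <!ENTITY who "world">
]>
<greeting>hello, &who;!</greeting>"""

root = ET.fromstring(doc)
print(root.text)  # hello, world!
```

Markers placed around that expanded text would only be useful if the text between them were actually the expansion, which is the point being made above.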

Regards,   Martin.

Received on Wednesday, 28 March 2001 19:31:00 UTC