RE: Latest version of the Infoset

Hi Arnaud,

Thanks for the quick response.

Is there an email thread I could read regarding the I18N feedback on CDATA? The argument sounds pretty bizarre. There's nothing in the XML 1.0 specification to suggest that CDATA sections aren't applicable to all character encodings. For example, if the document is UTF-8 encoded and a CDATA section appears, I would assume that its contents are read in as UTF-8. This means that any UCS character that XML permits can appear in a CDATA section. The inability to use character references therefore seems to have little to do with the argument being made.
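
As a rough illustration of the encoding point, the following Python sketch parses a UTF-8 document whose CDATA section holds non-ASCII characters directly, and compares the result with the same content written using character references. The element name "doc" and the sample characters are made up for the example, and Python's minidom stands in for any XML 1.0 parser.

```python
import xml.dom.minidom

def text_of(element):
    # Concatenate the character data of all text and CDATA children.
    return "".join(child.data for child in element.childNodes)

# The \u escapes are Python string escapes; the encoded bytes contain the
# literal characters U+00E9, U+65E5 and U+1F600 inside the CDATA section.
with_cdata = (
    '<?xml version="1.0" encoding="UTF-8"?>'
    '<doc><![CDATA[\u00e9 \u65e5 \U0001F600 <&>]]></doc>'
).encode("utf-8")

# The same content without CDATA, using character references and escapes.
with_refs = '<doc>&#xE9; &#x65E5; &#x1F600; &lt;&amp;&gt;</doc>'

cdata_text = text_of(xml.dom.minidom.parseString(with_cdata).documentElement)
plain_text = text_of(xml.dom.minidom.parseString(with_refs).documentElement)
```

Both forms yield the same character data, so the CDATA section is not restricted by the lack of character references as long as the document encoding can represent the characters.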

It's well known that CDATA is syntactic sugar in the sense that one can always find another way to write the same content without it. But if I were using an XML authoring tool based on the infoset, I would not want it to remove my CDATA sections, because that would be equivalent to removing CDATA sections from XML 1.0 (with respect to my use of that tool).
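
To make the authoring-tool concern concrete, here is a small Python sketch that round-trips one document through two data models from the standard library: minidom, which records CDATA sections as distinct nodes, and ElementTree, which keeps only the character data. These two libraries are only stand-ins for "a model with CDATA markers" and "a model without them"; the sample document is invented.

```python
import xml.dom.minidom
import xml.etree.ElementTree as ET

source = "<doc><![CDATA[if (a < b && c > d) run();]]></doc>"

# minidom keeps a CDATASection node, so serializing reproduces the
# author's original markup.
dom = xml.dom.minidom.parseString(source)
kept = dom.documentElement.toxml()

# ElementTree stores only the text, so on output the section is gone
# and the special characters come back escaped.
dropped = ET.tostring(ET.fromstring(source), encoding="unicode")
```

The content is equivalent in both cases, but only the first model lets a tool hand the author back the CDATA section that was typed.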

As for entity reference markers, I was unaware that entities would not be expanded between the start and end marker. As such, the objection raised (that the trend in W3C is to deal with data models where entities are expanded) doesn't appear to be applicable.

Finally, could you please provide an email reference to some or all of the XInclude problems that vanish when the entity markers are removed? For most applications, it doesn't seem reasonable that adding information to the data model that is explicitly present in the source document could raise a technical problem since the additional data can simply be ignored when it is not needed.

The fact that entity reference markers cause problems for XInclude would appear to be an indication that XInclude is not an application of XML. Before today, I hadn't heard of XInclude, and I've looked at it for all of two minutes, so I apologize in advance if I have misread something, but it appears that XInclude performs after the parse that which should be done during the parse. Actually, as someone who has written a number of parsers as well as an LR-class parser generator, I'd clarify my statement as follows: inclusion typically leads to all sorts of problems if it is not done by the lexical analyzer that backs the parser because included material has to appear in the parser's token stream to avoid language feature conflicts.
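
For comparison, XML 1.0's own inclusion mechanism, the entity reference, is resolved during the parse. The Python sketch below (the entity name "frag" and the element names are hypothetical) shows that markup pulled in through an internal entity reaches the application as ordinary elements, indistinguishable from markup written inline. It illustrates parse-time inclusion in general and says nothing about how XInclude itself is processed.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: the replacement text of "frag" contains markup.
source = """<!DOCTYPE doc [
  <!ENTITY frag "<item>included</item>">
]>
<doc>&frag;<item>inline</item></doc>"""

# The parser expands &frag; while reading, so the included element is
# part of the same event stream as the element written inline.
root = ET.fromstring(source)
tags = [child.tag for child in root]
texts = [child.text for child in root]
```

After parsing, the element that came from the entity and the element written inline are both plain "item" children of "doc".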

So the question is, should XML InfoSet be prevented from providing information for which we have demonstrated a use when the sole purpose of such prevention is to support some other feature that perhaps should be designed differently?

To me, it seems the answer should be no. The rationale for XInclude (appearing in Section 1 of XInclude) does not substantiate removing the process of inclusion from parse time. All of the complaints seem to be about limitations in XML 1.0. The solution is to issue XML 1.1, not to create this patch.

Thanks,
John Boyer
Senior Product Architect, Software Development
Internet Commerce System (ICS) Team
PureEdge Solutions Inc.
Trusted Digital Relationships
v: 250-708-8047 f: 250-708-8010
1-888-517-2675 http://www.PureEdge.com

-----Original Message-----
From: Arnaud Le Hors [mailto:lehors@us.ibm.com]
Sent: Wednesday, March 28, 2001 1:48 PM
To: John Boyer
Cc: Philippe Le Hegaret; www-xml-infoset-comments@w3.org;
w3c-ietf-xmldsig@w3.org
Subject: Re: Latest version of the Infoset

John Boyer wrote:
>
> Hi Philippe,
>
> Actually, the status of the document says it is the 'about to become CR'
> working draft that has been created to address the concerns raised in
> last call. There is no indication that another last call will occur
> before infoset becomes a CR.

That's indeed the current plan.

> Thus, it would be helpful to know what aspect of last call feedback
> caused the editors to remove entity ref and cdata markers.

CDATA section markers were removed based primarily on feedback from the
I18N WG, which stated that, because CDATA sections are limited to
certain character sets (because one can't use character references in
them), they cannot be considered as anything else than syntactic sugar.
Therefore they have no place in the infoset. This same argument has been
made by several reviewers and after due consideration the XML Core WG
decided to go along with this.
The removal of entity ref markers also matches the current trend at W3C.
Most WGs work with a data model where entity references are expanded.
Because the XML Infoset is meant to serve as many other WGs as possible,
and because it can be extended by anyone to carry extra information such
as character refs, it seemed only normal to go along with the requests
we received.
In addition, the XML Core WG was facing serious issues in XInclude with
regard to markers. Their removal made all these issues go away.
--
Arnaud Le Hors - Co-chair of XML Core WG