
SP,% before encoding declaration (was: RE: I18N issues with the XML Specification)

From: Martin J. Duerst <duerst@w3.org>
Date: Wed, 12 Apr 2000 17:14:46 +0900
Message-Id: <4.2.0.58.J.20000412164407.03376f00@sh.w3.mag.keio.ac.jp>
To: "François Yergeau" <yergeau@alis.com>, "'Misha Wolf'" <misha.wolf@reuters.com>, <w3c-i18n-ig@w3.org>
Cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org
I'm not sure I agree. I have read Makoto's mail, and his
analysis is very thorough, and I'm not questioning it here.

However, for UTF-16 and anything similar to it, and for
any kind of entity, one of the following is true:

- It has some external encoding info. There is no need
   for heuristics.
- It is UTF-16. In this case, it has a BOM.
- It has an encoding declaration.

Makoto clearly shows that it's possible to have white space
and some other stuff at the start of external subsets,...,
BUT that is only the case if there is no TextDecl or XMLDecl.
So whatever has an encoding declaration has it first, without
any kind of other stuff before it (except a BOM).
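To make the three cases concrete, here is a minimal Python sketch of a byte-level sniffer. It is loosely modelled on the non-normative autodetection table in Appendix F of XML 1.0, but simplified (UCS-4 and other rarer cases are left out). The function name `sniff` and its `external` parameter are hypothetical. What it illustrates is that nothing ever has to skip leading white space or a `%`: either external info is given, or a BOM or the start of a declaration sits at byte 0, or the entity is UTF-8.

```python
from typing import Optional, Tuple

# (byte prefix at offset 0, guessed encoding family, BOM length to skip).
# Simplified from the XML 1.0 Appendix F table; UCS-4 cases are omitted.
_SIGNATURES = [
    (b"\xef\xbb\xbf", "UTF-8", 3),                 # UTF-8 BOM
    (b"\xfe\xff", "UTF-16BE", 2),                  # UTF-16 BOM, big-endian
    (b"\xff\xfe", "UTF-16LE", 2),                  # UTF-16 BOM, little-endian
    (b"\x00\x3c\x00\x3f", "UTF-16BE", 0),          # '<?' as 16-bit BE, no BOM
    (b"\x3c\x00\x3f\x00", "UTF-16LE", 0),          # '<?' as 16-bit LE, no BOM
    (b"\x3c\x3f\x78\x6d", "ASCII-compatible", 0),  # '<?xm' in ASCII-like encodings
    (b"\x4c\x6f\xa7\x94", "EBCDIC", 0),            # '<?xm' in EBCDIC
]


def sniff(data: bytes, external: Optional[str] = None) -> Tuple[str, int]:
    """Hypothetical helper: return (encoding family, number of BOM bytes to skip)."""
    if external is not None:
        # Case 1: external encoding info (e.g. a MIME charset); no heuristics.
        return external, 0
    # Cases 2 and 3: a BOM, or the start of an XMLDecl/TextDecl, at byte 0.
    for prefix, family, bom_len in _SIGNATURES:
        if data.startswith(prefix):
            return family, bom_len
    # No BOM and no declaration at offset 0, so the entity must be UTF-8.
    # Leading white space or a '%' is never skipped in search of a declaration.
    return "UTF-8", 0
```

For example, `sniff(b"  %pe;")` falls through to UTF-8 without looking past the leading spaces, while `sniff(b"\xff\xfe<\x00?\x00")` reports UTF-16LE with a 2-byte BOM.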

This is easy to see from the following rules:

[22] prolog       ::= XMLDecl? Misc* (doctypedecl Misc*)?
[30] extSubset    ::= TextDecl? extSubsetDecl
[79] extPE        ::= TextDecl? extSubsetDecl
[78] extParsedEnt ::= TextDecl? content
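Read as a parsing rule, these productions say that once the text is decoded, a declaration only counts if it matches at offset 0. A minimal Python sketch, assuming the BOM has already been stripped; `declared_encoding` is a hypothetical helper, and the regular expression is a simplification that does not distinguish XMLDecl from TextDecl (it neither requires a version nor checks SDDecl).

```python
import re
from typing import Optional

# Simplified start of an XMLDecl/TextDecl carrying an EncodingDecl.
# Group 1: quote around the optional version; group 2: quote around the
# encoding; group 3: EncName ::= [A-Za-z] ([A-Za-z0-9._] | '-')*
_DECL = re.compile(
    r"<\?xml\s+(?:version\s*=\s*(['\"])[^'\"]*\1\s+)?"
    r"encoding\s*=\s*(['\"])([A-Za-z][A-Za-z0-9._-]*)\2"
)


def declared_encoding(text: str) -> Optional[str]:
    """Return the EncName of a leading XMLDecl/TextDecl, else None."""
    # re.match only tries offset 0. Anything before '<?xml' (white space,
    # a '%' parameter-entity reference, ...) means the entity has no
    # TextDecl or XMLDecl, so there is nothing to find.
    m = _DECL.match(text)
    return m.group(3) if m else None
```

So `declared_encoding('<?xml encoding="EUC-JP"?>')` yields `"EUC-JP"`, while the same declaration preceded by spaces yields `None`.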


I therefore propose that the various white-space and %
cases, as well as the first sentence of the last paragraph in
E44, be removed. I have reflected that at
http://www.w3.org/International/Group/issues/xml/Overview.html#charset.autodetection


Any comments?

Regards,    Martin.

At 00/04/03 20:03 -0400, François Yergeau wrote:
>Misha wrote:
> > The result of our discussions is recorded in:
> >
> >    I18N issues with the XML Specification
> >    http://www.w3.org/International/Group/issues/xml
>
>I have reviewed E44 [1], which is mentioned as the first issue in the "Deal
>with later" section of our issues list.
>
>I traced back the original mail from Murata Makoto [2] from which this
>erratum was written up.  I reviewed this mail again and it seems fine to me.
>The fact that we did not understand the erratum in Amsterdam was probably
>due to our rather hasty process, faced as we were with too much to do in too
>little time.
>
>I propose that we drop this erratum from our issues list.
>
>[1] http://www.w3.org/XML/xml-19980210-errata#E44
>[2] http://lists.w3.org/Archives/Member/w3c-xml-syntax-wg/1999Feb/0124.html
>
>--
>François

Received on Wednesday, 12 April 2000 04:12:46 GMT