- From: Martin J. Duerst <duerst@w3.org>
- Date: Wed, 12 Apr 2000 17:14:46 +0900
- To: "François Yergeau" <yergeau@alis.com>, "'Misha Wolf'" <misha.wolf@reuters.com>, <w3c-i18n-ig@w3.org>
- Cc: xml-editor@w3.org, w3c-xml-core-wg@w3.org
I'm not sure I agree. I have read Makoto's mail; his analysis is very
thorough, and I'm not questioning it here. However, for UTF-16 and
anything similar to it, and for any kind of entity, one of the
following is true:

- It has some external encoding info. There is no need for heuristics.
- It is UTF-16. In this case, it has a BOM.
- It has an encoding declaration.

Makoto clearly shows that it's possible to have white space and some
other stuff at the start of external subsets,..., BUT that is only
the case if there is no TextDecl or XMLDecl. So whatever has an
encoding declaration has it first, with nothing else before it
(except a BOM). This is easy to see from the following rules:

[22] prolog ::= XMLDecl? Misc* (doctypedecl Misc*)?
[30] extSubset ::= TextDecl? extSubsetDecl
[79] extPE ::= TextDecl? extSubsetDecl
[78] extParsedEnt ::= TextDecl? content

I therefore propose that the various white-space and % cases, as well
as the first sentence of the last paragraph in E44, be removed.
I have reflected that at
http://www.w3.org/International/Group/issues/xml/Overview.html#charset.autodetection

Any comments?

Regards, Martin.

At 00/04/03 20:03 -0400, François Yergeau wrote:
>Misha wrote:
> > The result of our discussions is recorded in:
> >
> > I18N issues with the XML Specification
> > http://www.w3.org/International/Group/issues/xml
>
>I have reviewed E44 [1], which is mentioned as the first issue in the "Deal
>with later" section of our issues list.
>
>I traced back the original mail from Murata Makoto [2] from which this
>erratum was written up. I reviewed this mail again and it seems fine to me.
>The fact that we did not understand the erratum in Amsterdam was probably
>due to our rather hasty process, faced as we were with too much to do in too
>little time.
>
>I propose that we drop this erratum from our issues list.
>
>[1] http://www.w3.org/XML/xml-19980210-errata#E44
>[2] http://lists.w3.org/Archives/Member/w3c-xml-syntax-wg/1999Feb/0124.html
>
>--
>François
Received on Wednesday, 12 April 2000 04:12:46 UTC
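
As a rough illustration of the three cases listed in the message above, here is a minimal Python sketch of the decision order they imply: external encoding info first, then a UTF-16 BOM, then an encoding declaration that sits at the very start of the entity. The function name `detect_encoding`, its `external` parameter and the returned reason strings are hypothetical, chosen for this example only. This is not the E44 text or the autodetection appendix of the XML specification, and it only handles declarations in ASCII-compatible encodings. The UTF-8 fallback reflects the XML 1.0 rule for entities that have neither a BOM nor an encoding declaration.

```python
import codecs
import re

# An XMLDecl or TextDecl carrying an encoding pseudo-attribute.
# Anchored at the first byte: per rules [22], [30], [78] and [79],
# nothing except a BOM may come before it.
# Assumption: the declaration is written in an ASCII-compatible encoding.
_DECL = re.compile(
    rb"""^<\?xml[ \t\r\n][^>]*?encoding[ \t\r\n]*=[ \t\r\n]*["']([A-Za-z][A-Za-z0-9._-]*)["']"""
)


def detect_encoding(data, external=None):
    """Return (encoding, reason) for the first bytes of an XML entity.

    Hypothetical helper: 'external' stands for any out-of-band
    encoding info, such as a charset parameter from the transport.
    """
    # Case 1: external encoding info, so no heuristics are needed.
    if external:
        return external, "external"
    # Case 2: UTF-16 announces itself with a BOM.
    # (UTF-32 and other wide encodings are out of scope for this sketch.)
    if data.startswith((codecs.BOM_UTF16_BE, codecs.BOM_UTF16_LE)):
        return "utf-16", "bom"
    # A UTF-8 BOM is the only thing allowed before the declaration.
    if data.startswith(codecs.BOM_UTF8):
        data = data[len(codecs.BOM_UTF8):]
    # Case 3: an encoding declaration, which must come first.
    match = _DECL.match(data)
    if match:
        return match.group(1).decode("ascii"), "declaration"
    # No external info, no BOM, no declaration: fall back to UTF-8.
    return "utf-8", "default"
```

Because the pattern is anchored at the first byte, leading white space or a `%` before the declaration simply fails to match, so those inputs take the UTF-8 fallback and need no special-case branches in this sketch.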