- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Wed, 28 May 1997 09:54:03 -0400
- To: murata@apsdc.ksp.fujixerox.co.jp
- CC: w3c-sgml-wg@w3.org
>Comments on XML Part 1 from Japanese experts ... >2. Hankaku katakana characters > >(1) Proposal > >Hankaku katakana characters should be used through character >references only. ... >Internet experts in Japan commonly have the belief that >hankaku katakana characters shall not be used. The only >encoding method for transmitting Japanese e-mail and news >articles, namely ISO_2022_JP, does not allow hankaku katakana >characters. Thus, XML documents containing hankaku katakana >characters will lead to loss of information or data >corruption, when they are transmitted via e-mail or news. I do not mind losing hankaku katakana characters, but am slightly worried about MSDOS and MS-WIndows based programs that might generate XML. For example, if MS-Word had the ability to create XML, then we might run into some problems. I know that hankaku are becoming less frequently used, so I am not sure that forbidding them makes sense... over time they will not be used anyway. >3. JIS X 0212 characters in EUC_JP > >(1) Proposal > >Direct use of JIS X 0212 characters in EUC-JP should be >prohibited. In other words, the control function SS3 of >EUC_JP should be disallowed in XML. For this, like the above, I can understand the rationale, but I do not think that the XML language specification should mention anything about encodings, beyond the fact that any encoding whose character repertoire is a subset of that in the SGML declaration can be used. >(3) Background > >JIS X 0212 is rarely used in Japan. First, JIS X 0212 >characters cannot be represented by Shift_JIS, and are thus >not supported by most of the personal computers. Second, >JIS X 0212 characters cannot be represented by ISO_2022_JP, >and thus cannot be transmitted via e-mail or news. Third, a >control function SS3 is required to capture JIS X 0212 >characters in EUC_JP, but many tools do not support this >control function. The fact that most tools do not support JIS X 212 is probably sufficient to make it seldom used. I do not think we need explicit language for this. >3. Shift_JIS > >(1) Proposal > >As the definition of Shift_JIS, Appendix 1 of JIS X 0208-1997 >should be referenced. Again, I do not think the XML specification should even attempt to enumerate the possible encodings (with the possible exception of the Unicode ecodings). >4. Ideographic space character > >(1) Proposal > >The ideographic space character should not be considered as >a white space character. > >(2) Proposed revision > >Revise [1] as below: > [1] S::= (#X0020 | #X0009 | #X000d | #X000a)+ > >Remove the below line from the SGML declaration. > > ITAB SEPCHAR 12288 -- ideographic space -- > >(3) Background > >In Japan, this character is recognized as a fixed-width >"Kanji" character. There is no consensus that the ideographic >space character is a delimiter. Rather, most people assume >that it is not. I disagree. I have seen *many* Japanese people become *very* confused when they include (accidently or not) an ideographics space character into their DTD or whatever, and have a parser fail. Also, I do not think that most people recognise it as a fixed-wdith kanji character at all. A long time ago now, I spent a considerable amount of time talking to SGML *users* about whether they would like hankaku katakana, and ideographic spaces "normalised", and how they would use them. The results of that survey were inconclusive at best... some people thought normalisation was a great idea, and other felt that they really didn't want it. >5. The private use areas > >(1) Proposal > >The private use areas of Unicode should not be used in XML. The private use area, and surrogates are critical for certain applications. >(3) Background > >XML is primarily intended for open interchange over the >Internet. If the private use areas of Unicode are >used in XML, we will have incompatibility problems. This is based upon the denial of the need. The need is there, but the infrastructure is not.
Received on Wednesday, 28 May 1997 09:54:53 UTC