- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Sat, 12 Apr 1997 18:49:48 +1000
- To: Murata Makoto <murata@apsdc.ksp.fujixerox.co.jp>
- CC: w3c-sgml-wg@w3.org
Murata Makoto wrote:
> Rick Jelliffe writes:
> >ISO 10646 BMP is enough for XML characters.
>
> I disagree. XML should allow every character in ISO 10646. I believe
> that this is the intention of the current draft. (See the syntax
> of character references [58] and [59].)

I think I should have said "Unicode 2.0 characters are enough for XML 1.0 characters; other characters should be handled as glyph references at the present time." Note that the glyph reference could include a collation key (e.g. a character with a similar meaning, or the pronunciation of the character), which would promote the glyph reference to being more like a character.

In the interests of interoperability, I think ERB needs to standardise on a character width for XML 1.0; at the *current state of the art*, this should be 16 bits. Java, for example, only uses 16-bit characters currently, and some implementations of C's wchar_t store it as 16 bits. Maybe software people should say whether in fact this is a valid reason.

An alternative is that the Encoding PI also includes a pseudo-attribute to describe the character width required for the document. (The presence of the "ISO 8879:1986 (ENR)" in the SGML declaration for XML already implies that 8-bit-character systems will not cope with all XML documents.)

I guess another alternative (Yuck) is that a receiving 16-bit system could substitute the ISO 10646 unknown-character character if it receives a character beyond the 16-bit range (2^16 or above). This is what would happen if we don't specify a target character width, or an appropriate Encoding PI pseudo-attribute. We don't want to shut the door on the thief (unknown encodings), only to let him in through the window (unknown maximum character widths).

You mentioned that JIS may be proposing these extra characters to ISO 10646 in three years or so, which could translate to four years or so after discussion and voting. That seems a good timeframe for XML 2.0, but I think XML 1.0 should limit itself to providing well for present needs, using present art.

What is the timeframe for the JIS extensions to be added to shift-JIS or EUC and be supported by OS vendors and applications, do you know? If the JIS extensions will definitely be in use in 1997 or 1998, then certainly it is a matter for XML 1.0. And I guess this comes down to "What has Microsoft said?"

-Rick Jelliffe
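
To make the substitution alternative discussed above concrete, here is a minimal sketch in Python. It assumes that the "unknown-character character" is U+FFFD REPLACEMENT CHARACTER and that a "16-bit system" is one that can only hold code points up to U+FFFF. The function names are hypothetical and are not taken from any XML draft or library.

```python
# Assumption: U+FFFD REPLACEMENT CHARACTER is the "unknown-character character".
REPLACEMENT = "\uFFFD"

# Largest code point that fits in a single 16-bit unit.
MAX_16BIT = 0xFFFF


def needs_wide_chars(text: str) -> bool:
    """Return True if any character lies beyond the 16-bit range."""
    return any(ord(ch) > MAX_16BIT for ch in text)


def to_16bit(text: str) -> str:
    """Replace every character beyond the 16-bit range with U+FFFD.

    Hypothetical model of a receiving 16-bit system: the substitution is
    lossy, since the original character cannot be recovered afterwards.
    """
    return "".join(ch if ord(ch) <= MAX_16BIT else REPLACEMENT for ch in text)


if __name__ == "__main__":
    sample = "a\U00020000b"          # U+20000 lies outside the 16-bit range
    print(needs_wide_chars(sample))  # True
    print(to_16bit(sample) == "a\uFFFDb")  # True
```

A check like needs_wide_chars is roughly what a declared character width (for example in an Encoding PI pseudo-attribute) would let a receiver skip: the document would state its requirement up front instead of the receiver discovering it, and silently losing data, character by character.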
Received on Saturday, 12 April 1997 04:44:50 UTC