Re: New work item for XML group ? (Re: Comments on 31 March spec)

Murata Makoto wrote:
> Rick Jelliffe writes:
> >ISO 10646 BMP is enough for XML characters.
> I disagree.  XML should allow every character in ISO 10646.  I believe
> that this is the intention of the current draft.   (See the syntax
> of character references [58] and [59].)

I think I should have said "Unicode 2.0 characters are enough for XML
characters; other characters should be handled as glyph references
at the present time." 

Note that the glyph reference could include a collation key (e.g. a 
character with a similar meaning, or the pronunciation of the character)
which would promote the glyph reference to being more like a character.

In the interests of interoperability, I think the ERB needs to 
standardise on a character width for XML 1.0; at the current state of
the art, this should be 16 bits.  Java, for example, only uses 16-bit
characters currently, and implementations of C's wchar_t store it as
Maybe software people should say whether in fact this is a valid reason.

An alternative is that the Encoding PI also include a pseudo-attribute
describing the character width required for the document. (The presence
of "ISO 8879:1986 (ENR)" in the SGML declaration for XML already
implies that 8-bit-character systems will not cope with all XML
documents.)
I guess another alternative (Yuck) is that a receiving 16-bit system
substitutes the ISO 10646 unknown-character character if it receives
a character outside the 16-bit range (code point 2^16 or above).  This
is what would happen if we don't specify a target character width or
an appropriate Encoding PI.
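
For the software people, the substitution described above might look
something like the following. This is only a sketch: I am assuming that
the "unknown-character character" means U+FFFD, and the function name
is made up for illustration.

```python
# Assumption: the "unknown-character character" is U+FFFD.
UNKNOWN_CHARACTER = "\uFFFD"

def squeeze_to_16_bit(text: str) -> str:
    """Replace every character whose code point is 2^16 or above with
    the unknown-character character, as a receiving system limited to
    16-bit characters would have to do.
    """
    return "".join(
        ch if ord(ch) < 0x10000 else UNKNOWN_CHARACTER
        for ch in text
    )
```

Note that the substitution is lossy: two different characters above
2^16 both collapse to the same placeholder, so the receiver cannot
recover what was sent.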

We don't want to shut the door on the thief (unknown encodings), only 
to let him in through the window (unknown maximum character widths). 

You mentioned that JIS may be proposing these extra characters to
ISO 10646 in three years or so; which could translate to four years
or so after discussion and voting.  That seems a good timeframe for
XML 2.0, but I think XML 1.0 should limit itself to providing well for
present needs, using present art.  

What is the timeframe for the JIS extensions to be added to Shift-JIS
or EUC and be supported by OS vendors and applications, do you know?  
If the JIS extensions will definitely be in use in 1997 or 1998, 
then certainly it is a matter for XML 1.0.  And I guess this comes down 
to "What has Microsoft said?"

-Rick Jelliffe
