
Re: New work item for XML group ? (Re: Comments on 31 March spec)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Sat, 12 Apr 1997 18:49:48 +1000
Message-ID: <334F4CAB.969@allette.com.au>
To: Murata Makoto <murata@apsdc.ksp.fujixerox.co.jp>
CC: w3c-sgml-wg@w3.org
Murata Makoto wrote:
> Rick Jelliffe writes:
> >ISO 10646 BMP is enough for XML characters.
> 
> I disagree.  XML should allow every character in ISO 10646.  I believe
> that this is the intention of the current draft.  (See the syntax
> of character references [58] and [59].)

I think I should have said "Unicode 2.0 characters are enough for XML 1.0
characters; other characters should be handled as glyph references
at the present time."

Note that the glyph reference could include a collation key (e.g. a 
character with a similar meaning, or the pronunciation of the character)
which would promote the glyph reference to being more like a character.

In the interests of interoperability, I think ERB needs to
standardise on a character width for XML 1.0; at the <b>current state of
the art</b>, this should be 16 bits.  Java, for example, currently uses
only 16-bit characters, and implementations of C's wchar_t store it as
16 bits.
Maybe software people should say whether in fact this is a valid reason.
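
To make the width question concrete, here is a minimal sketch (Python
chosen purely for illustration; the helper name utf16_units is
hypothetical and not taken from any draft).  It counts how many 16-bit
units one character occupies when encoded as UTF-16: a character inside
the BMP takes one unit, and anything beyond it takes two (a surrogate
pair).

```python
def utf16_units(ch: str) -> int:
    """Return how many 16-bit code units one character needs in UTF-16."""
    # "utf-16-be" emits no byte-order mark, so every unit is exactly 2 bytes.
    return len(ch.encode("utf-16-be")) // 2


print(utf16_units("A"))           # U+0041, inside the BMP
print(utf16_units("\u6f22"))      # U+6F22, a CJK ideograph inside the BMP
print(utf16_units("\U00020000"))  # U+20000, a code point beyond the BMP
```

A system that stores one character per 16-bit cell handles the first two
directly; the third needs either a wider cell or surrogate handling.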

An alternative is that the Encoding PI also includes a pseudo-attribute
to describe the character width required for the document. (The presence
of the "ISO 8879:1986 (ENR)" in the SGML declaration for XML already
implies that 8-bit-character systems will not cope with all XML
documents.) 

I guess another alternative (yuck) is that a receiving 16-bit system
could substitute the ISO 10646 unknown-character character if it receives
a character that does not fit in 16 bits (code value 2^16 or above).
This is what would happen if we don't specify a target character width
or an appropriate Encoding PI pseudo-attribute.
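
A rough sketch of that substitution, under stated assumptions: I take
U+FFFD (REPLACEMENT CHARACTER) to be the "unknown-character character",
and the function name squeeze_to_16bit is hypothetical.  Python is used
only for illustration.

```python
# Assumption: U+FFFD stands in for the "unknown-character character".
REPLACEMENT = "\ufffd"
# Largest code value that fits in a single 16-bit cell.
MAX_16BIT = 0xFFFF


def squeeze_to_16bit(text: str) -> str:
    """Replace every character that needs more than 16 bits with REPLACEMENT."""
    return "".join(ch if ord(ch) <= MAX_16BIT else REPLACEMENT for ch in text)


sample = "A\U00020000B"  # the middle character lies beyond the 16-bit range
print([hex(ord(ch)) for ch in squeeze_to_16bit(sample)])
```

The substitution loses information: after it, the receiver cannot tell
which character was originally sent, which is why I think it is the
least attractive option.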

We don't want to shut the door on the thief (unknown encodings), only 
to let him in through the window (unknown maximum character widths). 

You mentioned that JIS may be proposing these extra characters to
ISO 10646 in three years or so, which could translate to four years
or so after discussion and voting.  That seems a good timeframe for
XML 2.0, but I think XML 1.0 should limit itself to providing well for
present needs, using present art.  

What is the timeframe for the JIS extensions to be added to shift-JIS
or EUC and be supported by OS vendors and applications, do you know?  
If the JIS extensions will definitely be in use in 1997 or 1998, 
then certainly it is a matter for XML 1.0.  And I guess this comes down 
to "What has Microsoft said?"

-Rick Jelliffe
Received on Saturday, 12 April 1997 04:44:50 EDT
