- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Sun, 20 May 2001 19:15:03 +0200
- To: Asmus Freytag <asmusf@ix.netcom.com>
- Cc: Keld Jørn Simonsen <keld@dkuug.dk>, Misha Wolf <Misha.Wolf@reuters.com>, Mark Davis <mark@macchiato.com>, ietf-charsets@iana.org, w3c-i18n-ig@w3.org
On Sat, May 19, 2001 at 06:28:35PM -0700, Asmus Freytag wrote: > At 06:48 PM 5/14/01 +0200, Keld Jørn Simonsen wrote: > >On Sun, May 13, 2001 at 11:34:24AM -0700, Asmus Freytag wrote: > > > At 08:05 PM 5/11/01 +0100, Misha Wolf wrote: > > > > > > > > > >Has anyone looked to see how this ties in with: > > > > Extensible Markup Language (XML) 1.0 (Second Edition) > > > > Autodetection of Character Encodings (Non-Normative) > > > > http://www.w3.org/TR/REC-xml#sec-guessing > > > > > > > >Misha > > > > > > > > > > I took a quick look. The section already talks about 4-byte codes. > > > Replace UCS-4 by UTF-32 in that section and it would seem to cover it. > > > >I think that the w3c specs should rather refer the 10646 specs, > >and thus keep the reference to UCS-4. > > We deliberately introduced the term UTF-32 since this regularizes > the notation for everyone. Little is to be gained by using a mixed > notation in that section using UTF-8, UTF-16 and UCS-4 together. > > If you would like to fix the 10646 spec, you could propose that the > term UTF-32 is formally added there as well. As it stands, 10646 > has an unfortunate asymmetry in notation that is cumbersome to use > for the non-specialist. You really should not do this. UCS-4 is the canonical representation of 10646. UTF-32 would be misleading, as the UCS-4 is not a transformation format, but the "real thing". Kind regards Keld
Received on Friday, 25 May 2001 20:24:50 UTC