Re: I18N issue needs consideration from Gavin Nicol on 1997-06-12 (w3c-sgml-wg@w3.org from June 1997)

From: Gavin Nicol <gtn@eps.inso.com>
Date: Thu, 12 Jun 1997 08:46:33 -0400
To: w3c-sgml-wg@w3.org
Message-Id: <199706121246.IAA23789@nathaniel.eps.inso.com>

James writes:

>We shouldn't be mandating a particular character encoding for an API.  An
>application should be able to use UTF-8 or UTF-16 or UCS-4, as it sees fit.

Right. This has nothing do do with a language spec. as I said earlier.

>> I have no sympathy
>>for the ISO claim that the 31-bit version is more fixed-width in
>>any meaningful sense, since Unicode is full of combining characters
>>anyhow.
>
>Not all scripts have combining characters.   If I am working with a script
>that doesn't have combining characters and does use a lot of characters
>outside the BMP (Chinese for example), then it would make sense to use
>internally a 32-bit fixed width encoding.

Right. The other important thing is that if the range of encodings
accepted is limited to those used in a local sense, one should be able
to build a processor/parser optimized for that particular case.

>The place where this needs to be addressed is when you do a binding of the
>DOM to a particular programming language.  At that point you need to say how
>the data types provided by that programming language are used to represent
>the abstract character data type.

Right. I suggested to Tim earlier, in the context of the DOM, that the
DOM shopuld only specify that characters are stored in a "string". The
actual implementation of a string should be decided by the application
architecture (i.e. a JAVA application would use String, a C
application would use wchar_t* etc.)

I think we have a few issues with the declaration, but other than
that, the only issue we have is deciding which coded character set to
use in the SGML declaration for XML. As I said before, I think ISO
10646 is better for this.

One again, encodings are *not* an issue for XML-lang.

Received on Thursday, 12 June 1997 09:03:38 UTC