- From: Gavin Nicol <gtn@eps.inso.com>
- Date: Thu, 12 Jun 1997 08:46:33 -0400
- To: w3c-sgml-wg@w3.org
James writes: >We shouldn't be mandating a particular character encoding for an API. An >application should be able to use UTF-8 or UTF-16 or UCS-4, as it sees fit. Right. This has nothing do do with a language spec. as I said earlier. >> I have no sympathy >>for the ISO claim that the 31-bit version is more fixed-width in >>any meaningful sense, since Unicode is full of combining characters >>anyhow. > >Not all scripts have combining characters. If I am working with a script >that doesn't have combining characters and does use a lot of characters >outside the BMP (Chinese for example), then it would make sense to use >internally a 32-bit fixed width encoding. Right. The other important thing is that if the range of encodings accepted is limited to those used in a local sense, one should be able to build a processor/parser optimized for that particular case. >The place where this needs to be addressed is when you do a binding of the >DOM to a particular programming language. At that point you need to say how >the data types provided by that programming language are used to represent >the abstract character data type. Right. I suggested to Tim earlier, in the context of the DOM, that the DOM shopuld only specify that characters are stored in a "string". The actual implementation of a string should be decided by the application architecture (i.e. a JAVA application would use String, a C application would use wchar_t* etc.) I think we have a few issues with the declaration, but other than that, the only issue we have is deciding which coded character set to use in the SGML declaration for XML. As I said before, I think ISO 10646 is better for this. One again, encodings are *not* an issue for XML-lang.
Received on Thursday, 12 June 1997 09:03:38 UTC