
Re: Concrete syntax, character sets



> All XML documents will be encoded entirely in UTF8, data and markup.
> An XML processor will not perform any conversions on the data or markup, but 
> will pass the data and markup to applications as they appear in the document.

UTF-8 is an *encoding*. I cannot agree to fixing the encoding. I can
agree (easily) to fixing the syntax to use ISO 10646.

The model I have in mind is:

  author       transmission      soh           parser    application
   SJIS ---------------------> [SJIS->IR] --------------------------->   

where "soh" stands for "Storage Object Handler" and "IR" stands for
"Internal Representation". If the *parser internal* representation is
UTF-8, then so be it, though I myself would probably use 16-bit wchar_t.
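
To make the model concrete, here is a minimal sketch in Python, not a
proposal for any actual API. The names storage_object_handler and
parse_tag_names are hypothetical, and a Python str (a sequence of
ISO 10646 / Unicode code points) stands in for the internal
representation. The sketch assumes the storage encoding is known from
outside the document.

```python
import re


def storage_object_handler(raw: bytes, encoding: str) -> str:
    """Hypothetical SOH: convert stored bytes to the internal representation.

    The internal representation here is a Python str, i.e. a sequence of
    code points. The storage encoding is assumed to be supplied externally.
    """
    return raw.decode(encoding)


def parse_tag_names(ir: str) -> list:
    """Toy stand-in for a parser: it sees only the internal representation."""
    # Collect start-tag names; end tags, comments and PIs are skipped.
    return re.findall(r"<([^/!?\s>]+)", ir)


text = "<文書>日本語</文書>"

# The same document stored in two different encodings.
sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

# After the SOH, the parser cannot tell which encoding was used on disk.
ir_from_sjis = storage_object_handler(sjis_bytes, "shift_jis")
ir_from_utf8 = storage_object_handler(utf8_bytes, "utf-8")

print(sjis_bytes != utf8_bytes)       # storage forms differ
print(ir_from_sjis == ir_from_utf8)   # internal forms are identical
print(parse_tag_names(ir_from_sjis))
```

The point of the sketch is only that the syntax is defined over
characters, so the parser and application never need to know what the
author's storage encoding was.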

