Re: Concrete syntax, character sets
> All XML documents will be encoded entirely in UTF8, data and markup.
> An XML processor will not perform any conversions on the data or markup, but
> will pass the data and markup to applications as they appear in the document.
UTF-8 is an *encoding*. I cannot agree to fixing the encoding. I can
agree (easily) to fixing the syntax to use ISO 10646.
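
To make the distinction concrete, here is a minimal sketch (an illustration, not part of any proposal) showing one ISO 10646 character serialized under three different encodings. The choice of HIRAGANA LETTER A and the variable names are assumptions made only for the example.

```python
# One abstract character from the ISO 10646 repertoire.
ch = "\u3042"  # HIRAGANA LETTER A (chosen arbitrarily for illustration)

# The character identity is fixed by the character set.
code_point = ord(ch)  # 0x3042

# The bytes on the wire depend entirely on the encoding.
utf8 = ch.encode("utf-8")       # three bytes
sjis = ch.encode("shift_jis")   # two bytes
utf16 = ch.encode("utf-16-be")  # two bytes, different from the SJIS ones

# Decoding any of them recovers the same character.
same = (
    utf8.decode("utf-8")
    == sjis.decode("shift_jis")
    == utf16.decode("utf-16-be")
    == ch
)
```

Fixing the syntax to ISO 10646 constrains `code_point`; it says nothing about which of the byte sequences is transmitted.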
The model I have in mind is:
author    transmission         soh          parser        application
SJIS ---------------------> [SJIS->IR] ----------------------------->
where "soh" stands for "Storage Object Handler" and "IR" stands for
"Internal Representation". If the *parser internal* representation is
UTF-8, then so be it, though I myself would probably use 16-bit wchar_t.
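
A rough sketch of that pipeline, under stated assumptions: `storage_object_handler` and `toy_parser` are hypothetical names invented for this illustration, the "parser" only separates tags from character data, and Python's `str` stands in for the internal representation.

```python
import re

def storage_object_handler(raw: bytes, declared_encoding: str) -> str:
    """Hypothetical soh: convert transmitted bytes to the internal representation."""
    return raw.decode(declared_encoding)

def toy_parser(ir: str) -> list:
    """Toy parser: sees only the IR, never the transmission encoding."""
    tokens = []
    # Split on tags, keeping the tags themselves via the capture group.
    for piece in re.split(r"(<[^>]*>)", ir):
        if piece:
            kind = "markup" if piece.startswith("<") else "data"
            tokens.append((kind, piece))
    return tokens

# The same document as an author might store it in two encodings.
document = "<p>\u3042</p>"
as_sjis = document.encode("shift_jis")
as_utf8 = document.encode("utf-8")

# After the soh, the parser and application cannot tell them apart.
tokens_sjis = toy_parser(storage_object_handler(as_sjis, "shift_jis"))
tokens_utf8 = toy_parser(storage_object_handler(as_utf8, "utf-8"))
```

Both token lists are identical, which is the point of placing the conversion in the soh rather than mandating a single encoding for documents.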