Re: Concrete syntax, character sets
>* The character set allowed in markup or data (e.g., 10646 BMP) and
>the character encoding put on the wire or in a disk file (e.g., UTF-8)
>are completely different issues.
>
>* We can specify 10646 as the character set (in the SGML sense) for
>all XML documents while still allowing a much more limited encoding
>(such as 7-bit ASCII) for the transfer of a particular document
>instance, assuming that information about which encoding is being used
>is conveyed when the transmission is established.
Correct, and I would further add that applications that restrict
themselves to 7-bit encodings do not need to use 32 bits internally
(though some might disagree with me).
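A minimal sketch of that point, in Python: the character set stays 10646
(code points), while the storage width an application needs depends only
on the highest code point it actually handles. The function name
narrowest_storage and the three width labels are illustrative
assumptions, not anything proposed in this thread.

```python
def narrowest_storage(text: str) -> str:
    """Return the narrowest fixed-width unit able to hold every character.

    Hypothetical helper: the labels are illustrative only.
    """
    highest = max(map(ord, text), default=0)
    if highest < 0x80:
        return "7-bit"    # fits in ASCII; one byte per character is enough
    if highest < 0x10000:
        return "16-bit"   # fits in the 10646 Basic Multilingual Plane
    return "32-bit"       # needs the full 10646 code space


doc = "<greeting>Hello, world</greeting>"

# A 7-bit document is byte-for-byte identical in ASCII and UTF-8, so the
# choice of wire encoding does not change the character set in use.
wire = doc.encode("ascii")
assert wire == doc.encode("utf-8")
assert wire.decode("ascii") == doc
```

An application that only ever sees such documents can keep one byte per
character internally and still be processing 10646 characters.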
>* Notwithstanding the last point, we can (if we want) go farther than
>saying that 10646 is the XML character set (in the SGML sense) and
>also say that UTF-8 is the only (or recommended, or expected, or
>default) encoding that we shall/should/expect to find when we open a
>file containing an XML document or receive a byte stream during an
>HTTP session.
Right. I have no problem with saying that UTF-8 should be the *default*
or *preferred* encoding. I cannot agree to UTF-8 as the *only* encoding.
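The distinction between "default" and "only" can be sketched as follows,
again in Python. This assumes the encoding name, when there is one, has
already been conveyed out of band (for example when the transmission is
established); decode_document and DEFAULT_ENCODING are hypothetical
names used only for illustration.

```python
import codecs
from typing import Optional

# Assumption for this sketch: UTF-8 is used when nothing else is declared.
DEFAULT_ENCODING = "utf-8"


def decode_document(raw: bytes, declared_encoding: Optional[str] = None) -> str:
    """Decode a byte stream into 10646 characters.

    UTF-8 is the default, but any other declared encoding is accepted.
    """
    encoding = declared_encoding or DEFAULT_ENCODING
    codecs.lookup(encoding)  # raises LookupError if the name is unknown
    return raw.decode(encoding)


# No declaration: the bytes are read as UTF-8.
assert decode_document("caf\u00e9".encode("utf-8")) == "caf\u00e9"

# Declared Latin-1: different bytes on the wire, same characters.
assert decode_document("caf\u00e9".encode("latin-1"), "iso-8859-1") == "caf\u00e9"
```

Making UTF-8 the *only* encoding would amount to dropping the
declared_encoding parameter and rejecting the second document.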