[Prev][Next][Index][Thread]

Re: Concrete syntax, character sets



[Gavin:]

> some people confuse coded character sets and encodings

Yeah.  Me, for instance.  I apologize for getting this mixed up (once
again) in an earlier comment.

Let's review the basics for us slower members of the class.

* The character set allowed in markup or data (e.g., 10646 BMP) and
the character encoding put on the wire or in a disk file (e.g., UTF-8)
are completely different issues.

* We can specify 10646 as the character set (in the SGML sense) for
all XML documents while still allowing a much more limited encoding
(such as 7-bit ASCII) for the transfer of a particular document
instance, assuming that information about which encoding is being used
is conveyed when the transmission is established.

* Notwithstanding the last point, we can (if we want) go farther than
saying that 10646 is the XML character set (in the SGML sense) and
also say that UTF-8 is the only (or recommended, or expected, or
default) encoding that we shall/should/expect to find when we open a
file containing an XML document or receive a byte stream during an
HTTP session.

Right?

Jon


References: