- From: Jon Bosak <bosak@atlantic-83.Eng.Sun.COM>
- Date: Tue, 10 Sep 1996 22:32:04 -0700
- To: gtn@ebt.com
- CC: tbray@textuality.com, w3c-sgml-wg@w3.org
[Gavin:] > some people confuse coded character sets and encodings Yeah. Me, for instance. I apologize for getting this mixed up (once again) in an earlier comment. Let's review the basics for us slower members of the class. * The character set allowed in markup or data (e.g., 10646 BMP) and the character encoding put on the wire or in a disk file (e.g., UTF-8) are completely different issues. * We can specify 10646 as the character set (in the SGML sense) for all XML documents while still allowing a much more limited encoding (such as 7-bit ASCII) for the transfer of a particular document instance, assuming that information about which encoding is being used is conveyed when the transmission is established. * Notwithstanding the last point, we can (if we want) go farther than saying that 10646 is the XML character set (in the SGML sense) and also say that UTF-8 is the only (or recommended, or expected, or default) encoding that we shall/should/expect to find when we open a file containing an XML document or receive a byte stream during an HTTP session. Right? Jon
Received on Wednesday, 11 September 1996 01:34:21 UTC