- From: Tim Bray <tbray@textuality.com>
- Date: Sun, 15 Sep 1996 23:36:27 +0000
- To: w3c-sgml-wg@w3.org
At 12:22 PM 9/15/96 CDT, Michael Sperberg-McQueen wrote:
[after an *excellent* summary-of-the-position]

>Here's yet another proposal.
>
>6. Limited Modified Eclecticism: compromise between Eclectic
>Compromise and 100 Flowers:
>  - XML data streams may be in any of a number of supported encodings:
>    UTF-8, UTF-16, UCS-4, ISO 8859
>  - XML data streams must label themselves as to which supported
>    encoding they are using, by means of a PI which must be the first
>    data in each XML entity.
>  - all XML systems must accept XML data in any supported encoding,
>    detecting the encoding in use from the internal label;
>    they may reject data in other encodings.
>    (See note on autodetection, below.)

... other good stuff

Your point about "if it just reads ASCII, it's not really XML" is
well-taken; but setting the bar at a point which includes 8859 *and*
UTF *and* UCS for basic acceptance is, I think, a serious infringement
on our design goal #4, which says XML shall be easy to program. Also,
including 8859 but not JIS is disturbingly Eurocentric.

Are there grounds for compromise between minimalism and eclecticism
by saying that
(a) here is a list of encodings which should be supported,
(b) entities have to self-label with leading PIs, and
(c) all XML implementations *must* be able to read UTF8 as well as
    generate it?

Second, might it be clever, for UTF8-encoded entities, to relax the
requirement that all XML entities self-label? Not that UTF8 is morally
superior or anything, but this would have the desirable side-effect
of turning a large proportion of the SGML objects in the world into XML.

Cheers, Tim Bray
tbray@textuality.com http://www.textuality.com/ +1-604-488-1167
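
For illustration only, the combination of (b), (c) and the relaxed rule
for unlabelled UTF8 entities might be sketched in Python as below. The
message does not say what the self-labelling PI looks like, so the
`<?XML encoding="..."?>` syntax, the `decode_entity` name and the
contents of the supported list are all assumptions. The sketch also
covers only ASCII-compatible encodings; UTF-16 and UCS-4 would need the
autodetection step mentioned in the quoted proposal.

```python
import re

# Hypothetical label syntax, assumed for illustration only; the message
# does not specify what the leading PI would look like.
LABEL = re.compile(rb'^<\?XML\s+encoding="([A-Za-z0-9._-]+)"\s*\?>')

# Assumed "should be supported" list (a), mapped to Python codec names.
SUPPORTED = {"utf-8": "utf-8", "iso-8859-1": "iso-8859-1"}


def decode_entity(data: bytes) -> str:
    """Decode one entity, treating an unlabelled entity as UTF-8."""
    match = LABEL.match(data)
    if match is None:
        # Relaxed rule: no leading PI means the entity is UTF-8.
        return data.decode("utf-8")
    name = match.group(1).decode("ascii").lower()
    if name not in SUPPORTED:
        # Implementations may reject encodings outside the list.
        raise ValueError(f"unsupported encoding: {name}")
    # Strip the label and decode the rest with the declared encoding.
    return data[match.end():].decode(SUPPORTED[name])
```

Under these assumptions an unlabelled entity passes straight through
the UTF-8 decoder, which is what would let plain ASCII SGML documents
be read without modification.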
Received on Monday, 16 September 1996 02:38:08 UTC