Reads like ASCII (was Re: character sets ...)
On Sun, 15 Sep 1996, Tim Bray wrote:
> Your point about "if it just reads ASCII, it's not really XML" is well-taken;
> but setting the bar at a point which includes 8859 *and* UTF *and* UCS for
> basic acceptance is I think serious infringement on our design goal #4 that
> says XML shall be easy to program. Also, including 8859 but not JIS is
> disturbingly Eurocentric.
Maybe all that is needed (in our blissful absence of complicating SGML
declarations or non-RCS DTDs) is that
1) the first thing in the document before any non-ISO 646 characters is a
PI with only ISO 646 characters that can say the encoding (if it is exotic
or warranted). E.g.:
2) the encoding used for the input stream must have ISO 646 characters
in the same code numbers as ISO 646.
I don't know of any standard national encodings (apart from EBCDIC and maybe
some esoteric Asian ones that are not used for HTML anyway and so don't
count) that aren't OK with this scheme (Gavin or James would know
better?). It certainly fits ISO 8859-n, EUC, Shift-JIS, UTF-8 and so on. It
fits the fixed-width 16-bit ones too, in that a reader can detect whether
16-bit is being used from the zero-value octets (remember, no non-ISO 646
characters before the XML PI).
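
The zero-octet check above can be sketched roughly as follows. This is only an illustration in Python, not part of any proposal: the function name and the returned labels are made up here, and it assumes the stream really does begin with an ISO 646 character (such as the "<" opening the PI) and that NUL itself never appears first.

```python
def detect_code_unit_layout(prefix: bytes) -> str:
    """Guess the code-unit layout from the first two octets of a stream.

    Assumption: the first character is a non-NUL ISO 646 character, so in a
    fixed-width 16-bit encoding exactly one of the two octets is zero.
    """
    if len(prefix) < 2:
        raise ValueError("need at least two octets to decide")
    first, second = prefix[0], prefix[1]
    if first == 0 and second != 0:
        # High octet comes first: 16-bit, most significant octet first.
        return "16-bit big-endian"
    if first != 0 and second == 0:
        # Low octet comes first: 16-bit, least significant octet first.
        return "16-bit little-endian"
    if first != 0 and second != 0:
        # No zero octets: an 8-bit or variable-width encoding.
        return "8-bit or variable-width"
    raise ValueError("two zero octets: stream does not start with an ISO 646 character")
```

Once the layout is known, a reader could pick out the ISO 646 code numbers of the PI and only then switch to whatever encoding the PI names.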
To put this another way: it can work with any fixed 8-bit encoding that
has ISO 646 in the bottom half of the character code. It works for any
variable-byte encoding that uses a single octet for ISO 646 characters
and has ISO 646 in the bottom half of those codes. And it works for
any fixed 16-bit encoding (in either order) as long as the ISO 646 characters
are in the appropriate place in the lowest 128 code positions.
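
For the 8-bit and variable-byte cases, the condition can be tested mechanically. The sketch below is my own illustration: it uses Python's built-in codec names, the function name is hypothetical, and it treats ISO 646 as its international reference version (i.e. ASCII), which is an assumption. It does not cover the fixed 16-bit case.

```python
def keeps_iso646_in_place(encoding: str) -> bool:
    """Return True if every ISO 646 character (codes 0..127) is encoded as a
    single octet carrying the same code number.

    Assumption: ISO 646 here means the international reference version,
    which coincides with ASCII.
    """
    for code in range(128):
        try:
            encoded = chr(code).encode(encoding)
        except UnicodeEncodeError:
            # The encoding cannot represent this character at all.
            return False
        if encoded != bytes([code]):
            # Either more than one octet, or a different code number.
            return False
    return True


# Example checks with codec names known to Python's standard library.
print(keeps_iso646_in_place("latin-1"))  # ISO 8859-1
print(keeps_iso646_in_place("utf-8"))
print(keeps_iso646_in_place("cp037"))    # an EBCDIC code page
```

An encoding that passes this check lets a reader parse the leading PI as plain ISO 646 before knowing which encoding the rest of the document uses.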
Rick Jelliffe http://www.allette.com.au/allette/ricko
Allette Systems http://www.allette.com.au
10/91 York St, 2000, phone: +61 2 9262 4777
Sydney, Australia fax: +61 2 9262 4774