Re: Concrete syntax, character sets
Tim Bray wrote:
> 1. Document *data* is (mostly) for people to read, and thus of course
> has to support the languages they write in. Document *markup* is
> (mostly) for computer programs to read, plus the occasional unfortunate
> document designer. Given that these things are already monocased,
> and, by an industry habit that I doubt XML will break, short, it's not
> clear that expressing GI's & attribute names in Cyrillic or Chinese is all
> that important to the market.
Put the question to the Cyrillic, Chinese, and other markets that can't live with
7-bit ASCII, and I suspect you will get a very different answer. XML should
*embrace* I18N, not merely make it possible.
> 2. Supporting bigger & more complex encodings in markup brings the benefit
> of making life easier & friendlier for document designers who want to
> use them. Restricting the markup character set down to 7 bits brings
> the benefit of making it quicker & easier to generate software that
> processes such markup. If I didn't already think that the second
> of these two incompatible benefits was more important, I wouldn't
> be working on XML.
I don't see these as incompatible but rather as complementary requirements. XML
should be both easy to use (document designers and authors) and easy to implement
(software developers). I don't see a need to trade them off, at least in this case.
We are agreed that 7-bit ASCII isn't sufficient for XML data. Going beyond it will
require significant effort to support properly. Supporting non-ASCII characters in
markup is trivial by comparison.
What about the UTF8 suggestion for both markup and data?
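
As a rough illustration of why UTF-8 in markup need not burden implementers: in UTF-8, every byte of a multi-byte character is 0x80 or above, so the ASCII delimiters "<" and ">" can never appear inside a Cyrillic or Chinese name. A scanner can therefore keep looking for those single bytes exactly as it would for 7-bit markup. The sketch below is only an illustration under that assumption; the function name tag_names and the sample document are hypothetical, and it ignores comments, declarations, and ">" inside attribute values.

```python
def tag_names(doc: bytes) -> list:
    """Return the names of start tags found in UTF-8 encoded markup.

    Simplified sketch: scans for the ASCII bytes '<' and '>' only.
    """
    names = []
    pos = 0
    while True:
        start = doc.find(b"<", pos)
        if start == -1:
            break
        end = doc.find(b">", start)
        if end == -1:
            break
        body = doc[start + 1:end]
        # Skip empty tags and end tags such as </name>.
        if body and not body.startswith(b"/"):
            # bytes.split() splits on ASCII whitespace only, which is safe
            # because UTF-8 continuation bytes are never below 0x80.
            names.append(body.split()[0].decode("utf-8"))
        pos = end + 1
    return names


# Hypothetical sample: a Cyrillic element name with an ASCII attribute.
sample = "<книга lang='ru'>Война и мир</книга>".encode("utf-8")
print(tag_names(sample))
```

The byte-level loop is the same one a 7-bit-only scanner would use; the only extra step is decoding the name once it has been isolated.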