Re: Concrete syntax, character sets



Tim Bray wrote:

> 1. Document *data* is (mostly) for people to read, and thus of course 
>    has to support the languages they write in.  Document *markup* is
>    (mostly) for computer programs to read, plus the occasional unfortunate
>    document designer.  Given that these things are already monocased,
>    and by industry habit that I doubt XML will break, short, it's not
>    clear that expressing GI's & attribute names in Cyrillic or Chinese is all 
>    that important to the market.

Put that question to the Cyrillic, Chinese, and other markets that can't live with
7-bit ASCII and I suspect you will get a very different answer. XML should
*embrace* I18N, not merely make it possible.

> 2. Supporting bigger & more complex encodings in markup brings the benefit
>    of making life easier & friendlier for document designers who want to
>    use them.  Restricting the markup character set down to 7 bits brings
>    the benefit of making it quicker & easier to generate software that
>    processes such markup.  If I didn't already think that the second 
>    of these two incompatible benefits was more important, I wouldn't
>    be working on XML.

I don't see these as incompatible but rather as complementary requirements. XML
should be both easy to use (for document designers and authors) and easy to
implement (for software developers). I don't see a need to trade them off, at
least in this instance.

We are agreed that 7-bit ASCII isn't sufficient for XML data. Supporting more
than that in data will require significant effort to do properly. Supporting
non-ASCII characters in markup is trivial by comparison.

What about the UTF-8 suggestion for both markup and data?
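
To illustrate why I think the markup side is cheap under UTF-8, here is a rough
sketch (in Python, purely for illustration). It assumes a naive byte-level scan
for "<" and ">" rather than a real parser, and the helper name tag_names and the
sample document are made up for the example. The point it relies on is that
every byte of a multi-byte UTF-8 sequence has its high bit set, so the ASCII
delimiters can never turn up inside a non-ASCII character.

```python
def tag_names(doc: bytes) -> list:
    """Collect start-tag names with a naive byte scan (hypothetical helper,
    not a conforming parser: no comments, CDATA, or quoted '>' handling)."""
    names = []
    pos = 0
    while True:
        start = doc.find(b"<", pos)
        if start == -1:
            break
        end = doc.find(b">", start)
        if end == -1:
            break
        body = doc[start + 1:end]
        # Skip end tags; the name runs up to the first ASCII space, if any.
        if body and not body.startswith(b"/"):
            names.append(body.split(b" ")[0].decode("utf-8"))
        pos = end + 1
    return names


# Made-up sample with Cyrillic element and attribute names.
sample = '<книга язык="ru"><title>Война и мир</title></книга>'.encode("utf-8")
print(tag_names(sample))
```

The scanning code never looks at anything but ASCII delimiter bytes, so it is
the same code one would write for 7-bit markup; only the final decode step
knows the names are UTF-8.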