Re: Concrete syntax, character sets from Gavin Nicol on 1996-09-10 (w3c-sgml-wg@w3.org from September 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Tue, 10 Sep 1996 14:45:41 GMT
To: tbray@textuality.com
CC: w3c-sgml-wg@w3.org
Message-Id: <199609101445.OAA00766@wiley.EBT.COM>

>1. Document *data* is (mostly) for people to read, and thus of course 
>   has to support the languages they write in.  Document *markup* is
>   (mostly) for computer programs to read, plus the occasional unfortunate
>   document designer.  Given that these things are already monocased,
>   and by industry habit that I doubt XML will break, short, it's not
>   clear that expressing GI's & attribute names in Cyrillic or Chinese is all 
>   that important to the market.

For document designers, my experience has been that about 50% of the
Japanese people I talk to wish for Japanese markup. Those who are
happy with ASCII markup usually feel that it is better for
interoperability. However, those who want native-language markup
usually cite usability as the prime reason: it is much more
understandable to have "bunsho" (文書, "document") in a Japanese
document, and in stylesheets it becomes even more desirable, they say.

For Japanese, this is not an overly large problem, because there is a
phonetic spelling of Japanese that uses ASCII (romaji), but for other
languages, using ASCII phonetic spellings as markup is not a satisfactory
solution.

>2. Supporting bigger & more complex encodings in markup brings the benefit
>   of making life easier & friendlier for document designers who want to
>   use them.  Restricting the markup character set down to 7 bits brings
>   the benefit of making it quicker & easier to generate software that
>   processes such markup.  If I didn't already think that the second 
>   of these two incompatible benefits was more important, I wouldn't
>   be working on XML.

This is a fallacy. If you are going to support native-language
content, you will need some way of decoding the octet stream
in order to parse the document correctly; otherwise you run into
problems with bytes inside multibyte character codes that could be
mistaken for markup. (In ISO-2022-JP, for example, both bytes of a
kanji fall in the range 0x21-0x7E, so either one can be 0x3C, the
octet for '<'.)
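To make the point concrete, here is a small sketch (not from any SGML or XML tool) that searches for a kanji whose ISO-2022-JP encoding contains the octet 0x3C, wraps it in a `<p>` element, and compares a naive octet-level count of '<' with a count taken after decoding. It assumes Python's standard `iso2022_jp` codec; the helper names are hypothetical.

```python
# Sketch: why scanning octets for '<' is unsafe without decoding first.
# Assumes Python's built-in "iso2022_jp" codec; helper names are hypothetical.


def find_kanji_containing(octet: bytes, encoding: str = "iso2022_jp") -> str:
    """Return the first CJK ideograph whose encoded form contains `octet`."""
    for cp in range(0x4E00, 0xA000):  # CJK Unified Ideographs block
        ch = chr(cp)
        try:
            encoded = ch.encode(encoding)
        except UnicodeEncodeError:
            continue  # character not representable in this encoding
        if octet in encoded:
            return ch
    raise LookupError("no such character in this encoding")


def naive_tag_opens(data: bytes) -> int:
    """Count '<' without decoding: every 0x3C octet is treated as markup."""
    return data.count(b"<")


def decoded_tag_opens(data: bytes, encoding: str) -> int:
    """Decode to characters first, then count '<' characters."""
    return data.decode(encoding).count("<")


kanji = find_kanji_containing(b"<")
doc = ("<p>" + kanji + "</p>").encode("iso2022_jp")

# The naive scan sees more tag openers than the two that are really there.
print(naive_tag_opens(doc), decoded_tag_opens(doc, "iso2022_jp"))
```

The document contains exactly two real tag openers; only the decoded count reports that, because the extra 0x3C belongs to the kanji.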

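If you have a decoding module on the stream (or a bit combination
transformation filter), you will also be able to support native-language
markup: the parser then sees characters rather than octets, and
recognizing a name written in Japanese is no different in kind from
recognizing one written in ASCII.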
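A minimal sketch of that second step, assuming the input has already been decoded to Unicode: a start-tag name pattern built on Python's Unicode-aware `\w` accepts "文書" exactly as it accepts "bunsho". The pattern is a hypothetical stand-in, not the SGML or XML name-character definition.

```python
import re
from typing import List

# Hypothetical name rule: a letter or underscore, then letters, digits,
# '.', '-' or '_'. For str patterns Python's \w is Unicode-aware, so kanji
# qualify as name characters. Not the exact SGML/XML name definition.
START_TAG = re.compile(r"<([^\W\d][\w.\-]*)")


def start_tag_names(text: str) -> List[str]:
    """Return the names of start tags found in already-decoded text."""
    return START_TAG.findall(text)


# The same decoding step used for content also makes native markup work.
raw = "<文書><p>本文</p></文書>".encode("iso2022_jp")
print(start_tag_names(raw.decode("iso2022_jp")))  # ['文書', 'p']
print(start_tag_names("<bunsho><p>honbun</p></bunsho>"))  # ['bunsho', 'p']
```

End tags are skipped because '/' cannot start a name under this pattern; nothing in the name-recognition code had to change to admit Japanese GIs.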

Received on Tuesday, 10 September 1996 10:46:45 UTC