- From: Gavin Nicol <gtn@ebt.com>
- Date: Tue, 10 Sep 1996 14:31:27 GMT
- To: bosak@atlantic-83.Eng.Sun.COM
- CC: tbray@textuality.com, w3c-sgml-wg@w3.org
>Having just gone through a big struggle in WG8 and X3V1 over the ERCS
>proposal, I would feel pretty strange about limiting markup to
>something that not even Western Europeans could use the way they want
>to. I would like to see some serious discussion of this point.

I would feel somewhat strange about not supporting native language markup, particularly as we're going to have to use a variant concrete syntax to support native language content. It seems to me that the most reasonable thing to do would be to decide upon a syntax that used ISO 10646 for both data and markup...

We had this same discussion in HTML-WG, and I pushed for a syntax that used ISO 10646 as the document character set. This, and other discussion, led to the HTML I18N draft, which is moving towards proposed standard (and it'll probably be adopted by W3C in some HTML revision). It seems that, in the interest of compatibility, we should have a similar concrete syntax, though with an extended markup character repertoire.

The ERCS work that Rick did is very important, and I do not think it is a great burden for XML browsers to support it, at least to a minimal degree. In fact, given that we also have content negotiation in the WWW, and that HTTP 1.1 is becoming somewhat stricter on content labelling requirements, XML browsers would not need to support any encodings other than those deemed important by the companies producing them.

>It's certainly thinkable to me. Is it thinkable to say that "all
>markup is in UTF8" as well?

No, it's not, because then you'd also require all content to be in UTF8, and many users have no way of creating such data. Even in the best cases, producing such data usually involves a conversion somewhere.

Again, we had the same discussion in HTML-WG. There are many good reasons for selecting a single document character set, and then just looking upon SJIS and whatnot as encodings.
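
A minimal sketch of the "single document character set, many encodings" idea, assuming Python's built-in codecs. The sample string and variable names are hypothetical and only illustrate the point; they are not taken from any draft:

```python
# Hypothetical sample: markup names and data drawn from one repertoire
# (Unicode, which tracks ISO 10646).
text = "<文書>日本語</文書>"

# The same document serialized in two different encodings.
sjis_bytes = text.encode("shift_jis")
utf8_bytes = text.encode("utf-8")

# On the wire the byte sequences differ...
assert sjis_bytes != utf8_bytes

# ...but once decoded, a parser sees the same sequence of characters,
# so the markup syntax only needs to be defined over code points.
from_sjis = sjis_bytes.decode("shift_jis")
from_utf8 = utf8_bytes.decode("utf-8")
assert from_sjis == from_utf8 == text

# Code points of the document character set, independent of encoding.
code_points = [ord(ch) for ch in from_sjis]
```

Under this view, a processor needs one decoder per encoding it chooses to accept, while the parser itself works on code points only.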
Received on Tuesday, 10 September 1996 10:32:30 UTC