- From: Gavin Nicol <gtn@ebt.com>
- Date: Tue, 10 Sep 1996 19:35:19 GMT
- To: srn@techno.com
- CC: w3c-sgml-wg@w3.org
>I agree to the extent that XML as defined right now should use a
>hardwired concrete syntax, but to force only one such syntax is asking
>for obsolescence.

It depends. If you go for a 32 bit character repertoire (10646) for
the document character set, then you'll be fine for the foreseeable
future.

>There needs to be a way to specify 'versions' of the concrete syntax
>used, where you might have a 7-bit ascii version and a Unicode
>version, etc.

I disagree. This can be handled quite adequately by the content
negotiation mechanisms of the WWW. Also, different syntaxes mean
that numeric character references and other such things become
dependent upon a given syntax, which could be a pain in email, and in
translation servers.

>As it stands now, there are very few tools which support portable
>text beyond 7-bit ascii in any reliable way. Given this framework, I
>think XML should start with ascii, as a base. Part of the whole
>concept here, as I saw it, was that I could fire up vi or notepad and
>view a document. (Though I might not enjoy doing it.) That paradigm
>breaks if XML tries to leap-frog currently used technology too much.

The fact that you use 32 bits for the document character set does not
mean that you must use 32 bits internally. You can fake it by either
using UTF-8 internally, or by restricting the acceptable input to
7 bit data via content negotiation. So long as the parser behaves in a
conformant manner with the data that it takes in, all will be well.

There are issues with numeric character references, SDATA, and other
such constructs, but a) for portability, one needs to be careful here
anyway, and b) SGML doesn't define exactly what a parser/application
should do with data that it cannot display, so any recovery scheme is
acceptable.
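A minimal sketch of the internal-representation point, assuming Python and hypothetical helper names (resolve_char_refs, to_internal, is_seven_bit; none of these come from any XML or SGML tool). Numeric character references are resolved against the document character set, that is, against code points, so their meaning does not change with the byte encoding used for storage, and a 7 bit restriction is just a check on the input.

```python
import re


def resolve_char_refs(text: str) -> str:
    """Replace decimal references such as &#233; with the character
    at that code point in the document character set."""
    return re.sub(r"&#([0-9]+);", lambda m: chr(int(m.group(1))), text)


def to_internal(text: str) -> bytes:
    """Store the resolved text as UTF-8 rather than as 32 bit units."""
    return text.encode("utf-8")


def is_seven_bit(data: bytes) -> bool:
    """True if every byte fits in 7 bits, i.e. the data is plain ASCII."""
    return all(b < 0x80 for b in data)


# Hypothetical sample input: ASCII-only on the wire, non-ASCII once resolved.
source = "caf&#233; &#26085;&#26412;"
resolved = resolve_char_refs(source)
internal = to_internal(resolved)

print(is_seven_bit(source.encode("ascii")))  # the transmitted form
print(is_seven_bit(internal))                # the internal UTF-8 form
print(internal.decode("utf-8") == resolved)  # round trip is lossless
```

The source string here travels as 7 bit data, yet the references in it still name characters well outside ASCII. This is why tying numeric references to one document character set keeps them stable across email and translation servers.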
Received on Tuesday, 10 September 1996 15:36:31 UTC