Re: Concrete syntax, character sets from Gavin Nicol on 1996-09-10 (w3c-sgml-wg@w3.org from September 1996)

From: Gavin Nicol <gtn@ebt.com>
Date: Tue, 10 Sep 1996 19:35:19 GMT
To: srn@techno.com
CC: w3c-sgml-wg@w3.org
Message-Id: <199609101935.TAA03934@wiley.EBT.COM>

>I agree to the extent that XML as defined right now should use a
>hardwired concrete syntax, but to force only one such syntax is asking
>for obsolescence. 

It depends. If you go for a 32-bit character repertoire (10646) for
the document character set, then you'll be fine for the foreseeable
future. 

>There needs to be a way to specify 'versions' of the concrete syntax
>used, where you might have a 7-bit ascii version and a Unicode
>version, etc. 

I disagree. This can be handled quite adequately by the content
negotiation mechanisms of the WWW. Also, different syntaxes mean that
numeric character references and other such things become dependent
upon a given syntax, which could be a pain in email and in
translation servers.

>As it stands now, there are very few tools which support portable
>text beyond 7-bit ascii in any reliable way.  Given this framework, I
>think XML should start with ascii, as a base.  Part of the whole
>concept here, as I saw it, was that I could fire up vi or notepad and
>view a document.  (Though I might not enjoy doing it.)  That paradigm
>breaks if XML tries to leap-frog currently used technology too much.

The fact that you use 32 bits for the document character set does not
mean that you must use 32 bits internally. You can fake it by either
using UTF-8 internally, or by restricting the acceptable input to
7-bit data via content negotiation. So long as the parser behaves in a
conformant manner with the data that it takes in, all will be well.
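
To make that concrete, here is a rough sketch in Python. The function
names are mine, purely for illustration, and are not taken from any
parser. It assumes the document character set is 10646 code points and
that the internal storage is UTF-8 bytes; the 7-bit check stands in for
whatever content negotiation would enforce.

```python
def to_internal(code_points):
    """Store a sequence of 10646 code points as UTF-8 bytes."""
    return "".join(chr(cp) for cp in code_points).encode("utf-8")


def from_internal(data):
    """Recover the code points from the UTF-8 internal form."""
    return [ord(ch) for ch in data.decode("utf-8")]


def is_seven_bit(data):
    """True if every byte fits in 7 bits, i.e. the data is plain ASCII."""
    return all(b < 0x80 for b in data)


# Hypothetical sample data: "XML" and two CJK ideographs.
ascii_doc = to_internal([0x58, 0x4D, 0x4C])
kanji_doc = to_internal([0x65E5, 0x672C])
```

ASCII-only data is byte-for-byte the same in the internal form, so a
7-bit-only system pays nothing, while the code points seen by the
application are the same in both cases.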

There are issues with numeric character references, SDATA, and other
such constructs, but a) for portability, one needs to be careful here
anyway, and b) SGML doesn't define exactly what a parser/application
should do with data that it cannot display, so any recovery scheme is
acceptable.
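
One possible recovery scheme, sketched in Python under my own
assumptions: decimal numeric character references always name a code
point in the document character set, and a display that can only show
7-bit characters writes anything else back out as a reference. The
names resolve_ncrs and render, and the choice of fallback, are
illustrative only; nothing in SGML mandates them.

```python
import re

# Decimal numeric character references such as &#233;
NCR = re.compile(r"&#([0-9]+);")


def resolve_ncrs(text):
    """Replace each reference with the code point it names."""
    def replace(match):
        value = int(match.group(1))
        # Leave out-of-range references untouched rather than fail.
        if value > 0x10FFFF:
            return match.group(0)
        return chr(value)
    return NCR.sub(replace, text)


def render(text, displayable=lambda ch: ord(ch) < 0x80):
    """Show what can be displayed; fall back to a reference otherwise."""
    return "".join(
        ch if displayable(ch) else "&#%d;" % ord(ch) for ch in text
    )
```

Because the reference number is tied to the document character set
and not to the storage encoding, the same text survives a trip
through a 7-bit channel and back.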

Received on Tuesday, 10 September 1996 15:36:31 UTC