- From: Dave Peterson <davep@acm.org>
- Date: Wed, 11 Jun 1997 21:19:11 -0400
- To: w3c-sgml-wg@w3.org
At 8:05 PM 6/11/97, Gavin Nicol wrote: >I would favor using ISO 10646 as coded character set to use for the >SGML declaration for XML, and to specify that the character >*repertiore* available within XML, is that of ISO 10646. I could be >convinced to line up with Unicode in this regard. A character is represented indirectly via a numeric character reference using a single numeral per character. It only makes sense to represent high-order 10646 characters via a single long numeral, such as up to eight digits hex. >However, I most certainly do *NOT* think that we have any business >defining what the processor hands back. This is purely an >implementation issue, and not one that belongs in XML-lang. I can >return a stream of 31 bit character coded in any number of different >encodings. I might return then as UTF in my application, or as UCS, or >as a string encoded using hex digits. > >There is one more issue, and that is the question of how the >application represents/interprets characters. I personally like to >view characters as a purely abstract object, thereby leaving the >widest possible choice of implementation strategies, though this does >not seem to be the model favoured by SGML (this *is* the model for >HTML). In fact, this *is* the "new" SGML model. Personally, I'd like to see it made official with the TC, not even waiting for the revision. As you say, it's the model for HTML--which is one reason that the "new" SGML model came up for discussion in the first place. It's highly appropriate. (For the record, the "new" SGML model *permits* you to use the document character set to describe storage and processing representations, but does not require it.) I heartily agree that we should not be prescribing the representation of characters used internally within a software system, including between its components (like between the XML-processor and an application coupled thereto). Dave Peterson SGMLWorks! davep@acm.org
Received on Wednesday, 11 June 1997 21:19:21 UTC