- From: <bbauma1@cs.umbc.edu>
- Date: Tue, 10 Sep 1996 22:34:45 +0000
- To: w3c-sgml-wg@w3.org
Although, I have a fairly limited grasp on ISO 10646 / UNICODE Ver. 2 I do have experience encoding TEI documents in a variety of unusual languages, and as such I strongly support Gavin's ideas. I believe that unless a strong case can be made to the difficulty of implementation, standardizing on the ISO 10646 character repertoire and not a particular encoding of that repertoire, to be the best choice. I agree with Tim Bray in that I do not want the XML parser to be responsible for encoding / decoding a particular character representation, and as such do not want UTF-8 to be specified as its internal format. As a matter of practicality, I realize that UCS-4 is not something that one would want to send across the "wire" and in that sense specifying that UTF-8 be the the initial encoding for transmission is fine. (Although the Reuters encoding scheme that I just read about in the UNICODE conference proceedings is very intriguing.) I also agree with Tim that a library of encoding conversion routines that fuction outside of the parser would be very useful in a reference implementation. Let me also weigh in on the side of having all markup and content in the same character set including GI's and attributes. Finally, I am a little concerned with the partitioning of functionality between XML and SGML. In particular, if XML becomes as successful as I hope, there may be strong reasons (i.e. product availability / price) for not using SGML and making due with XML. XML therefore cannot rely too heavily on the availability of its more functional parent to mitigate its own limitations. B. Todd Bauman Graduate Student University of Maryland, Baltimore County
Received on Tuesday, 10 September 1996 22:43:01 UTC