Although I have a fairly limited grasp of ISO 10646 / UNICODE Ver. 2, I do have experience encoding TEI documents in a variety of unusual languages, and as such I strongly support Gavin's ideas. I believe that, unless a strong case can be made about the difficulty of implementation, standardizing on the ISO 10646 character repertoire, and not on a particular encoding of that repertoire, is the best choice.

I agree with Tim Bray in that I do not want the XML parser to be responsible for encoding / decoding a particular character representation, and as such I do not want UTF-8 to be specified as its internal format. As a matter of practicality, I realize that UCS-4 is not something that one would want to send across the "wire", and in that sense specifying UTF-8 as the initial encoding for transmission is fine. (Although the Reuters encoding scheme that I just read about in the UNICODE conference proceedings is very intriguing.) I also agree with Tim that a library of encoding conversion routines that function outside of the parser would be very useful in a reference implementation.

Let me also weigh in on the side of having all markup and content in the same character set, including GIs and attributes.

Finally, I am a little concerned with the partitioning of functionality between XML and SGML. In particular, if XML becomes as successful as I hope, there may be strong reasons (i.e., product availability / price) for not using SGML and making do with XML. XML therefore cannot rely too heavily on the availability of its more functional parent to mitigate its own limitations.

B. Todd Bauman
Graduate Student
University of Maryland, Baltimore County

Received on Tuesday, 10 September 1996 22:43:01 EDT