Re: Concrete syntax, character sets

> * Notwithstanding the last point, we can (if we want) go farther than
> saying that 10646 is the XML character set (in the SGML sense) and
> also say that UTF-8 is the only (or recommended, or expected, or
> default) encoding that we shall/should/expect to find when we open a
> file containing an XML document or receive a byte stream during an
> HTTP session.
> 
> Right?
> 
> Jon


This sounds good as long as UTF-8 is recommended or the default.  The
other issue is what to present at the parser's API.  Here too, I do
not want to be restricted to a UTF-8 encoding, because if I were to
write a parser in Java, a UTF-16 encoding would be more appropriate.
Instead, I view the parser proper as accepting UCS-4 on input and
output.  Encoding should be handled outside the parser, in the storage
manager.
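
A minimal sketch of that split, written in Python only for brevity.
The names storage_manager and count_start_tags are hypothetical and
stand in for a real storage manager and parser; the toy "parser" is
an assumption for illustration, not an XML parser.  The point is that
the parser side only ever sees code points (UCS-4 values), whatever
encoding the bytes arrived in.

```python
import codecs
from typing import Iterable, Iterator


def storage_manager(chunks: Iterable[bytes], encoding: str = "utf-8") -> Iterator[int]:
    """Hypothetical storage manager: turn raw byte chunks into code points.

    The encoding is handled here, outside the parser. UTF-8 is only the
    default; any codec known to Python can be named instead.
    """
    decoder = codecs.getincrementaldecoder(encoding)()
    for chunk in chunks:
        # A chunk may end in the middle of a character; the incremental
        # decoder buffers the partial bytes until the rest arrives.
        for ch in decoder.decode(chunk):
            yield ord(ch)
    # Flush the decoder; raises UnicodeDecodeError on a truncated stream.
    for ch in decoder.decode(b"", final=True):
        yield ord(ch)


def count_start_tags(code_points: Iterable[int]) -> int:
    """Toy stand-in for the parser proper: it consumes code points only.

    Counts each '<' (0x3C) that is followed by something other than '/' (0x2F).
    """
    count = 0
    pending = False  # True when the previous code point was '<'
    for cp in code_points:
        if pending and cp != 0x2F:
            count += 1
        pending = (cp == 0x3C)
    return count


if __name__ == "__main__":
    text = "<a>\u00e9\U0001D11E</a>"
    for enc in ("utf-8", "utf-16"):
        data = text.encode(enc)
        points = list(storage_manager([data], enc))
        print(enc, [hex(p) for p in points], count_start_tags(points))
```

With this arrangement the same document stored as UTF-8 or UTF-16
reaches the parser as the same sequence of code points, so the choice
of a default file encoding and the choice of parser API stay separate
decisions.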




B. Todd Bauman
Graduate Student
University of Maryland, Baltimore County

Received on Wednesday, 11 September 1996 07:24:05 UTC