
Re: Concrete syntax, character sets



> * Notwithstanding the last point, we can (if we want) go farther than
> saying that 10646 is the XML character set (in the SGML sense) and
> also say that UTF-8 is the only (or recommended, or expected, or
> default) encoding that we shall/should/expect to find when we open a
> file containing an XML document or receive a byte stream during an
> HTTP session.
> 
> Right?
> 
> Jon


This sounds good as long as UTF-8 is recommended or the default.  The
other issue is what to present at the parser's API.  Here too, I do
not want to be restricted to UTF-8, because if I were to write a
parser in Java, UTF-16 would be more appropriate.  Instead, I view
the parser proper as accepting UCS-4 on input and output.  Encoding
should be handled outside the parser, in the storage manager.
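A minimal Python sketch of the split described above follows. It assumes a hypothetical `StorageManager` class and a toy `parse_names` function. Neither comes from any real XML library. The storage manager is the only code that touches byte encodings (UTF-8 by default). The parser consumes and produces plain integer code points (UCS-4 values), so the same document stored as UTF-8 or UTF-16 yields identical parser output.

```python
from typing import Iterable, Iterator, List, Optional


class StorageManager:
    """Hypothetical storage layer: the only component aware of byte encodings."""

    def __init__(self, encoding: str = "utf-8") -> None:
        # UTF-8 as the default, other encodings allowed (assumption for illustration)
        self.encoding = encoding

    def code_points(self, raw: bytes) -> Iterator[int]:
        # Decode the byte stream and hand out integer code points (UCS-4 values)
        for ch in raw.decode(self.encoding):
            yield ord(ch)

    def encode(self, points: Iterable[int]) -> bytes:
        # Turn parser output (code points) back into bytes for storage
        return "".join(chr(cp) for cp in points).encode(self.encoding)


def parse_names(points: Iterable[int]) -> List[List[int]]:
    """Toy 'parser': collects the code points of each start-tag name.

    It sees only integers, never bytes, so it is independent of the encoding.
    """
    names: List[List[int]] = []
    current: Optional[List[int]] = None
    for cp in points:
        if cp == ord("<"):
            current = []  # start of a tag
        elif current is not None and cp in (ord(">"), ord(" "), ord("/")):
            if current:  # an empty name means an end tag such as </doc>
                names.append(current)
            current = None
        elif current is not None:
            current.append(cp)
    return names


doc = "<doc><na\u00efve/><x\U0001D4B3/></doc>"
from_utf8 = parse_names(StorageManager("utf-8").code_points(doc.encode("utf-8")))
from_utf16 = parse_names(StorageManager("utf-16").code_points(doc.encode("utf-16")))
print(from_utf8 == from_utf16)
```

Because decoding turns a UTF-16 surrogate pair into a single value (here 0x1D4B3), the parser never has to know which encoding the storage manager read.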




B. Todd Bauman
Graduate Student
University of Maryland, Baltimore County