Re: Concrete syntax, character sets
>This sounds good as long as UTF-8 is recommended or defaulted. The
>other issue is what to present at the parser's API. Here too I do
>not want to be restricted to a UTF-8 encoding, because if I were to
>write a parser in Java, a UTF-16 encoding would be more appropriate.
>Instead I view the parser proper as accepting UCS-4 on input and
>output. Encoding should be handled outside the parser in the storage
>manager.
Quite correct. The parser should have an "ideal/logical" interface
that uses UCS-4, but in reality, it may use an internal representation
of any bit width it chooses.
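
To make the split concrete, here is a minimal sketch in Python. It is not taken from any existing parser: the names decode_to_code_points and parse_tokens are hypothetical, and the "parser" is a toy that only splits on U+0020. The point is that the storage-manager step turns bytes in whatever encoding into UCS-4 code points, and the parser proper sees only those code points.

```python
from typing import Iterable, Iterator, List


def decode_to_code_points(raw: bytes, encoding: str) -> Iterator[int]:
    """Hypothetical storage-manager step: bytes in some encoding -> code points."""
    # Python 3 strings are sequences of Unicode code points, so ord() on each
    # character yields the UCS-4 value, including those outside the BMP.
    for ch in raw.decode(encoding):
        yield ord(ch)


def parse_tokens(code_points: Iterable[int]) -> List[str]:
    """Toy 'parser proper': splits on U+0020 and never sees bytes or encodings."""
    tokens: List[str] = []
    current: List[int] = []
    for cp in code_points:
        if cp == 0x20:
            if current:
                tokens.append("".join(map(chr, current)))
                current = []
        else:
            current.append(cp)
    if current:
        tokens.append("".join(map(chr, current)))
    return tokens


# The same text stored in two encodings gives the parser identical input.
text = "caf\u00e9 \U00010348"
from_utf8 = parse_tokens(decode_to_code_points(text.encode("utf-8"), "utf-8"))
from_utf16 = parse_tokens(decode_to_code_points(text.encode("utf-16"), "utf-16"))
```

Internally the parser is free to pack those code points into whatever width it likes; only the logical interface is fixed at UCS-4.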