- From: Phillips, Addison <addison@amazon.com>
- Date: Tue, 10 Jun 2008 07:33:04 -0700
- To: Martin Duerst <duerst@it.aoyama.ac.jp>, Richard Ishida <ishida@w3.org>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Martin wrote:

> > --
> > This means that XML or HTML documents are always processed as a
> > sequence of characters from the Unicode character set.
> > --
>
> This may not always be true. It is perfectly fine to have an
> XML parser that works in US-ASCII for US-ASCII documents, and
> so on. It may not be a good idea in terms of implementation,
> but it wouldn't be against the XML Rec.

(personal response)

Yes, but the effect is the same: a US-ASCII document might still contain an NCR (numeric character reference) that must be treated as a Unicode code point.

It is useful to note that the paragraph directly following this sentence makes the point that the file might use any encoding, including a non-Unicode encoding. While my suggestion might not be quite the right wording, it does, I think, convey the important point: document authors may (and document processors must) treat files as if they were a sequence of Unicode code points. What encoding the processor uses internally is invisible.
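For illustration, a minimal sketch of that behavior, using Python's standard-library xml.etree.ElementTree parser (an arbitrary choice; nothing in the thread depends on it):

    # A US-ASCII-encoded XML document that still carries the Unicode
    # code point U+00E9 via a numeric character reference (NCR).
    import xml.etree.ElementTree as ET

    doc = b'<?xml version="1.0" encoding="US-ASCII"?><word>caf&#xE9;</word>'

    # Whatever encoding the parser uses internally, a conforming parser
    # must resolve &#xE9; to U+00E9 (LATIN SMALL LETTER E WITH ACUTE).
    root = ET.fromstring(doc)
    assert root.text == "caf\u00e9"

The bytes of the file are pure ASCII, yet the parsed result is a sequence of Unicode code points; the processor's internal encoding never shows.

Addison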
Received on Tuesday, 10 June 2008 14:33:42 UTC