- From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
- Date: Wed, 02 Dec 1998 10:52:25 +0900
- To: xml-editor@w3.org
Consider false UTF-8 that mistakenly represents a non-BMP character (from U+10000 to U+10FFFF) with SIX bytes instead of four. Such false UTF-8 may be created by applying a UCS-2 -> UTF-8 converter to UTF-16 containing surrogate pairs: each surrogate is encoded separately as three bytes. Java uses such "UTF-8" internally and also for class files.

Is an XML document in such "UTF-8" an error or a fatal error?

Since some code conversion libraries automatically fix such bad UTF-8, this should probably be an error rather than a fatal error. Otherwise, XML parsers based on such libraries are non-conformant.

Cheers,

Makoto

Fuji Xerox Information Systems
Tel: +81-44-812-7230 Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
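To make the six-byte form concrete, here is a minimal Python sketch of what a naive UCS-2 -> UTF-8 converter might do to a surrogate pair, and how a strict decoder and a lenient, repairing decoder would each treat the result. The function names `ucs2_style_utf8` and `lenient_decode` are hypothetical and only illustrate the behaviour described above; they are not taken from any particular library.

```python
def ucs2_style_utf8(text: str) -> bytes:
    """Encode each UTF-16 code unit on its own, as a UCS-2 -> UTF-8
    converter would, so a surrogate pair becomes 3 + 3 bytes."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # "surrogatepass" lets Python emit the 3-byte form of a lone surrogate.
        out += chr(unit).encode("utf-8", "surrogatepass")
    return bytes(out)


def lenient_decode(data: bytes) -> str:
    """Accept the six-byte form and rejoin the surrogates into one character,
    roughly what an auto-fixing conversion library is assumed to do."""
    halves = data.decode("utf-8", "surrogatepass")
    return halves.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


ch = "\U00010000"                 # first non-BMP character
correct = ch.encode("utf-8")      # 4 bytes: f0 90 80 80
false = ucs2_style_utf8(ch)       # 6 bytes: ed a0 80 ed b0 80

try:
    false.decode("utf-8")         # a strict decoder rejects the input
    strict_ok = True
except UnicodeDecodeError:
    strict_ok = False

repaired = lenient_decode(false)  # a lenient decoder recovers U+10000
```

A parser built on the strict path has to report the problem, while one built on the lenient path never sees it, which is the conformance concern raised above.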
Received on Tuesday, 1 December 1998 20:46:41 UTC