XML documents in false UTF-8

Consider false UTF-8, which mistakenly represents a non-BMP character 
(from U+10000 to U+10FFFF) by SIX bytes instead of four.  Such false 
UTF-8 may be created by applying a UCS-2 -> UTF-8 converter to UTF-16 
containing surrogate pairs: each surrogate is encoded separately as a 
three-byte sequence.
Java uses such "UTF-8" internally and also for class files.
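
To make the six-byte form concrete, here is a minimal sketch in Python 
of what a naive UCS-2 -> UTF-8 converter does to a surrogate pair.  The 
function names encode_ucs2_unit and false_utf8 are made up for this 
illustration and do not come from any particular library.

```python
def encode_ucs2_unit(unit: int) -> bytes:
    """Encode one 16-bit code unit as a UCS-2 -> UTF-8 converter would.

    The converter knows nothing about surrogates, so a surrogate code
    unit (0xD800-0xDFFF) simply falls into the three-byte branch.
    """
    if unit < 0x80:
        return bytes([unit])
    if unit < 0x800:
        return bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
    return bytes([
        0xE0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3F),
        0x80 | (unit & 0x3F),
    ])


def false_utf8(text: str) -> bytes:
    """Encode text as UTF-16, then convert each code unit on its own."""
    utf16 = text.encode("utf-16-be")
    units = [int.from_bytes(utf16[i:i + 2], "big")
             for i in range(0, len(utf16), 2)]
    return b"".join(encode_ucs2_unit(u) for u in units)


ch = "\U00010000"                 # first non-BMP character
correct = ch.encode("utf-8")      # four bytes: F0 90 80 80
broken = false_utf8(ch)           # six bytes:  ED A0 80 ED B0 80
```

For BMP characters the two encodings agree, so the problem only shows 
up once a document contains a character above U+FFFF.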

Is an XML document in such "UTF-8" an error or a fatal error?  Since 
some code conversion libraries automatically fix such bad UTF-8, this 
should probably be an error rather than a fatal error.  Otherwise, XML 
parsers based on such libraries would be non-conformant.
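
The difference between the two behaviours can be sketched in Python, 
whose built-in codecs happen to offer both a strict decoder and a way 
to repair the six-byte form.  The helper names decode_strict and 
decode_lenient are hypothetical; this only illustrates what "fixing" 
might look like, not how any particular conversion library works.

```python
# U+10000 in the false six-byte form (two three-byte surrogates).
FALSE_UTF8 = b"\xed\xa0\x80\xed\xb0\x80"


def decode_strict(data: bytes) -> str:
    """Reject encoded surrogates, as a strict UTF-8 decoder does."""
    return data.decode("utf-8")  # raises UnicodeDecodeError on surrogates


def decode_lenient(data: bytes) -> str:
    """Accept encoded surrogates and recombine them into one character."""
    # 'surrogatepass' lets the surrogate code points through unchanged.
    with_surrogates = data.decode("utf-8", "surrogatepass")
    # A round trip through UTF-16 joins each high/low pair.
    utf16 = with_surrogates.encode("utf-16-be", "surrogatepass")
    return utf16.decode("utf-16-be")


try:
    decode_strict(FALSE_UTF8)
    strict_rejected = False
except UnicodeDecodeError:
    strict_rejected = True

repaired = decode_lenient(FALSE_UTF8)
```

A parser sitting on top of something like decode_lenient never sees 
the bad bytes, so it cannot report them as a fatal error.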

Cheers,

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

Received on Tuesday, 1 December 1998 20:46:41 UTC