XML documents in false UTF-8 from MURATA Makoto on 1998-12-02 (xml-editor@w3.org from October to December 1998)

From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
Date: Wed, 02 Dec 1998 10:52:25 +0900
To: xml-editor@w3.org
Message-Id: <199812020152.AA02811@murata.apsdc.ksp.fujixerox.co.jp>

Consider false UTF-8, which mistakenly represents a non-BMP character 
(from 10000 to 10FFFF) by SIX bytes, three for each surrogate, instead 
of the correct four.  Such false UTF-8 may be created by applying 
UCS-2 -> UTF-8 converters to UTF-16 containing surrogate pairs.
Java uses such "UTF-8" internally and also for class files.
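
As an illustration of how such output arises, here is a minimal Python 
sketch.  The function name false_utf8 is hypothetical and not taken 
from any library; it only mimics a converter that treats every 16-bit 
UTF-16 code unit as a complete UCS-2 character.

```python
def false_utf8(text: str) -> bytes:
    """Encode text as a naive UCS-2 -> UTF-8 converter would (hypothetical helper)."""
    utf16 = text.encode("utf-16-be")  # non-BMP characters become surrogate pairs
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")  # one 16-bit code unit
        if unit < 0x80:
            out.append(unit)  # one byte
        elif unit < 0x800:
            out += bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])  # two bytes
        else:
            # Three bytes. Surrogates (D800-DFFF) land here too, which is the bug:
            # each half of a pair is encoded as if it were a character.
            out += bytes([0xE0 | (unit >> 12),
                          0x80 | ((unit >> 6) & 0x3F),
                          0x80 | (unit & 0x3F)])
    return bytes(out)


ch = "\U00010000"  # the first non-BMP character
print(false_utf8(ch).hex())       # eda080edb080 (six bytes)
print(ch.encode("utf-8").hex())   # f0908080 (correct, four bytes)
```

For text that stays within the BMP the two encodings are identical, so 
the problem only shows up once surrogate pairs appear in the input.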

Is an XML document in such "UTF-8" an error or a fatal error?  Since 
some code conversion libraries automatically fix such bad UTF-8, this 
should probably be an error rather than a fatal error.  Otherwise, XML 
parsers based on such libraries would be non-conformant.
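
The difference between a decoder that rejects such input and one that 
silently fixes it can be sketched as follows.  This is only an 
assumption about how a repairing library might behave; decode_strict 
and decode_repairing are hypothetical names, and the repair relies on 
Python's "surrogatepass" error handler.

```python
# U+10000 written as two three-byte sequences, one per surrogate.
FALSE_UTF8 = bytes.fromhex("eda080edb080")


def decode_strict(data: bytes) -> str:
    """Reject encoded surrogates, as a strict UTF-8 decoder does."""
    return data.decode("utf-8")


def decode_repairing(data: bytes) -> str:
    """Accept encoded surrogates and rejoin each pair into one character."""
    halves = data.decode("utf-8", "surrogatepass")  # keeps the surrogates as-is
    # A round trip through UTF-16 recombines high/low pairs; an unpaired
    # surrogate still raises UnicodeDecodeError here.
    return halves.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


try:
    decode_strict(FALSE_UTF8)
except UnicodeDecodeError as exc:
    print("strict decoder rejects the input:", exc.reason)

print(decode_repairing(FALSE_UTF8) == "\U00010000")  # True
```

A parser built on something like decode_repairing never sees the bad 
bytes, so it has no opportunity to report them at all.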

Cheers,

Makoto
 
Fuji Xerox Information Systems
 
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp

Received on Tuesday, 1 December 1998 20:46:41 UTC