
XML documents in false UTF-8

From: MURATA Makoto <murata@apsdc.ksp.fujixerox.co.jp>
Date: Wed, 02 Dec 1998 10:52:25 +0900
Message-Id: <199812020152.AA02811@murata.apsdc.ksp.fujixerox.co.jp>
To: xml-editor@w3.org
Consider false UTF-8 which mistakenly represents a non-BMP character
(U+10000 to U+10FFFF) by SIX bytes, i.e. as two three-byte sequences,
one per surrogate, instead of the correct four bytes.  Such false UTF-8
may be created by applying UCS-2 -> UTF-8 converters to UTF-16
containing surrogate pairs.  Java uses such "UTF-8" internally (e.g. in
DataOutput.writeUTF) and also in class files.
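As a minimal sketch of how the six-byte form arises, the Python function
below encodes each 16-bit UTF-16 code unit on its own, the way a
UCS-2 -> UTF-8 converter would, and compares the result with correct
UTF-8 for U+10000, the first non-BMP character. The name
naive_ucs2_to_utf8 is hypothetical, and the sketch does not reproduce
every detail of Java's format (Java also writes U+0000 as two bytes).

```python
def naive_ucs2_to_utf8(text: str) -> bytes:
    """Hypothetical UCS-2 -> UTF-8 converter applied to UTF-16 data.

    Each 16-bit code unit is encoded independently as 1-3 UTF-8 bytes,
    so a surrogate pair becomes two 3-byte sequences (six bytes total).
    """
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = (utf16[i] << 8) | utf16[i + 1]
        if unit < 0x80:
            out.append(unit)
        elif unit < 0x800:
            out += bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
        else:
            # Surrogates (D800-DFFF) fall in here and get 3 bytes each.
            out += bytes([0xE0 | (unit >> 12),
                          0x80 | ((unit >> 6) & 0x3F),
                          0x80 | (unit & 0x3F)])
    return bytes(out)


ch = "\U00010000"                        # first non-BMP code point
print(naive_ucs2_to_utf8(ch).hex(" "))   # ed a0 80 ed b0 80  (false, 6 bytes)
print(ch.encode("utf-8").hex(" "))       # f0 90 80 80        (correct, 4 bytes)
```

Python's own codec yields the same six bytes for an explicit surrogate
pair with "\ud800\udc00".encode("utf-8", "surrogatepass"), which can
serve as a cross-check.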

Is an XML document in such "UTF-8" an error or a fatal error?  Since
some code conversion libraries automatically fix such bad UTF-8, this
should probably be an error rather than a fatal error.  Otherwise, XML
parsers based on such libraries are non-conformant.
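In XML 1.0 terms, a processor may recover from an error, but after a
fatal error it must not continue normal processing. The sketch below
contrasts the two behaviours, assuming Python's built-in codecs stand in
for the "code conversion libraries" mentioned above; decode_strict and
decode_lenient are hypothetical names, not any particular library's API.
The lenient version joins six-byte surrogate pairs into single code
points, while an unpaired surrogate still raises an exception.

```python
def decode_strict(data: bytes) -> str:
    # Python's UTF-8 codec rejects encoded surrogates (ED A0..BF xx),
    # i.e. it treats false UTF-8 like a fatal error.
    return data.decode("utf-8")


def decode_lenient(data: bytes) -> str:
    """Accept six-byte surrogate pairs and repair them (error-style)."""
    # Step 1: let encoded surrogates through as lone surrogate code points.
    text = data.decode("utf-8", "surrogatepass")
    # Step 2: round-trip through UTF-16 so each high/low pair is combined
    # into one code point; an unpaired surrogate raises UnicodeDecodeError.
    return text.encode("utf-16-be", "surrogatepass").decode("utf-16-be")


# "<a>" + U+10000 in six-byte false UTF-8 + "</a>"
false_utf8 = bytes.fromhex("3c613e" "eda080edb080" "3c2f613e")
try:
    decode_strict(false_utf8)
except UnicodeDecodeError:
    print("strict: rejected")
print(decode_lenient(false_utf8) == "<a>\U00010000</a>")  # True
```

Correct UTF-8, including genuine four-byte sequences, passes through the
lenient decoder unchanged, so the repair only affects the false form.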


Fuji Xerox Information Systems
Tel: +81-44-812-7230   Fax: +81-44-812-7231
E-mail: murata@apsdc.ksp.fujixerox.co.jp
Received on Tuesday, 1 December 1998 20:46:41 UTC
