Re: Schema in UTF-8 instance in UTF-16? from Horti, Andras on 2003-08-28 (xmlschema-dev@w3.org from August 2003)

From: Horti, Andras <andras.horti@joanneum.at>
Date: Thu, 28 Aug 2003 14:38:24 +0200
To: <xmlschema-dev@w3.org>
Message-ID: <56512F1841F5E846879EE0D9F858D20F5B29E1@RZJC1EX.jr1.local>

As long as they are in separate files, this should not be a problem, at least for Xerces-Java. During parsing, everything is read into Java Strings, which are always UTF-16. This means that no matter what the encoding of a file is, the parser always works internally with UTF-16.
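
To illustrate the point, here is a minimal sketch that validates a UTF-16 instance against a UTF-8 schema. It assumes the standard JAXP validation API (javax.xml.validation) shipped with current JDKs rather than any Xerces-specific class, and the class name MixedEncodingDemo, the one-element schema, and the sample instance are all made up for the example.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.transform.stream.StreamSource;
import javax.xml.validation.Schema;
import javax.xml.validation.SchemaFactory;

public class MixedEncodingDemo {

    // Hypothetical one-element schema, serialized as UTF-8 bytes.
    static byte[] schemaUtf8() {
        String xsd =
            "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
          + "<xs:schema xmlns:xs=\"http://www.w3.org/2001/XMLSchema\">"
          + "<xs:element name=\"name\" type=\"xs:string\"/>"
          + "</xs:schema>";
        return xsd.getBytes(StandardCharsets.UTF_8);
    }

    // Hypothetical instance with non-ASCII content, serialized as UTF-16.
    // Java's UTF_16 charset writes a byte order mark when encoding.
    static byte[] instanceUtf16() {
        String xml =
            "<?xml version=\"1.0\" encoding=\"UTF-16\"?>"
          + "<name>\u017B\u00F3\u0142w</name>";
        return xml.getBytes(StandardCharsets.UTF_16);
    }

    // Returns true if the instance bytes validate against the schema bytes.
    // Each document is decoded according to its own encoding declaration.
    static boolean validates(byte[] schemaBytes, byte[] instanceBytes) {
        try {
            Schema schema = SchemaFactory
                .newInstance(XMLConstants.W3C_XML_SCHEMA_NS_URI)
                .newSchema(new StreamSource(new ByteArrayInputStream(schemaBytes)));
            schema.newValidator()
                .validate(new StreamSource(new ByteArrayInputStream(instanceBytes)));
            return true;
        } catch (Exception e) {
            // SAXException for invalid or malformed input, IOException for read errors.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(validates(schemaUtf8(), instanceUtf16()));
    }
}
```

The encoding is a property of each file as a whole, so the two documents never need to agree: both are decoded to Java Strings before the validator compares element names and content.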
 
Hope it helps
 
Andras Horti

-----Original Message-----
From: Ewa Iwicka [mailto:iwicka@ean-int.org]
Sent: Thursday, 28 August 2003 14:21
To: 'xmlschema-dev@w3.org'
Subject: Schema in UTF-8 instance in UTF-16?



Hello Everybody, 

I'm a member of a group developing standard XML schemas (in English), based on which members worldwide develop instance documents according to their needs. Worldwide means populating them with data in various languages, scripts, etc. For some languages the more suitable choice of encoding would be UTF-16, but in our schemas we use UTF-8. Recently we have been struggling with the following question: would it be possible to use both, i.e. a schema encoded in UTF-8 and an instance in UTF-16? That would mean that tags are encoded in UTF-8 and data (attribute values and element contents) in UTF-16. Is it possible to parse such a document? Is there any mechanism to facilitate that or to 'switch' between UTF sets at the XML level?

I would appreciate any suggestions or references 

Thank you 

Ewa Iwicka

Received on Thursday, 28 August 2003 08:38:32 UTC