More encoding problems - japanese test cases from Karl Waclawek on 2004-04-20 (public-xml-testsuite@w3.org from April 2004)

From: Karl Waclawek <karl@waclawek.net>
Date: Tue, 20 Apr 2004 10:38:13 -0400
To: <public-xml-testsuite@w3.org>
Message-ID: <004401c426e5$1c5252e0$9e539696@citkwaclaww2k>

1) Test case japanese\pr-xml-little-endian.xml:

AElfred, Expat and MSXML compain about this document.
Right after the XML declaration there is a sequence (hex): 0x0d 0x0a 0x00 0x0d 0x0a 0x00
Should that not be 0x0d 0x00 0x0a 0x00 0x0d 0x00 0x0a 0x00 ?
Expat and MSXML complain about line 1, column 22, but maybe only because
they assume it is UTF-8 since there is no encoding declaration.
A separate Unicode checker I am using complains about a low surrogate
0xdd30 without a high surrogate further down, when I tell it to check for UTF-16.

2) Test case japanese\pr-xml-utf-16.xml:

Very similar to the first case - document is in UTF-16, big endian.
Expat and MSXML complain about line2, column 0.
The sequence right after the XML declaration is: 0x00 0x0d 0x0a 0x00 0x0d 0x0a 0x00 0x3c
A separate Unicode checker I am using complains about a low surrogate
0xdd30 without a high surrogate further down, when I tell it to check for UTF-16.

3) Test case japanese\weekly-little-endian:
4) Test case japanese\weekly-utf-16.xml/dtd:

Basically the same as 1) and 2). Also fail the Unicode checker in similar ways.

Regards,

Karl

Received on Tuesday, 20 April 2004 10:46:30 UTC