- From: <bugzilla@wiggum.w3.org>
- Date: Tue, 01 Aug 2006 13:05:53 +0000
- To: public-qt-comments@w3.org
- CC:
http://www.w3.org/Bugs/Public/show_bug.cgi?id=3550

------- Comment #3 from mike@saxonica.com 2006-08-01 13:05 -------

UltraEdit in hex mode shows the character as C3 A4, but when I read the file into a Java InputStream and display the bytes I do indeed get C3 83 C2 A4. That is the UTF-8 encoding of C3 A4, which is itself the UTF-8 encoding of E4; so the character has been doubly encoded into UTF-8. I was misled by UltraEdit: in hex mode it doesn't show the octets actually present in the file, it shows the UTF-16 characters after decoding from UTF-8.

I'm seeing the same byte sequence in the result file produced by Saxon, so I suspect this might be the cause of the problem. Perhaps I supplied a result file at some stage and it was incorporated into the distribution. I suspect the double encoding happens as a result of the way I do canonicalization; since it is applied to both files, it doesn't normally show up.
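A minimal Java sketch of the double encoding described above, assuming the character in question is U+00E4 (ä); it is an illustration only, not Saxon's or the test suite's actual code path:

    import java.nio.charset.StandardCharsets;

    public class DoubleEncodingDemo {
        public static void main(String[] args) {
            // U+00E4 (ä) encodes to the two-byte UTF-8 sequence C3 A4
            byte[] once = "\u00E4".getBytes(StandardCharsets.UTF_8);
            System.out.println(hex(once)); // c3 a4

            // If those bytes are misread as single-byte characters
            // (ISO-8859-1) and then re-encoded as UTF-8, the result is
            // the four-byte sequence C3 83 C2 A4 seen in the file.
            String misread = new String(once, StandardCharsets.ISO_8859_1);
            byte[] twice = misread.getBytes(StandardCharsets.UTF_8);
            System.out.println(hex(twice)); // c3 83 c2 a4
        }

        private static String hex(byte[] bytes) {
            StringBuilder sb = new StringBuilder();
            for (byte b : bytes) {
                sb.append(String.format("%02x ", b));
            }
            return sb.toString().trim();
        }
    }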
Received on Tuesday, 1 August 2006 13:06:07 UTC