
[Bug 3550] expected result for ns-queries-results-q5

From: <bugzilla@wiggum.w3.org>
Date: Tue, 01 Aug 2006 13:05:53 +0000
CC:
To: public-qt-comments@w3.org
Message-Id: <E1G7twX-0004pa-E4@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=3550

------- Comment #3 from mike@saxonica.com  2006-08-01 13:05 -------
UltraEdit in hex mode shows the character as C3 A4, but when I read the file
into a Java InputStream and display the bytes, I do indeed get:

c3 83 c2 a4

That's actually the UTF-8 encoding of the two characters U+00C3 U+00A4, and the
octets C3 A4 are themselves the UTF-8 encoding of E4 (U+00E4, a-umlaut). So the
character has been doubly encoded into UTF-8. I was confused by UltraEdit: in
hex mode it doesn't actually show the octets present in the file, it shows the
UTF-16 code units after decoding from UTF-8.
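
For illustration, here is a minimal Java sketch of how a byte sequence like
this can arise. It assumes the intermediate step is a misreading of the UTF-8
octets as ISO-8859-1, which is one common route to double encoding and not
necessarily what happens in the canonicalization code. The class and method
names (DoubleEncodingDemo, doubleEncode, repair, hex) are hypothetical.

```java
import java.nio.charset.StandardCharsets;

class DoubleEncodingDemo {

    // Format bytes as lowercase hex pairs separated by spaces.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02x", b & 0xff));
        }
        return sb.toString();
    }

    // Assumed failure mode: encode to UTF-8, misread the octets as
    // ISO-8859-1 (one char per octet), then encode to UTF-8 again.
    static byte[] doubleEncode(String s) {
        byte[] once = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(once, StandardCharsets.ISO_8859_1);
        return misread.getBytes(StandardCharsets.UTF_8);
    }

    // Undo the double encoding by reversing the steps above.
    static String repair(byte[] twice) {
        String misread = new String(twice, StandardCharsets.UTF_8);
        byte[] once = misread.getBytes(StandardCharsets.ISO_8859_1);
        return new String(once, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "\u00e4"; // U+00E4
        // Single UTF-8 encoding: prints "c3 a4"
        System.out.println(hex(original.getBytes(StandardCharsets.UTF_8)));
        // Double encoding: prints "c3 83 c2 a4"
        byte[] twice = doubleEncode(original);
        System.out.println(hex(twice));
        // Round trip restores the original character: prints "true"
        System.out.println(repair(twice).equals(original));
    }
}
```

Printing the raw octets this way sidesteps any editor that decodes the file
before displaying it.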

I'm seeing the same byte sequence in the result file produced by Saxon, so I
suspect this might be the cause of the problem. Perhaps I supplied a result
file at some stage and this was incorporated into the distribution. I suspect
this double-encoding is happening as a result of the way I do canonicalization:
because it's applied to both files, it doesn't normally show up.
Received on Tuesday, 1 August 2006 13:06:07 GMT
