- From: <bugzilla@jessica.w3.org>
- Date: Thu, 03 Feb 2011 20:16:27 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11973

--- Comment #3 from Craig S <craig.e.shea@gmail.com> 2011-02-03 20:16:26 UTC ---

I was doing a lot of reading, and this is the best I can explain it.

I took a Word document and used the File->Save As feature to save it as Html (filtered), which removes all the MS-specific XML namespace stuff and sticks to traditional HTML. As we all know, Word replaces the standard apostrophe and double quotes with "curly" versions.

When I looked at the hexadecimal value of a right single quote as stored in the document, it was 0xe2, 0x80, 0x99 (which shows up in the dump viewer as a lower-case a with a circumflex, a Euro currency symbol, and the trademark symbol). That is the UTF-8 encoding of a right single quote (U+2019). However, in my web browser (IE9 beta), it shows up as a '?'.

If I explicitly specify a META element with http-equiv=Content-Type content="text/html; charset=windows-1252", then the page is displayed correctly with the correct character, even though that character is still encoded with the 3 bytes shown above.

Perhaps I misunderstood the problem. However, from what I can see, Word uses the windows-1252 character set, and when I send charset=windows-1252 to the UA, the page displays correctly. As far as I know, windows-1252 content does not necessarily need to be encoded as UTF-8; it could just as easily use ASCII encoding.
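
The byte values quoted in the comment above can be checked with a minimal Python sketch. It assumes only the three bytes 0xe2, 0x80, 0x99 from the comment; the variable names are illustrative and nothing here reproduces the browser's behavior.

```python
# The three bytes found in the saved file for the right single quote.
raw = bytes([0xE2, 0x80, 0x99])

# Decoded as UTF-8, the three bytes form a single character: U+2019.
as_utf8 = raw.decode("utf-8")

# Decoded as windows-1252, each byte becomes its own character:
# a with circumflex (U+00E2), Euro sign (U+20AC), trademark sign (U+2122).
as_cp1252 = raw.decode("cp1252")

# For comparison, windows-1252 itself stores the right single quote
# as one byte, 0x92.
quote_in_cp1252 = "\u2019".encode("cp1252")

print(ascii(as_utf8), ascii(as_cp1252), quote_in_cp1252.hex())
```

The same three bytes therefore read as one curly quote under UTF-8 and as three unrelated characters under windows-1252, which matches what the dump viewer showed. Plain ASCII has no code for the curly quote at all, so it could only be written there as a character reference such as `&#8217;`.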
Received on Thursday, 3 February 2011 20:16:29 UTC