- From: <bugzilla@wiggum.w3.org>
- Date: Sun, 17 Oct 2004 22:58:16 +0000
- To: www-validator-cvs@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=921

           Summary: Validator inserts text in the middle of a UTF-8 character
           Product: Validator
           Version: 0.6.7
          Platform: All
               URL: http://validator.w3.org/check?uri=http://forum.druzya.org
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: check
        AssignedTo: link@pobox.com
        ReportedBy: bdew@bdew.yi.org
         QAContact: www-validator-cvs@w3.org

I've discovered a bug in the checker. When I tried to check my site (http://forum.druzya.org), the first error looked broken: some of the HTML produced by the validator leaked onto the screen. This is what appeared (the title attribute should read "Список форумов Друзей", Russian for roughly "List of Druzya forums"):

...663b" title="Список форуммstrong title="Position where error was detected.">? в Друзей" />

Here's a hex dump of the HTML that produced this:

     0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000: 3C 6C 69 3E 3C 70 3E 3C 65 6D 3E 4C 69 6E 65 20  <li><p><em>Line 
010: 37 2C 20 63 6F 6C 75 6D 6E 20 31 30 33 3C 2F 65  7, column 103</e
020: 6D 3E 3A 20 3C 73 70 61 6E 20 63 6C 61 73 73 3D  m>: <span class=
030: 22 6D 73 67 22 3E 63 68 61 72 61 63 74 65 72 20  "msg">character 
040: 64 61 74 61 20 69 73 20 6E 6F 74 20 61 6C 6C 6F  data is not allo
050: 77 65 64 20 68 65 72 65 3C 2F 73 70 61 6E 3E 3C  wed here</span><
060: 2F 70 3E 3C 70 3E 3C 63 6F 64 65 20 63 6C 61 73  /p><p><code clas
070: 73 3D 22 69 6E 70 75 74 22 3E 2E 2E 2E 63 36 64  s="input">...c6d
080: 36 26 23 33 34 3B 20 74 69 74 6C 65 3D 26 23 33  6&#34; title=&#3
090: 34 3B D0 A1 D0 BF D0 B8 D1 81 D0 BE D0 BA 20 D1  4;РЎРїРёС_Р_Рє С
0A0: 84 D0 BE D1 80 D1 83 D0 BC D0 3C 73 74 72 6F 6E  "Р_С_С_Р_Р<stron

As you can see, the output is UTF-8, and at 0x0A9 there is the first byte of a UTF-8 character that has been broken in two by the message. The first character of the message HTML ("<") was processed as the second byte of that character.
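The failure described above can be reproduced in a few lines. The following is a minimal Python sketch, not the checker's own code, and it assumes (as the dump suggests) that the error marker is placed by wrapping one unit of output at the reported position. The 0xD0 at 0x0A9 and the 0xBE that the `<strong>` element wraps (shown at 0x0DD further down the dump) together form the UTF-8 encoding of Cyrillic "о" (U+043E). Wrapping one byte splits that pair; wrapping one character of decoded text cannot. The names `OPEN`, `CLOSE`, `mark_byte`, `mark_char` and `is_valid_utf8` are hypothetical.

```python
# Illustrative sketch only: compares byte-oriented and character-oriented
# insertion of an error marker into UTF-8 text. Not the validator's code.

# Hypothetical marker strings, modelled on the markup visible in the dump.
OPEN = '<strong title="Position where error was detected.">'
CLOSE = "</strong>"

# The UTF-8 bytes starting at dump offset 0x092, with the validator's marker
# removed from between 0xD0 (0x0A9) and 0xBE (0x0DD). They spell the Russian
# text "Список форумо", i.e. the start of "Список форумов".
utf8_bytes = bytes.fromhex(
    "D0 A1 D0 BF D0 B8 D1 81 D0 BE D0 BA 20 "
    "D1 84 D0 BE D1 80 D1 83 D0 BC D0 BE"
)
text = utf8_bytes.decode("utf-8")


def mark_byte(data: bytes, offset: int) -> bytes:
    """Wrap the single byte at `offset` (byte-oriented, can split a character)."""
    return (data[:offset] + OPEN.encode("ascii") + data[offset:offset + 1]
            + CLOSE.encode("ascii") + data[offset + 1:])


def mark_char(s: str, index: int) -> str:
    """Wrap the single character at `index` (character-oriented, always safe)."""
    return s[:index] + OPEN + s[index] + CLOSE + s[index + 1:]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if `data` decodes as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


# The last byte (0xBE) is only the second half of "о" (D0 BE), so wrapping it
# leaves a lone 0xD0 followed by "<", which is invalid UTF-8.
broken = mark_byte(utf8_bytes, len(utf8_bytes) - 1)
# The last character is the whole "о", so the marker goes around both bytes.
fixed = mark_char(text, len(text) - 1)

if __name__ == "__main__":
    print("decoded text:", text)
    print("byte-wise marking valid UTF-8?", is_valid_utf8(broken))
    print("char-wise marking valid UTF-8?", is_valid_utf8(fixed.encode("utf-8")))
```

Run as a script, the byte-wise result fails to decode while the character-wise result round-trips, which matches the symptom in the report: positions have to be applied to decoded characters, not to bytes of the UTF-8 output.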
0B0: 67 20 74 69 74 6C 65 3D 22 50 6F 73 69 74 69 6F  g title="Positio
0C0: 6E 20 77 68 65 72 65 20 65 72 72 6F 72 20 77 61  n where error wa
0D0: 73 20 64 65 74 65 63 74 65 64 2E 22 3E BE 3C 2F  s detected.">_</

The byte at 0x0DD seems to be the character the checker complains about. I don't see anything wrong with it, so that is probably a bug too.

0E0: 73 74 72 6F 6E 67 3E D0 B2 20 D0 94 D1 80 D1 83  strong>Р_ Р"С_С_
0F0: D0 B7 D0 B5 D0 B9 26 23 33 34 3B 20 2F 26 23 36  Р·РчР№&#34; /&#6
100: 32 3B 3C 2F 63 6F 64 65 3E 3C 2F 70 3E           2;</code></p>
[END OF HEXDUMP]

This is how it looked in the original file (which was encoded in CP-1251; the recoding to UTF-8 was done by the checker):

     0  1  2  3  4  5  6  7  8  9  A  B  C  D  E  F
000: 3C 6C 69 6E 6B 20 72 65 6C 3D 22 74 6F 70 22 20  <link rel="top" 
010: 68 72 65 66 3D 22 2E 2F 69 6E 64 65 78 2E 70 68  href="./index.ph
020: 70 3F 73 69 64 3D 30 34 34 38 62 32 66 62 61 63  p?sid=0448b2fbac
030: 38 38 66 31 65 39 62 31 66 35 65 65 39 37 36 63  88f1e9b1f5ee976c
040: 64 65 30 36 38 32 22 20 74 69 74 6C 65 3D 22 D1  de0682" title="╤
050: EF E8 F1 EE EA 20 F4 EE F0 F3 EC EE E2 20 C4 F0  яшёюъ ЇюЁєьют ─Ё

It seems that it barfed at 0x05B; as I said, I see nothing wrong with this character whatsoever.

060: F3 E7 E5 E9 22 20 2F 3E                          єчхщ" />

That's all. I hope my bug report helps (and that it won't be corrupted because of all those characters :)

------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
Received on Sunday, 17 October 2004 22:58:17 UTC