- From: <bugzilla@wiggum.w3.org>
- Date: Sun, 17 Oct 2004 22:58:16 +0000
- To: www-validator-cvs@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=921
Summary: Validator inserts text in the middle of a UTF-8 character
Product: Validator
Version: 0.6.7
Platform: All
URL: http://validator.w3.org/check?uri=http://forum.druzya.org
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: check
AssignedTo: link@pobox.com
ReportedBy: bdew@bdew.yi.org
QAContact: www-validator-cvs@w3.org
I've discovered a bug in the checker. I tried to check my site
(http://forum.druzya.org) and the first error message was displayed broken,
something like the following (this is what appeared on my screen; as you can
see, some of the HTML code produced by the validator leaks onto the screen):
...663b" title="Список форуммstrong title="Position where error was detected.">?
в Друзей" />
Here's a hex dump of the HTML that produced this:
0 1 2 3 4 5 6 7 8 9 A B C D E F
000: 3C 6C 69 3E 3C 70 3E 3C 65 6D 3E 4C 69 6E 65 20 <li><p><em>Line
010: 37 2C 20 63 6F 6C 75 6D 6E 20 31 30 33 3C 2F 65 7, column 103</e
020: 6D 3E 3A 20 3C 73 70 61 6E 20 63 6C 61 73 73 3D m>: <span class=
030: 22 6D 73 67 22 3E 63 68 61 72 61 63 74 65 72 20 "msg">character
040: 64 61 74 61 20 69 73 20 6E 6F 74 20 61 6C 6C 6F data is not allo
050: 77 65 64 20 68 65 72 65 3C 2F 73 70 61 6E 3E 3C wed here</span><
060: 2F 70 3E 3C 70 3E 3C 63 6F 64 65 20 63 6C 61 73 /p><p><code clas
070: 73 3D 22 69 6E 70 75 74 22 3E 2E 2E 2E 63 36 64 s="input">...c6d
080: 36 26 23 33 34 3B 20 74 69 74 6C 65 3D 26 23 33 6&#34; title=&#3
090: 34 3B D0 A1 D0 BF D0 B8 D1 81 D0 BE D0 BA 20 D1 4;РЎРїРёС_Р_Рє С
0A0: 84 D0 BE D1 80 D1 83 D0 BC D0 3C 73 74 72 6F 6E "Р_С_С_Р_Р<stron
As you can see, it's UTF-8, and at 0x0A9 a UTF-8 character begins that has
been split in two by the message markup. The first character of the markup
("<") gets processed as the second byte of that character.
0B0: 67 20 74 69 74 6C 65 3D 22 50 6F 73 69 74 69 6F g title="Positio
0C0: 6E 20 77 68 65 72 65 20 65 72 72 6F 72 20 77 61 n where error wa
0D0: 73 20 64 65 74 65 63 74 65 64 2E 22 3E BE 3C 2F s detected.">_</
The byte at position 0x0DD seems to belong to the character that the checker
complains about. I don't see anything wrong with it, so the error itself is
probably a bug too.
0E0: 73 74 72 6F 6E 67 3E D0 B2 20 D0 94 D1 80 D1 83 strong>Р_ Р"С_С_
0F0: D0 B7 D0 B5 D0 B9 26 23 33 34 3B 20 2F 26 23 36 Р·РчР№&#34; /&#6
100: 32 3B 3C 2F 63 6F 64 65 3E 3C 2F 70 3E 2;</code></p>
[END OF HEXDUMP]
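To make the problem in the dump above concrete, here is a minimal Python sketch of what appears to happen when markup is inserted at a byte offset that falls inside a multi-byte UTF-8 character, and one way to avoid it. This is not the validator's code (I have not looked at how it computes the position); the function names `is_char_boundary`, `snap_to_char_boundary` and `insert_markup` are made up for illustration, and the shortened `<strong>` marker stands in for the real markup.

```python
def is_char_boundary(data: bytes, offset: int) -> bool:
    """True unless offset points at a UTF-8 continuation byte (10xxxxxx)."""
    if offset <= 0 or offset >= len(data):
        return True
    return (data[offset] & 0xC0) != 0x80


def snap_to_char_boundary(data: bytes, offset: int) -> int:
    """Move offset back to the first byte of the character it falls inside."""
    while not is_char_boundary(data, offset):
        offset -= 1
    return offset


def insert_markup(data: bytes, offset: int, markup: bytes) -> bytes:
    """Insert markup at offset after snapping it to a character boundary."""
    offset = snap_to_char_boundary(data, offset)
    return data[:offset] + markup + data[offset:]


# The word that gets split in the dump, as UTF-8 bytes.
word = bytes.fromhex("D1 84 D0 BE D1 80 D1 83 D0 BC D0 BE D0 B2")
marker = b"<strong>"  # shortened stand-in for the validator's markup

# Naive insertion between D0 and BE, as seen at 0x0A9/0x0AA in the dump.
naive = word[:11] + marker + word[11:]
try:
    naive.decode("utf-8")
    naive_is_valid = True
except UnicodeDecodeError:
    naive_is_valid = False  # D0 is followed by "<" instead of BE

# Boundary-aware insertion keeps the character intact.
fixed = insert_markup(word, 11, marker)
```

With the offset snapped back by one byte, `fixed` decodes cleanly and the marker lands in front of the whole character instead of inside it.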
This is how it looked in the original file (it was encoded in CP-1251; the
recoding to UTF-8 was done by the checker):
0 1 2 3 4 5 6 7 8 9 A B C D E F
000: 3C 6C 69 6E 6B 20 72 65 6C 3D 22 74 6F 70 22 20 <link rel="top"
010: 68 72 65 66 3D 22 2E 2F 69 6E 64 65 78 2E 70 68 href="./index.ph
020: 70 3F 73 69 64 3D 30 34 34 38 62 32 66 62 61 63 p?sid=0448b2fbac
030: 38 38 66 31 65 39 62 31 66 35 65 65 39 37 36 63 88f1e9b1f5ee976c
040: 64 65 30 36 38 32 22 20 74 69 74 6C 65 3D 22 D1 de0682" title="╤
050: EF E8 F1 EE EA 20 F4 EE F0 F3 EC EE E2 20 C4 F0 яшёюъ ЇюЁєьют ─Ё
It seems that it barfed at 0x05B. As I said, I see nothing wrong with this
character whatsoever.
060: F3 E7 E5 E9 22 20 2F 3E єчхщ" />
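As a cross-check of the two dumps, the following Python sketch decodes the CP-1251 bytes of the title attribute (0x04F to 0x063 above) and compares where the same character sits in each encoding. It only illustrates why a position that is valid in the single-byte original cannot be reused as-is in the recoded UTF-8 text; `char_to_byte_offset` is a made-up helper, and I am not claiming this is how the checker computes its position.

```python
def char_to_byte_offset(text: str, char_index: int, encoding: str) -> int:
    """Byte offset at which the character at char_index starts in encoding."""
    return len(text[:char_index].encode(encoding))


# Title bytes from the original file, 0x04F through 0x063 (CP-1251).
raw = bytes.fromhex(
    "D1 EF E8 F1 EE EA 20 F4 EE F0 F3 EC EE E2 20 C4 F0 F3 E7 E5 E9"
)
title = raw.decode("cp1251")

# 0x05B is 12 bytes past the start of the title at 0x04F.
index = 0x05B - 0x04F
char = title[index]

# Same character, different byte offsets depending on the encoding.
offset_cp1251 = char_to_byte_offset(title, index, "cp1251")
offset_utf8 = char_to_byte_offset(title, index, "utf-8")
```

In CP-1251 every character here is one byte, so the character index and byte offset are both 12. In UTF-8 the same character starts 23 bytes into the title, which matches the first dump: the title starts at 0x092 and the split character starts at 0x0A9.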
That's all. I hope that my bug report helps (and that it won't get corrupted
because of all those characters :)
------- You are receiving this mail because: -------
You are the QA contact for the bug, or are watching the QA contact.
Received on Sunday, 17 October 2004 22:58:17 UTC