- From: Terje Bless <link@pobox.com>
- Date: Tue, 29 Nov 2005 15:49:26 +0100
- To: www-validator@w3.org
- cc: Dag Øystein Johansen <dag.o.johansen@gmail.com>
Dag Øystein Johansen <dag.o.johansen@gmail.com> wrote:

>Result for check.htm: "Sorry, I am unable to validate this document
>because on line 1022 it contained one or more bytes that I cannot
>interpret as utf-8 (in other words, the bytes found are not valid values
>in the specified Character Encoding)."

This is an instance of a known bug in the Validator (actually, it demonstrates two separate symptoms of the same bug). To recreate:

* Validate the page in question.
* Save the validation results page to a file.
* Upload the file to the validator.

In the original page there is a markup error close to the text that reads «med et snitt på 22 minutter per kamp» (line 499).

One symptom of the bug in question is that the markup error is reported at an incorrect character offset; it's reported as being close to the word “på”, but should have been reported earlier.

The other symptom is that the validator tries to indicate the position of the error as being within the multi-byte sequence comprising the character “å” in “på” above. Since it inserts markup between the characters (actually, between the bytes of the multi-byte sequence comprising a single character), the resulting page will contain an invalid multi-byte sequence.

These are both symptoms of the validator internally converting documents to UTF-8, but operating with byte semantics instead of character semantics.

This bug should be fixed in the next major revision (whenever we switch to using character semantics).

This may or may not explain the errors you originally spotted.

Thanks for your feedback on this!

-- 
“It's not the mere technical details of inserting the live round into the chamber, pointing the weapon at one's foot, and pulling the trigger, but rather, it's about the advisability of doing that in the first place.” -- Alan J. Flavell on ciwah
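
For illustration only, the difference between byte semantics and character semantics described in the message above can be sketched in a few lines of Python. This is not the validator's code; the helper names (`insert_at_byte`, `insert_at_char`, `is_valid_utf8`) and the `<strong>` marker are assumptions made up for this example. It shows how inserting markup at a byte offset that falls inside the two-byte UTF-8 sequence for “å” yields invalid UTF-8, while inserting at a character offset does not.

```python
# Hypothetical sketch; names and marker are invented for illustration.

def insert_at_byte(data: bytes, offset: int, marker: bytes = b"<strong>") -> bytes:
    """Insert marker at a byte offset (byte semantics)."""
    return data[:offset] + marker + data[offset:]


def insert_at_char(text: str, offset: int, marker: str = "<strong>") -> str:
    """Insert marker at a character offset (character semantics)."""
    return text[:offset] + marker + text[offset:]


def is_valid_utf8(data: bytes) -> bool:
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "med et snitt på 22 minutter per kamp"
data = text.encode("utf-8")

# "å" is one character but two bytes in UTF-8, so the lengths differ
# and offsets after it no longer agree between the two views.
print(len(text), len(data))

# Byte semantics: an offset one byte past the start of "å" lands
# between its two bytes and splits the sequence.
a_start = data.index("å".encode("utf-8"))
broken = insert_at_byte(data, a_start + 1)
print(is_valid_utf8(broken))

# Character semantics: the marker can only land between whole characters.
intact = insert_at_char(text, text.index("å")).encode("utf-8")
print(is_valid_utf8(intact))
```

Working on decoded character strings and encoding only on output means an insertion point can never fall inside a multi-byte sequence, which is the kind of change the switch to character semantics implies.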
Received on Tuesday, 29 November 2005 14:50:13 UTC