While fixing minor validity bugs in the validator's own output, I noticed this interesting one.

Typical test case: validating the validation output for a shift_jis encoded page (in my case, the google.co.jp homepage).

Symptom: in its error output, the validator quotes part of the source of the validated page. Relevant check code:

[[
...
print qq{<span class="msg">$msg</span></p>};
print qq(<p><code class="input">$line</code></p>);
...
]]

$line appears to be a truncated part of the validated markup source, which is fine unless the truncation botches the first character, as shown here:

[[
<p><code class="input">...Éñ</font></b>&nbsp;&nbsp;& nbsp;&nbsp;<strong title="Position where error was detected."><</strong>a id=1a class=q href="/imghp?hl=ja&tab=</code></p>
]]

or the last one, as shown here:

[[
<p><code class="input">...;&nbsp;<a id=1a class=q href="/imghp?<strong title="Position where error was detected.">h</strong>l=ja&tab=wi&ie=UTF-8&oe=Shift_JIS" >„ǧ„</code></p>
]]

I am far from being an expert on that part of the code, but it seems like a typical i18n problem. I am copying Martin, who helped a lot in the past with charset detection and transcoding.

Martin, any idea what's going on here and how to fix this?

-- 
olivier

Received on Thursday, 22 April 2004 19:02:00 GMT
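The symptom above would be consistent with the excerpt being cut by byte offset, so that a two-byte Shift_JIS character is split at either edge. The following is only a minimal sketch of that failure mode and of one possible workaround, not the validator's actual code: `excerpt_chars` is a hypothetical helper, the sample string is made up, and it assumes the source is available as a byte string with a known charset.

```perl
use strict;
use warnings;
use utf8;                        # this script's own string literals are UTF-8
use Encode qw(decode encode);

# Hypothetical helper (not part of the validator): take up to $max
# characters starting at character offset $start, then re-encode.
sub excerpt_chars {
    my ($bytes, $charset, $start, $max) = @_;
    my $text  = decode($charset, $bytes);       # bytes -> characters
    my $piece = substr($text, $start, $max);    # substr now counts characters
    return encode($charset, $piece);            # characters -> bytes
}

# Made-up sample: 4 characters, 2 bytes each in Shift_JIS.
my $source = encode('shift_jis', "画像検索");

# Byte-wise cut: 3 bytes ends in the middle of the second character.
my $naive = substr($source, 0, 3);

# Character-wise cut: 2 whole characters, 4 bytes.
my $safe = excerpt_chars($source, 'shift_jis', 0, 2);

printf "naive: %d bytes, safe: %d bytes\n", length($naive), length($safe);
```

Decoding before truncating keeps both ends of the excerpt on character boundaries; whether that fits the way the validator computes $line is something Martin would know better.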