- From: olivier Thereaux <ot@w3.org>
- Date: Fri, 23 Apr 2004 08:01:22 +0900
- To: validators community <www-validator@w3.org>
- Cc: Martin Duerst <duerst@w3.org>
- Message-Id: <F985656A-94B0-11D8-9612-000393A63FC8@w3.org>
While fixing minor validity bugs in the validator's own output, I noticed this interesting one.

Typical test case: validating the validation output for a shift_jis encoded page (in my case, the google.co.jp homepage).

Symptom: in its error output, the validator quotes part of the source of the validated page. Relevant check code:

[[
...
print qq{<span class="msg">$msg</span></p>};
print qq(<p><code class="input">$line</code></p>);
...
]]

$line appears to be a truncated part of the validated markup source, which is fine unless the truncation botches the first character, as shown here:

[[
<p><code class="input">...Éñ</font></b>&nbsp;&nbsp;& nbsp;&nbsp;<strong title="Position where error was detected."><</strong>a id=1a class=q href="/imghp?hl=ja&tab=</code></p>
]]

or the last one, as shown here:

[[
<p><code class="input">...;&nbsp;<a id=1a class=q href="/imghp?<strong title="Position where error was detected.">h</strong>l=ja&tab=wi&ie=UTF-8&oe=Shift_JIS" >„ǧ„</code></p>
]]

I am far from being an expert on that part of the code, but it looks like a typical i18n problem: the excerpt seems to be cut in the middle of a multibyte character. I am copying Martin, who has helped a lot in the past with charset detection and transcoding.

Martin, any idea what's going on here and how to fix this?

-- 
olivier
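To illustrate the kind of truncation problem described above, here is a minimal sketch. It is written in Python rather than the validator's Perl, and it is not the validator's actual code. It assumes (this is a guess, not something confirmed in the check source) that the excerpt is cut at byte offsets while the page is still in a multibyte encoding such as shift_jis. The function names excerpt_bytes and excerpt_chars are made up for the example.

```python
# Illustrative sketch only: not the validator's code.
# Assumption: the excerpt is cut at byte offsets in a multibyte encoding.


def excerpt_bytes(source: bytes, start: int, end: int) -> bytes:
    """Naive excerpt: slice raw bytes; may cut a two-byte character in half."""
    return source[start:end]


def excerpt_chars(source: bytes, encoding: str, start: int, end: int) -> str:
    """Safer excerpt: decode to characters first, then slice."""
    return source.decode(encoding)[start:end]


if __name__ == "__main__":
    page = "日本語".encode("shift_jis")  # three characters, two bytes each
    # Byte slice 1:5 starts and ends in the middle of a character,
    # so the decoded excerpt contains replacement characters or junk.
    print(excerpt_bytes(page, 1, 5).decode("shift_jis", errors="replace"))
    # Character slice 1:3 always falls on character boundaries.
    print(excerpt_chars(page, "shift_jis", 1, 3))
```

If the cause is indeed byte-level truncation, the analogous fix in Perl would presumably be to decode the source (for instance with the Encode module) before taking the substring, so that offsets count characters rather than bytes.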
Received on Thursday, 22 April 2004 19:02:00 UTC