- From: Martin Duerst <duerst@w3.org>
- Date: Fri, 23 Apr 2004 08:55:10 +0900
- To: olivier Thereaux <ot@w3.org>, validators community <www-validator@w3.org>
Hello Olivier,

I'm off today, so I can only give a preliminary answer.

By the time the source code gets printed, it should all have been
converted to UTF-8. But the output looks as if there is some stuff
that isn't UTF-8. I'll try to look into this next week.

What you may want to do is:
- download the page in question
- run iconv on it to convert it to UTF-8
- check for full UTF-8 compliance with a regexp such as the one at
  http://www.w3.org/International/questions/qa-forms-utf-8.html

If something gets caught, then that would be an error in iconv.

Regards,    Martin.

At 08:01 04/04/23 +0900, olivier Thereaux wrote:
>While fixing minor validator's validity bugs, I noticed this
>interesting one.
>
>Typical test case: validating the validation output for a shift_jis
>encoded page (in my case, the google.co.jp homepage)
>
>Symptom: in its error output, the validator quotes part of the source
>for the validated page.
>
>relevant check code:
>
>[[
>...
> print qq{<span class="msg">$msg</span></p>};
> print qq(<p><code class="input">$line</code></p>);
>...
>]]
>
>$line appears to be a truncated part of the validated markup source,
>which is fine unless the truncating botches up the first character, as
>shown here:
>[[
><p><code
>class="input">...$BIq(B</font></b>&nbsp;&nbsp;&
>nbsp;&nbsp;<strong title="Position where error was
>detected."><</strong>a id=1a class=q
>href="/imghp?hl=ja&tab=</code></p>
>]]
>or the last one, as shown here:
>[[
><p><code class="input">...;&nbsp;<a id=1a class=q
>href="/imghp?<strong title="Position where error was
>detected.">h</strong>l=ja&tab=wi&ie=UTF-8&oe=Shift_JIS"
>>$B(I%!'î(B
>]]
>
>I am far from being an expert on that part of the code, but it seems
>like a typical i18n problem. I am copying Martin, who helped a lot in
>the past in charset detection and transcoding.
>
>Martin, any idea what's going on here and how to fix this?
>--
>olivier
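
The convert-then-check steps suggested in the reply can be sketched
roughly as follows. This is only an illustration under assumptions, not
the validator's code: Python's built-in codecs stand in for iconv, the
byte pattern is modeled on the one published at the W3C URL above, and
the names to_utf8, is_well_formed_utf8 and UTF8_WELL_FORMED are
hypothetical.

```python
import re

# Byte-level pattern for well-formed UTF-8, modeled on the W3C page
# cited in the message (an assumption, not the validator's own check).
UTF8_WELL_FORMED = re.compile(rb"""
    (?:
        [\x09\x0A\x0D\x20-\x7E]             # ASCII (tab, LF, CR, printable)
      | [\xC2-\xDF][\x80-\xBF]              # non-overlong 2-byte
      | \xE0[\xA0-\xBF][\x80-\xBF]          # 3-byte, excluding overlongs
      | [\xE1-\xEC\xEE\xEF][\x80-\xBF]{2}   # straight 3-byte
      | \xED[\x80-\x9F][\x80-\xBF]          # 3-byte, excluding surrogates
      | \xF0[\x90-\xBF][\x80-\xBF]{2}       # planes 1-3
      | [\xF1-\xF3][\x80-\xBF]{3}           # planes 4-15
      | \xF4[\x80-\x8F][\x80-\xBF]{2}       # plane 16
    )*
""", re.VERBOSE)


def to_utf8(raw: bytes, source_encoding: str = "shift_jis") -> bytes:
    """Transcode raw page bytes to UTF-8 (stand-in for the iconv step)."""
    return raw.decode(source_encoding).encode("utf-8")


def is_well_formed_utf8(data: bytes) -> bool:
    """Return True if every byte of data belongs to a valid UTF-8 sequence."""
    return UTF8_WELL_FORMED.fullmatch(data) is not None


if __name__ == "__main__":
    # A small Shift_JIS sample standing in for the downloaded page.
    page = "<a href='/imghp?hl=ja'>画像</a>".encode("shift_jis")
    converted = to_utf8(page)
    print("converted page is UTF-8:", is_well_formed_utf8(converted))
    # Cutting a multi-byte character in half leaves bytes that fail the check.
    print("cut mid-character is UTF-8:", is_well_formed_utf8(converted[:-5]))
```

If the check fails on the converted bytes, the transcoding step is the
suspect, as the reply says. If it passes there but fails on the excerpt
that gets printed, the cut that produces the excerpt is worth a look.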
Received on Thursday, 22 April 2004 19:55:30 UTC