
Re: libtidy-php: does not clean invalid character refs

From: Felix Natter <felix.natter@smail.inf.fh-bonn-rhein-sieg.de>
Date: Tue, 30 Jan 2007 12:49:53 +0100
To: Fred.Bone@dial.pipex.com
Cc: html-tidy@w3.org
Message-Id: <1170157793.5234.33.camel@localhost.localdomain>

On Tue, 2007-01-30 at 11:30 +0000, Fred Bone wrote:
> On 30 Jan 2007 at 11:40, Felix Natter said:
> 
> [...]
> > I uploaded the problematic file here:
> > http://www2.inf.fh-brs.de/~fnatte2s/Adenauer.html
> 
> Are you quite sure you are telling Tidy that the file is in utf-8?

The "file" command reports UTF-8, and the page displays correctly when I
select UTF-8 in Firefox.
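To check the same thing without relying on a browser, here is a rough heuristic sketch. It is not what "file" actually does, since "file" has its own detection logic. The idea is that a buffer containing non-ASCII bytes that still decodes as strict UTF-8 is very likely UTF-8, because ISO-8859-1 text with accented letters almost never forms valid UTF-8 sequences by accident. The function name "looks_like_utf8" is made up for this example.

```python
import sys


def looks_like_utf8(data: bytes) -> bool:
    """Rough heuristic: True if data has non-ASCII bytes and decodes as strict UTF-8."""
    if all(b < 0x80 for b in data):
        # Pure ASCII is valid in every ASCII-compatible charset, so it tells us nothing.
        return False
    try:
        data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python check_utf8.py Adenauer.html
    with open(sys.argv[1], "rb") as f:
        print(looks_like_utf8(f.read()))
```

For example, "Straße" encoded as UTF-8 passes the check, while the same word encoded as ISO-8859-1 fails it, because the lone 0xDF byte is not followed by a UTF-8 continuation byte.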

> When I view the file in Opera, it defaults to interpreting the codepoints 
> in iso-8859-1 (and of course displays incorrect glyphs in various 
> places). The server is not specifying a charset in the HTTP headers, and 
> the file contains no charset information.

It's a static file as emitted by the MediaWiki parser; I just uploaded it
manually.

> Also, the file as delivered by the server is only 253 lines long, and 
> your original message included errors referring to lines from 1567 
> onwards. Please clarify.

Maybe I used a different file in my original post. Here is the output:

line 13 column 264 - Warning: replacing invalid character code 128
line 14 column 47 - Warning: replacing invalid character code 159
line 15 column 431 - Warning: replacing invalid character code 159
line 15 column 455 - Warning: replacing invalid character code 159
line 15 column 615 - Warning: replacing invalid character code 159
line 52 column 178 - Warning: replacing invalid character code 128
line 52 column 179 - Warning: replacing invalid character code 147
line 52 column 241 - Warning: replacing invalid character code 128
line 52 column 242 - Warning: replacing invalid character code 147
line 52 column 335 - Warning: replacing invalid character code 128
line 52 column 336 - Warning: replacing invalid character code 147
line 52 column 359 - Warning: replacing invalid character code 128
line 52 column 360 - Warning: replacing invalid character code 147
line 52 column 380 - Warning: replacing invalid character code 128
line 52 column 381 - Warning: replacing invalid character code 147
line 57 column 1074 - Warning: replacing invalid character code 159
line 61 column 237 - Warning: replacing invalid character code 159
line 61 column 272 - Warning: replacing invalid character code 128
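The thread does not say which characters produced these codes. One plausible explanation, which is an assumption and not confirmed here, is that UTF-8 input is being read as ISO-8859-1. UTF-8 multi-byte sequences often contain bytes in the range 0x80-0x9F, and in ISO-8859-1 those bytes become C1 control code points (128-159), which are not valid in HTML. An en dash (U+2013) is E2 80 93 in UTF-8, which would yield 128 followed by 147 in adjacent columns, as on line 52 above. "ß" (U+00DF) is C3 9F, which would yield 159. The sample string and the helper name "c1_code_points" are invented for illustration:

```python
def c1_code_points(raw: bytes, assumed_encoding: str) -> list[int]:
    """Decode raw bytes with the given encoding and return every code point
    in the C1 control range 0x80-0x9F (128-159)."""
    decoded = raw.decode(assumed_encoding)
    return [ord(ch) for ch in decoded if 0x80 <= ord(ch) <= 0x9F]


# Hypothetical sample text containing an en dash and a sharp s, stored as UTF-8.
raw = "Adenauer \u2013 Stra\u00dfe".encode("utf-8")

print(c1_code_points(raw, "latin-1"))  # misread as ISO-8859-1: [128, 147, 159]
print(c1_code_points(raw, "utf-8"))    # read with the correct encoding: []
```

Reading the same bytes with the correct encoding produces no C1 code points at all. This is consistent with Fred's observation that the warnings disappear once Tidy is told the input is UTF-8.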


> If I run command-line Tidy on the file, specifying only the -utf8 option, 
> it warns that six IDs are using XML ID syntax, but does not find any 
> character problems.

OK, I didn't know I needed to specify -utf8; maybe that is all it
takes. I'll have to check with a coworker (he told me that some
unspecified problem had been fixed with Tidy).

Thank you!
-- 
Felix Natter <felix.natter@smail.inf.fh-brs.de>
Received on Tuesday, 30 January 2007 11:49:17 GMT