
Re: libtidy-php: does not clean invalid character refs

From: Krzysztof Gorzelak <krzysztof@uno.pl>
Date: Tue, 30 Jan 2007 10:51:00 +0100
Message-ID: <007201c74454$26cbcaa0$7101a9c0@pologne>
To: "Felix Natter" <felix.natter@smail.inf.fh-bonn-rhein-sieg.de>
Cc: <html-tidy@w3.org>

----- Original Message ----- 
From: "Felix Natter" <felix.natter@smail.inf.fh-bonn-rhein-sieg.de>
To: "Krzysztof Gorzelak" <krzysztof@uno.pl>
Sent: Tuesday, January 30, 2007 10:33 AM
Subject: Re: libtidy-php: does not clean invalid character refs


>> > And the output still contains invalid character references because
>> > this still shows when I run the command-line tidy over the result:
>> >
>> > line 1567 column 31 - Warning: replacing invalid character code 145
>> > line 1573 column 31 - Warning: replacing invalid character code 145
>> > line 1579 column 33 - Warning: replacing invalid character code 145
>> > line 1712 column 45 - Warning: <a> attribute with missing trailing quote mark
>> > line 1771 column 28 - Warning: replacing invalid character code 136
>> >
>> > How can I get libtidy under PHP to fix these "invalid character
>> > messages"?
>> >
>> > I am using libtidy 20050415-1 on debian sarge with php-tidy 1.2 (which
>> > seems to be no longer maintained).
>>
>> I'm using "Multibyte String" library ( function mb_convert_encoding ) to
>> prepare my webpage for tidy cleaning...
>
> Thanks for the reply!
>
> Unfortunately mb_convert_encoding does not eliminate these problems.
> I do this:
> $out = mb_convert_encoding($text, "UTF-8", "UTF-8");
> and still get the same problems in tidy.
>
> Do you have another idea on how to fix this?
>

You can try the iconv library (a character set conversion facility), or
just the functions utf8_encode() and utf8_decode(). There is also a nice
article about internationalization: http://www.phpwact.org/php/i18n/charsets
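
A rough sketch of the iconv route, in case it helps. It assumes the page is
really Windows-1252 (CP1252) rather than UTF-8, since codes 145 and 136 are
printable characters there; that is a guess about your input, not something
I have checked. The helper name fix_cp1252() is made up for illustration.

```php
<?php
// fix_cp1252() is a hypothetical helper name, not part of PHP or tidy.
// Assumption: the input is Windows-1252 (CP1252) text, not UTF-8.
function fix_cp1252($text)
{
    // Convert raw CP1252 bytes to UTF-8. iconv() returns false on failure,
    // in which case the input is kept unchanged.
    $utf8 = @iconv('CP1252', 'UTF-8', $text);
    if ($utf8 === false) {
        $utf8 = $text;
    }

    // Replace numeric references in the 128-159 range (e.g. &#145;) with
    // the character CP1252 assigns to that byte value.
    return preg_replace_callback('/&#(\d+);/', function ($m) {
        $code = (int) $m[1];
        if ($code < 128 || $code > 159) {
            return $m[0]; // outside the range: leave the reference as is
        }
        $char = @iconv('CP1252', 'UTF-8', chr($code));
        return ($char === false || $char === '') ? $m[0] : $char;
    }, $utf8);
}

// Codes 145 and 136 from the warnings, once as a byte and once as a reference.
$clean = fix_cp1252("It\x91s &#136; here");
```

Do not run this over text that is already valid UTF-8 with non-ASCII
characters, because the first step would encode it a second time. Also note
that utf8_encode() assumes ISO-8859-1 input, so on its own it would map code
145 to a control character rather than to a quotation mark.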

Have a nice day!
Krzysztof Gorzelak
krzysztof@uno.pl
http://www.uno.pl
Received on Tuesday, 30 January 2007 09:51:12 GMT
