Re: libtidy-php: does not clean invalid character refs

From: Fred Bone <Fred.Bone@dial.pipex.com>
Date: Tue, 30 Jan 2007 11:30:41 -0000
To: html-tidy@w3.org
Message-ID: <45BF2C61.5209.500C527E@Fred.Bone.dial.pipex.com>

On 30 Jan 2007 at 11:40, Felix Natter said:

[...]
> I uploaded the problematic file here:
> http://www2.inf.fh-brs.de/~fnatte2s/Adenauer.html

Are you quite sure you are telling Tidy that the file is in utf-8?

When I view the file in Opera, it defaults to interpreting the bytes as
ISO-8859-1 (and of course displays incorrect glyphs in various places).
The server is not specifying a charset in the HTTP headers, and the file
itself contains no charset declaration.
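
To make the point about missing charset information concrete, here is a rough Python sketch of the two places a browser would normally look: the Content-Type header and a meta tag near the top of the file. The function names are made up for illustration, the regular expressions are a simplification of what real browsers do, and nothing here is taken from Tidy or Opera.

```python
import re


def charset_from_content_type(header_value):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not header_value:
        return None
    match = re.search(r'charset\s*=\s*"?([^\s;"]+)', header_value, re.IGNORECASE)
    return match.group(1).lower() if match else None


def charset_from_meta(raw_html):
    """Return a charset named in a <meta> tag within the first 1024 bytes, or None.

    Simplified: a real browser runs a more careful prescan than this regex.
    """
    head = raw_html[:1024].decode("ascii", errors="replace")
    match = re.search(
        r'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', head, re.IGNORECASE
    )
    return match.group(1).lower() if match else None


def declared_charset(content_type, raw_html):
    """Prefer the HTTP header, then fall back to the document itself."""
    return charset_from_content_type(content_type) or charset_from_meta(raw_html)


# Hypothetical inputs: a header without a charset and a page without a meta tag.
page = "<html><head><title>Köln</title></head></html>".encode("utf-8")
print(declared_charset("text/html", page))  # nothing declared anywhere

# With nothing declared, a reader that assumes ISO-8859-1 sees wrong glyphs:
print(page.decode("iso-8859-1"))
```

When both lookups come back empty, the consumer has to guess, which is why the same bytes can look fine to one tool and wrong to another.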

Also, the file as delivered by the server is only 253 lines long, and 
your original message included errors referring to lines from 1567 
onwards. Please clarify.

If I run command-line Tidy on the file, specifying only the -utf8 option, 
it warns that six IDs are using XML ID syntax, but does not find any 
character problems.
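
As an independent cross-check (this is not how Tidy works internally, just a minimal sketch), a few lines of Python can count the lines in the delivered file and report which of them, if any, fail to decode as UTF-8. The function name and the sample data are made up for illustration.

```python
def utf8_problems(raw):
    """Return (line_count, problems) for a byte string.

    problems is a list of (line_number, reason) for every line that is not
    valid UTF-8. Splitting on line breaks first is safe because no byte of a
    multi-byte UTF-8 sequence can be a CR or LF.
    """
    lines = raw.splitlines()
    problems = []
    for line_no, line in enumerate(lines, start=1):
        try:
            line.decode("utf-8")
        except UnicodeDecodeError as exc:
            problems.append((line_no, exc.reason))
    return len(lines), problems


# Hypothetical sample: the same text saved as UTF-8 and as ISO-8859-1.
text = "plain line\nKöln\n"
print(utf8_problems(text.encode("utf-8")))
print(utf8_problems(text.encode("iso-8859-1")))
```

Running this over the bytes actually fetched from the server would show both the line count and whether any line is not valid UTF-8, which are the two things that differ between the reports so far.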
Received on Tuesday, 30 January 2007 11:31:20 GMT