Re: Tidy outputs no HTML-Entities

At 08:54 PM 2/15/2005 +0100, Bjoern Hoehrmann wrote:
>* Eric Bleinagel wrote:
> >I found out that the application uses the TidyATL.dll, in
> >http://users.rcn.com/creitzel/tidy.html#comatl Charles Reitzel says
> >something about problems in character encoding... didn't really understand
> >it, my English is not the best and I'm not really a programmer, could this
> >be the answer?
>
>Charles, could you take a look at this? The original message is at
>http://lists.w3.org/Archives/Public/html-tidy/2005JanMar/0031.html

I second your comments here.  It depends on whether they are parsing a
buffer or a file.  When parsing a buffer in a COM component, the character
encoding is always UTF-16.  TidyATL forces the encoding to UTF-16; otherwise
it would not work.  Tidy will not emit entities for this output encoding,
since UTF-16 can represent every character directly.


> >OK, Tidy cannot recognize the meta-tag and assumes I want ascii, in the
> >config-file I tell him again to put out ascii - and in the output Tidy
> >writes in the meta-tag that he used ascii and not Latin1(iso8859-1) - but
> >then inside the body-tag there has to be a '&uuml; &uuml;' and not a 'ü ü' ?
> >
> >So this constellation could not be correct, or did I misunderstood
> >something?
>
>I agree, that seems to be a bug, though it's probably a bug in
>Charles' wrapper or the software that uses the wrapper, not in
>Tidy.

Not a bug so much as a limitation of TidyLib.  It has no ability to
preserve entities from the input.  It will only emit character entities
when the output encoding is ASCII.
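
To illustrate why the output encoding matters, here is a minimal Python
sketch.  It is not TidyLib or TidyATL code; the helper name
to_ascii_with_entities is made up for this example, and it only
approximates what any ASCII serializer has to do.  With an ASCII target,
a character such as 'ü' cannot be written as-is and must become an entity;
with a UTF-16 target the same character fits in one code unit, so no
entity is required.

```python
from html.entities import codepoint2name


def to_ascii_with_entities(text: str) -> str:
    """Hypothetical helper: make text 7-bit clean by escaping non-ASCII.

    Uses a named HTML 4 entity when one exists, otherwise a numeric
    character reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                          # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")   # e.g. 252 -> &uuml;
        else:
            out.append(f"&#{cp};")                  # no named entity available
    return "".join(out)


sample = "für"

# ASCII output: the umlaut has to be escaped.
ascii_out = to_ascii_with_entities(sample)

# UTF-16 output: the umlaut is stored directly as the code unit 0x00FC.
utf16_out = sample.encode("utf-16-le")

print(ascii_out)
print(utf16_out)
```

So a caller that gets UTF-16 text back from the wrapper and needs entities
would have to do this kind of escaping as a separate step.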

However, Eric should contact the CMS vendor to verify the settings they use.

hth,
Charlie

Received on Wednesday, 16 February 2005 03:33:41 UTC