Re: Non-us-ascii characters in Tidy Beta version 15-Jan-2004

From: Harold Baughan [RockSolidSite.com] <hbaughan@rocksolidsite.com>
Date: Tue, 17 Feb 2004 23:56:57 -0500
Message-ID: <000901c3f5dc$7aebf480$a02e4b43@e3f3y7>
To: "Bjoern Hoehrmann" <derhoermi@gmx.net>
Cc: <html-tidy@w3.org>

Hello Bjoern,

> By default, Tidy does not generate non-ascii output unless there are
> non-ascii characters inside constructs where it cannot use character
> references (comments, for example); in this case the characters come
> out garbled. So there must be some configuration option active,
> -latin1 for example. If the character is not visible it is most likely
> U+00A0 (&nbsp;). Tidy would insert them e.g. if there is a <nobr>
> element in the source document.

Give the man a cigar!

I have not been able to find a <nobr> element in the source text.  However,
I obtained a copy of the Frhed Hex editor and through its eyes saw the evil
little creature in the final text.  An a0.  When I copy it in context from
the hex editor into this email, it comes out as follows --
"So, was it John Curtis?<bh:a0> Or perhaps the namesake of "
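
For anyone who wants to repeat this check without a hex editor, here is a
minimal Python sketch. It assumes the output is in a single-byte encoding
such as Latin-1 (as with -latin1), where U+00A0 is stored as the single byte
a0. In UTF-8 the same character is the two bytes c2 a0, and a0 can also
appear as part of other characters, so the hits would need a closer look.
The function name and the context width are made up for illustration.

```python
def find_nbsp_bytes(data: bytes, context: int = 12):
    """Return (offset, surrounding bytes) for every 0xA0 byte in data.

    Assumes a single-byte encoding such as Latin-1, where 0xA0 is U+00A0.
    """
    hits = []
    start = 0
    while True:
        pos = data.find(b"\xa0", start)
        if pos == -1:
            break
        # Keep a few bytes on each side so the hit can be located in the text.
        hits.append((pos, data[max(0, pos - context):pos + context + 1]))
        start = pos + 1
    return hits
```

To run it against a file, read the file in binary mode (open(path, "rb")) and
pass the resulting bytes to the function, then print each offset and its
context.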

But now I've bounced around so much and tried so many things that I cannot
specifically remember/prove that the *only* source could have been Tidy.
(There were two spaces after the punctuation in the original text.)  So it's
time to backtrack and re-investigate. <sigh>

Harold
Received on Wednesday, 18 February 2004 00:05:19 UTC
