Re: Non-us-ascii characters in Tidy Beta version 15-Jan-2004

* Harold Baughan [RockSolidSite.com] wrote:
>On Feb. 9 I asked several questions re: Tidy Beta version 15-Jan-2004, and
>received some good help in operation.  However, one error keeps getting
>introduced.  Note... the version in use is a plug-in to Chami's HTML-Kit
>Version 1.0, Build 292, on Win-98.

It would be best to reproduce the problem with the command line
application, ideally with a simple test case. If you can reproduce it
there, send a message to the list or file a bug report on the sf.net
site. I cannot fix bugs I cannot reproduce.

>After a beautify function, a non-visible, non-us-ascii character is being
>added somewhere in several strings.  It seems to be happening on a line
>which includes <br /> by itself, or a space after </a> at the end of a line,
>or when there are two spaces between a period and the beginning of the next
>sentence (such as in "...word.  Word...")

By default, Tidy does not generate non-ascii output unless there are
non-ascii characters inside constructs where it cannot use character
references (comments, for example); in that case the characters come
out garbled. So there must be some configuration option active,
-latin1 for example. If the character is not visible, it is most likely
U+00A0 (&nbsp;). Tidy inserts these if, for example, there is a <nobr>
element in the source document.
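
To illustrate why the output encoding matters here, the following
Python sketch shows how U+00A0 comes out under different encodings.
It only demonstrates Python's own codecs, not Tidy's internals; the
numeric reference in the last line is what Python's xmlcharrefreplace
handler produces and is assumed to be merely analogous to what an
ASCII-only serializer does.

```python
nbsp = "\u00a0"  # NO-BREAK SPACE, the character &nbsp; stands for

as_latin1 = nbsp.encode("latin-1")  # one raw byte above 0x7F
as_utf8 = nbsp.encode("utf-8")      # two raw bytes above 0x7F
# ASCII cannot hold the character, so it is escaped as a reference
as_ascii = nbsp.encode("ascii", errors="xmlcharrefreplace")

print(as_latin1)  # b'\xa0'
print(as_utf8)    # b'\xc2\xa0'
print(as_ascii)   # b'&#160;'
```

In the first two cases an editor shows what looks like an ordinary
space, which would match the "non-visible" character described.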

>Does anyone know how to make whatever this character visible so that it can
>be edited out?  And, of course, it should be looked into for the next build.

A hex editor would probably work best. If you don't have one yet, there
are lots freely available. Certain viewer applications might also help,
e.g. the Total Commander file manager supports hex view. You could also
put the file online and I'll have a look.
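
If no hex editor is at hand, a few lines of Python can point out the
offending bytes as well. This is only a sketch: find_non_ascii is a
name made up for this example, and the sample bytes are invented to
mimic the cases described (a space after a period, a line ending in
<br />), assuming a latin-1 encoded file.

```python
def find_non_ascii(data: bytes):
    """Return (line, byte position in line, byte value) for every
    byte above 0x7F. Line and position are both 1-based."""
    hits = []
    line, pos = 1, 0
    for b in data:
        if b == 0x0A:  # newline: start counting a new line
            line += 1
            pos = 0
            continue
        pos += 1
        if b > 0x7F:
            hits.append((line, pos, b))
    return hits


# Invented sample: 0xA0 is U+00A0 in latin-1
sample = b"word.\xa0 Word\n<br />\xa0\n"

for line, pos, b in find_non_ascii(sample):
    print(f"line {line}, byte {pos}: 0x{b:02X}")
# line 1, byte 6: 0xA0
# line 2, byte 7: 0xA0
```

To check a real file, read it in binary mode and pass the contents,
e.g. find_non_ascii(open(path, "rb").read()), where path is the file
Tidy wrote.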

Received on Tuesday, 17 February 2004 17:25:17 UTC