Tidy Bug(?) -- Reported Column Numbers

Dave Raggett,

Thank you for "Tidy".  I am using it with the editor Kedit for Windows.  I 
have added a Tool Button for Tidy, and a couple of macros to call Tidy, 
display the error file, and place the cursor at the offending point in the 
source file for each error or warning. (Man, I do make those errors!)

In "Tidy (vers 13th January 2000)", there seems to be a problem with the 
reported column number when the text contains a character whose Microsoft 
Windows (windows-1252) code is in the range 128 - 159.  Tidy apparently 
reports the column where the "clause" (the run of text) containing the 
character begins, rather than the column of the character itself.
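
For anyone who wants to check where the offending characters really are, 
here is a minimal Python sketch (not part of Tidy). It assumes the file is 
read as raw bytes, that lines and columns are counted from 1, and that each 
byte occupies one column. The function name find_c1_bytes is made up for 
this illustration.

```python
def find_c1_bytes(data: bytes):
    """Return (line, column, code) for each byte in the range 128-159.

    Assumptions: lines are separated by b"\n", and lines and columns
    are both 1-based with one byte per column.
    """
    hits = []
    for line_no, line in enumerate(data.split(b"\n"), start=1):
        for col_no, byte in enumerate(line, start=1):
            if 128 <= byte <= 159:
                hits.append((line_no, col_no, byte))
    return hits


if __name__ == "__main__":
    # Two lines modelled on the first paragraph of the sample below.
    sample = b"<p>foo.\n= \x91bar\x92</p>\n"
    for line_no, col_no, code in find_c1_bytes(sample):
        print(f"line {line_no} column {col_no} - character code {code}")
```

Comparing this output with the Tidy report shows how far the reported 
column is from the character that triggered each warning.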

If this is a "feature" rather than a bug, I'd vote to alter the behavior so 
it points to the offending character.

Thanks for thinking about this, and again for Tidy itself.

Here is a sample HTML file, excerpted from a Word2000 conversion of 
someone else's document, followed by the Tidy report for it.

><!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" 
>"http://www.w3.org/TR/REC-html40/loose.dtd">
><html>
><head>
><meta name=Generator content="Microsoft Office HTML Filter 2.0">
><meta http-equiv=Content-Type content="text/html; charset=windows-1252">
><meta name=Originator content="Microsoft Word 9">
><title>Bad Tidy column numbers, excerpts from (bad) HTML Conversion by 
>Word2000 filter</title>
></head>
><body>
>
><p class=TTYOUT style='margin-right:-.05pt;'>foo.
>= ‘bar’</p>
>
><p class=TTYOUT style='margin-right:-.05pt'>if (symbol(‘foo.’)) then</p>
>
><p class=MsoNormal style='margin-top:0in;margin-right:-.05pt;margin-bottom:
>0in;margin-left:.25in;margin-bottom:.0001pt;text-indent:-.25in;'><span
>style='font-family:Symbol'>·<span style='font:7.0pt "Times New 
>Roman"'>&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
></span></span>Whenever the vertical bar <span style='font-family:
>"Courier New";'>|</span> is used in any
>of the syntax diagrams, it means that either the term to the left, or the term
>to the right can be used, but not both, and at least one of the must be
>used.  This “operator<span
>style='font-family:Arial;'>”</span> is
>associative (can be used in sequence), and it has lower priority than the
>square brackets (the scope of the vertical bar located within a pair of square
>brackets or curly braces is limited to the text within those square 
>brackets or
>curly braces.</p>
>
><p class=MsoNormal style='margin-top:0in;margin-right:-.05pt;margin-bottom:
>0in;margin-left:.5in;margin-bottom:.0001pt'>Indicates that the subclause is a
>constant string, i.e. either enclosed by single quotes (’...’) or double 
>quotes
>(”...”).</p>
>
></body>
></html>



>Tidy (vers 13th January 2000) Parsing "bad.html"
>line 11 column 46 - Warning: replacing illegal character code 145
>line 11 column 46 - Warning: replacing illegal character code 146
>line 14 column 45 - Warning: replacing illegal character code 145
>line 14 column 45 - Warning: replacing illegal character code 146
>line 20 column 25 - Warning: replacing illegal character code 147
>line 23 column 23 - Warning: replacing illegal character code 148
>line 31 column 45 - Warning: replacing illegal character code 146
>line 31 column 45 - Warning: replacing illegal character code 146
>line 31 column 45 - Warning: replacing illegal character code 148
>line 31 column 45 - Warning: replacing illegal character code 148
>
>"bad.html" appears to be HTML 4.0 Transitional
>10 warnings/errors were found!
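
The report lines above have a regular shape, which is what my macros rely 
on to place the cursor. As an illustration only (this is not the Kedit 
macro), a minimal Python sketch of pulling the line and column out of each 
message could look like the following. The pattern is inferred from the 
report shown here; the "Error" alternative is an assumption, since this 
report contains only warnings, and the name parse_report is made up.

```python
import re

# Pattern inferred from the report above; "Error" is an assumed variant.
REPORT_LINE = re.compile(r"^line (\d+) column (\d+) - (Warning|Error): (.*)$")


def parse_report(text: str):
    """Return (line, column, kind, message) for each matching report line."""
    entries = []
    for raw in text.splitlines():
        match = REPORT_LINE.match(raw.strip())
        if match:
            line_no, col_no, kind, message = match.groups()
            entries.append((int(line_no), int(col_no), kind, message))
    return entries


if __name__ == "__main__":
    report = (
        'Tidy (vers 13th January 2000) Parsing "bad.html"\n'
        "line 11 column 46 - Warning: replacing illegal character code 145\n"
        "line 20 column 25 - Warning: replacing illegal character code 147\n"
    )
    for entry in parse_report(report):
        print(entry)
```

Lines that do not match, such as the banner and the summary, are skipped.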





Jeff Hennick
JHennick@Delphi.Com

Received on Sunday, 2 April 2000 03:35:55 UTC