- From: Daniel Lieuwen <lieuwen@research.bell-labs.com>
- Date: Wed, 11 Apr 2001 12:42:38 -0400
- To: html-tidy@w3.org
Restoring the commented-out break statement in
if (c == '<')
{
    /* UngetChar(c, lexer->in); */
    ReportAttrError(lexer, lexer->token, null, UNEXPECTED_GT);
    /* break; */
}
from
static char *ParseValue(Lexer *lexer, char *name,
                        Bool foldCase, Bool *isempty, int *pdelim)
in lexer.c
would fix many of the cases where this problem is now occurring.
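
For illustration only, here is a minimal, self-contained sketch of the
recovery idea. It is not Tidy's actual code: the function name
scan_unquoted_value and its parameters are hypothetical. While scanning an
unquoted attribute value, a '<' is taken as a sign that the closing '>' was
forgotten, so scanning stops there and the '<' is left unread for the caller.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of Tidy.
 * Copies an unquoted attribute value from src into out (always
 * NUL-terminated when outsize > 0) and returns the number of
 * characters consumed from src.
 * Scanning stops at whitespace, at '>', or at '<'. A '<' is not
 * consumed; *saw_lt is set to 1 so the caller can report the
 * missing '>' and start parsing a new tag at that position.
 */
size_t scan_unquoted_value(const char *src, char *out, size_t outsize,
                           int *saw_lt)
{
    size_t i = 0;   /* characters consumed from src */
    size_t n = 0;   /* characters stored in out */

    *saw_lt = 0;

    while (src[i] != '\0')
    {
        unsigned char c = (unsigned char)src[i];

        if (c == '<')
        {
            *saw_lt = 1;
            break;  /* the restored "break": end the value here */
        }

        if (c == '>' || isspace(c))
            break;

        if (n + 1 < outsize)
            out[n++] = (char)c;

        i++;
    }

    if (outsize > 0)
        out[n] = '\0';

    return i;
}
```

Given the input -2<b>end</b>, this sketch yields the value -2, sets the
flag, and leaves <b> in place to be parsed as the next tag.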
>
>
> From: Andy.Quick@sybase.com
> To: html-tidy@w3.org
> Date: Tue, 13 Mar 2001 10:34:42 -0500
> Message-ID: <OF74ED3597.22A94C99-ON85256A0E.004E4ADC@sybase.com>
> Subject: Tidy cannot repair tags with missing '>'
>
> I could not find anything about this, so I am posting it.
> In the example below, Tidy cannot repair the document
> because the <font> tag is badly formed - it is missing the
> '>'.
>
> <html>
> <head><title>Sample Problem</title></head>
> <body>
> <p>
> <font size="-2">There seems to be an error occurring when you don't</font>
> <font face="arial,helvetica, geneva" size="-2"<b>end</b> a tag with a >. Tidy won't fix it.</font>
> </p>
> </body>
> </html>
>
> I can propose a possible solution. The attempt was made
> in Java tidy, so I will describe it in terms of Java tidy:
>
> 1. In StreamInImpl, added a small stack to store characters
> so ungetChar can be used to back up more than 1 character.
>
> public int readChar()
> {
> int c;
>
> if (this.pushed)
> {
> this.prevPos--;
> if( this.prevPos == 0 ) this.pushed = false;
> c = this.previous[ this.prevPos ];
>
> if (c == '\n')
> {
> this.curcol = 1;
> this.curline++;
> return c;
> }
>
> this.curcol++;
> return c;
> }
> ....
>
> public void ungetChar(int c)
> {
> this.pushed = true;
> if( this.prevPos == 5 ) this.prevPos = 0; // Reset counter.
> this.previous[ this.prevPos ] = c;
> this.prevPos++;
>
> if (c == '\n')
> {
> --this.curline;
> }
>
> this.curcol = this.lastcol;
> }
>
> New class members:
> protected int[] previous;
> protected int prevPos;
>
> Constructor:
> this.previous = new int[ 5 ]; // allow 5 backup chars
> this.prevPos = 0;
>
> 2. In Lexer, attempted to recover from unexpected '<'s
> in parseAttribute and parseValue.
>
> line 2179
> this.in.ungetChar(c);
> /* Report.attrError(this, this.token, null, Report.UNEXPECTED_GT); */
> c = '<';
> this.in.ungetChar(c);
> return null;
>
> line 2445
> /* this.in.ungetChar(c); */
> /* Report.attrError(this, this.token, null, Report.UNEXPECTED_GT); */
> this.in.ungetChar(c);
> c = '>';
> this.in.ungetChar(c);
> c = lastc;
> continue;
> /* break; */
>
> Regards,
>
> Andy Quick
>
Received on Wednesday, 11 April 2001 12:42:03 UTC