- From: Daniel Lieuwen <lieuwen@research.bell-labs.com>
- Date: Wed, 11 Apr 2001 12:42:38 -0400
- To: html-tidy@w3.org
Restoring the commented-out break statement in

    if (c == '<')
    {
        /* UngetChar(c, lexer->in); */
        ReportAttrError(lexer, lexer->token, null, UNEXPECTED_GT);
        /* break; */
    }

from

    static char *ParseValue(Lexer *lexer, char *name,
                            Bool foldCase, Bool *isempty, int *pdelim)

in lexer.c would fix many of the cases where this problem is now occurring.

> From: Andy.Quick@sybase.com
> To: html-tidy@w3.org
> Date: Tue, 13 Mar 2001 10:34:42 -0500
> Message-ID: <OF74ED3597.22A94C99-ON85256A0E.004E4ADC@sybase.com>
> Subject: Tidy cannot repair tags with missing '>'
>
> I could not find anything about this, so I am posting it.
> In the example below, Tidy cannot repair the document
> because the <font> tag is badly formed - it is missing the
> '>'.
>
> <html>
> <head><title>Sample Problem</title></head>
> <body>
> <p>
> <font size="-2">There seems to be an error occurring when you don't</font>
> <font face="arial,helvetica, geneva" size="-2"<b>end</b> a tag with a >. Tidy won't fix it.</font>
> </p>
> </body>
> </html>
>
> I can propose a possible solution. The attempt was made
> in Java tidy, so I will describe it in terms of Java tidy:
>
> 1. In StreamInImpl, added a small stack to store characters
>    so ungetChar can be used to back up more than 1 character.
>
>    public int readChar()
>    {
>        int c;
>
>        if (this.pushed)
>        {
>            this.prevPos--;
>            if( this.prevPos == 0 ) this.pushed = false;
>            c = this.previous[ this.prevPos ];
>
>            if (c == '\n')
>            {
>                this.curcol = 1;
>                this.curline++;
>                return c;
>            }
>
>            this.curcol++;
>            return c;
>        }
>        ....
>
>    public void ungetChar(int c)
>    {
>        this.pushed = true;
>        if( this.prevPos == 5 ) this.prevPos = 0; // Reset counter.
>        this.previous[ this.prevPos ] = c;
>        this.prevPos++;
>
>        if (c == '\n')
>        {
>            --this.curline;
>        }
>
>        this.curcol = this.lastcol;
>    }
>
>    New class members:
>        protected int[] previous;
>        protected int prevPos;
>
>    Constructor:
>        this.previous = new int[ 5 ]; // allow 5 backup chars
>        this.prevPos = 0;
>
> 2. In Lexer, attempted to recover from unexpected '<'s
>    in parseAttribute and parseValue.
>
>    line 2179
>        this.in.ungetChar(c);
>        /* Report.attrError(this, this.token, null, Report.UNEXPECTED_GT); */
>        c = '<';
>        this.in.ungetChar(c);
>        return null;
>
>    line 2445
>        /* this.in.ungetChar(c); */
>        /* Report.attrError(this, this.token, null, Report.UNEXPECTED_GT); */
>        this.in.ungetChar(c);
>        c = '>';
>        this.in.ungetChar(c);
>        c = lastc;
>        continue;
>        /* break; */
>
> Regards,
>
> Andy Quick
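
For anyone who wants to try the multi-character pushback idea outside of Tidy, here is a minimal, self-contained sketch. It is not JTidy code: the class name MissingGtRecovery and the method repair are made up for illustration, and it uses the standard java.io.PushbackReader (with a pushback capacity of 2) in place of the StreamInImpl changes quoted above. When a '<' turns up inside a tag that is still open, it pushes back the '<' and then a synthetic '>', so the next two reads return '>' followed by '<'.

```java
import java.io.IOException;
import java.io.PushbackReader;
import java.io.StringReader;
import java.io.UncheckedIOException;

// Hypothetical illustration only; none of these names come from Tidy or JTidy.
class MissingGtRecovery {

    /** Copies input to output, supplying a '>' when a new '<' appears inside an open tag. */
    static String repair(String input) {
        // Capacity 2: room for the offending '<' plus the synthetic '>'.
        PushbackReader in = new PushbackReader(new StringReader(input), 2);
        StringBuilder out = new StringBuilder();
        boolean inTag = false; // true between a '<' and its closing '>'
        int quote = 0;         // active quote character inside a tag, or 0 if none
        try {
            int c;
            while ((c = in.read()) != -1) {
                if (inTag && quote != 0) {
                    // Inside a quoted attribute value: only the matching quote ends it.
                    if (c == quote) quote = 0;
                } else if (inTag && (c == '"' || c == '\'')) {
                    quote = c;
                } else if (inTag && c == '<') {
                    // Unexpected '<' inside a tag: push it back, then push a '>'.
                    // The last character pushed is the first one read again.
                    in.unread('<');
                    in.unread('>');
                    continue;
                } else if (inTag && c == '>') {
                    inTag = false;
                } else if (!inTag && c == '<') {
                    inTag = true;
                }
                out.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(repair("<font face=\"arial\" size=\"-2\"<b>end</b> a tag</font>"));
    }
}
```

This only shows the pushback mechanics on a plain character stream. A real fix would also have to report the error and keep the line and column counters consistent, as the quoted ungetChar tries to do.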
Received on Wednesday, 11 April 2001 12:42:03 UTC