- From: <Andy.Quick@sybase.com>
- Date: Tue, 13 Mar 2001 10:34:42 -0500
- To: html-tidy@w3.org
I could not find anything about this, so I am posting it. In the example below, Tidy cannot repair the document because the <font> tag is badly formed - it is missing the '>'. <html> <head><title>Sample Problem</title></head> <body> <p> <font size="-2">There seems to be an error occurring when you don't</font> <font face="arial,helvetica, geneva" size="-2"<b>end</b> a tag with a >. Tidy won't fix it.</font> </p> </body> </html> I can propose a possible solution. The attempt was made in Java tidy, so I will describe it in terms of Java tidy: 1. In StreamInImpl, added a small stack to store characters so ungetChar can be used to back up more than 1 character. public int readChar() { int c; if (this.pushed) { this.prevPos--; if( this.prevPos == 0 ) this.pushed = false; c = this.previous[ this.prevPos ]; if (c == '\n') { this.curcol = 1; this.curline++; return c; } this.curcol++; return c; } .... public void ungetChar(int c) { this.pushed = true; if( this.prevPos == 5 ) this.prevPos = 0; // Reset counter. this.previous[ this.prevPos ] = c; this.prevPos++; if (c == '\n') { --this.curline; } this.curcol = this.lastcol; } New class members: protected int[] previous; protected int prevPos; Constructor: this.previous = new int[ 5 ]; // allow 5 backup chars this.prevPos = 0; 2. In Lexer, attempted to recover from unexpected '<'s in parseAttribute and parseValue. line 2179 this.in.ungetChar(c); /* Report.attrError(this, this.token, null, Report.UNEXPECTED_GT); */ c = '<'; this.in.ungetChar(c); return null; line 2445 /* this.in.ungetChar(c); */ /* Report.attrError(this, this.token, null, Report.UNEXPECTED_GT); */ this.in.ungetChar(c); c = '>'; this.in.ungetChar(c); c = lastc; continue; /* break; */ Regards, Andy Quick
Received on Tuesday, 13 March 2001 10:34:38 UTC