I've attached a particularly ugly piece of HTML that I received. On line 59 there is a td element with a BACKGROUND= attribute where the quotes seem to be hosed up. When Tidy parses this, it takes BACKGROUND as the attribute and SRC= as the value of that attribute. So far, so good. The next attribute is images/clearPixel.gif" and it has no value, since it is not followed by an equals sign. This causes Tidy to crash while dereferencing a null pointer, because a null value is passed into Report.attrError when reporting a BAD_ATTRIBUTE_VALUE.

My recommended fix for Java Tidy is to change Lexer.parseAttrs at line 2,612 of Lexer.java (8 July 2000 edition) from:

    Report.attrError(this, this.token, value, Report.BAD_ATTRIBUTE_VALUE);

to:

    if (value == null)
        Report.attrError(this, this.token, attribute, Report.MISSING_ATTR_VALUE);
    else
        Report.attrError(this, this.token, value, Report.BAD_ATTRIBUTE_VALUE);

Gary
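For anyone who wants to see the failure mode in isolation, here is a minimal standalone sketch of the guard described above. It does not use the Tidy sources: the class name, the two-argument attrError, the numeric codes, and the message strings are all hypothetical stand-ins, and it assumes that the real Report.attrError dereferences the text argument it is handed.

```java
import java.util.ArrayList;
import java.util.List;

public class NullValueGuardSketch {
    // Hypothetical stand-ins for the Report codes named in the message.
    static final int MISSING_ATTR_VALUE = 1;
    static final int BAD_ATTRIBUTE_VALUE = 2;

    // Collected warnings, so the behaviour can be inspected afterwards.
    static final List<String> warnings = new ArrayList<>();

    // Stand-in for Report.attrError. Assumption: like the real method,
    // it dereferences the text it is given, so a null text throws.
    static void attrError(String text, int code) {
        String label = (code == MISSING_ATTR_VALUE)
                ? "missing value for attribute"
                : "bad attribute value";
        warnings.add(label + ": " + text.trim());
    }

    // Behaviour before the fix: the value is reported unconditionally.
    static void reportUnguarded(String attribute, String value) {
        attrError(value, BAD_ATTRIBUTE_VALUE);
    }

    // Behaviour after the fix: a null value is reported against the
    // attribute name instead of being passed along.
    static void reportGuarded(String attribute, String value) {
        if (value == null) {
            attrError(attribute, MISSING_ATTR_VALUE);
        } else {
            attrError(value, BAD_ATTRIBUTE_VALUE);
        }
    }

    public static void main(String[] args) {
        // The stray token from the broken td element, parsed as an
        // attribute name with no value.
        String attribute = "images/clearPixel.gif\"";

        try {
            reportUnguarded(attribute, null);
        } catch (NullPointerException e) {
            System.out.println("unguarded call threw NullPointerException");
        }

        reportGuarded(attribute, null);
        for (String warning : warnings) {
            System.out.println(warning);
        }
    }
}
```

The guarded version still reports the problem to the user, just under a different code and with the attribute name as the text, which is the only non-null piece of information available at that point.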