- From: Donna Bergmark <bergmark@CS.Cornell.EDU>
- Date: Mon, 05 Jun 2000 14:08:33 -0400
- To: html-tidy@w3.org
There still seems to be a problem with ArrayIndexOutOfBounds. I am running Tidy Version 30 April 2000 and JTidy Version 3 June. A variant of the Tidy Java Bean example was used. Here is the URL that won't parse: http://www.dlib.org/dlib/september98/millman/09millman.html Here is the typescript of the run: Script started on Mon Jun 5 13:44:12 2000 ------------------------------------------------------------------ Latest version of Tidy, JTidy still bombs on ArrayIndexOutOfBounds ------------------------------------------------------------------ (1) Put 3 June 2000 JTidy into my path DHCP211-162.CS.CORNELL.EDU% setenv CLASSPATH \ ? /home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:. DHCP211-162.CS.CORNELL.EDU% echo $CLASSPATH /home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:. (2) Get original version of the JTidy bean (copied from the Java HTML Tidy document). Compile it. DHCP211-162.CS.CORNELL.EDU% co -r1.1 Test16.java RCS/Test16.java,v --> Test16.java revision 1.1 writable Test16.java exists; remove it? [ny](n): y done DHCP211-162.CS.CORNELL.EDU% javac Test16.java (3) Run it on the URL that causes Tidy/JTidy to crash DHCP211-162.CS.CORNELL.EDU% cat millman.notidy http://www.dlib.org/dlib/september98/millman/09millman.html DHCP211-162.CS.CORNELL.EDU% java Test16 \ ? http://www.dlib.org/dlib/september98/millman/09millman.html \ ? out error java.lang.ArrayIndexOutOfBoundsException: 0 >= 0 at java.util.Vector.elementAt(Vector.java:405) at org.w3c.tidy.Lexer.insertedToken(Lexer.java:2738) at org.w3c.tidy.Lexer.getToken(Lexer.java:1185) at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:1672) at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49) at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36) at org.w3c.tidy.ParserImpl$ParseDefList.parse(ParserImpl.java:1438) at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49) at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36) at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002) at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49) at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36) at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002) at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49) at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36) at org.w3c.tidy.ParserImpl$ParseBody.parse(ParserImpl.java:652) at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49) at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36) at org.w3c.tidy.ParserImpl$ParseHTML.parse(ParserImpl.java:258) at org.w3c.tidy.ParserImpl.parseDocument(ParserImpl.java:2917) at org.w3c.tidy.Tidy.parse(Tidy.java:1055) at Test16.run(Test16.java:50) at java.lang.Thread.run(Thread.java:475) (4) Here is what was written into the output files DHCP211-162.CS.CORNELL.EDU% cat error Tidy (vers 30th April 2000) Parsing "InputStream" line 10 column 61 - Warning: discarding unexpected </a> line 16 column 23 - Warning: missing </font> before <h3> line 16 column 27 - Warning: inserting implicit <font> line 20 column 1 - Warning: inserting implicit <font> line 23 column 2 - Warning: missing </font> before <h6> line 23 column 5 - Warning: inserting implicit <font> line 26 column 1 - Warning: inserting implicit <font> line 29 column 23 - Warning: missing </font> before <h3> line 29 column 23 - Warning: missing </font> before <h3> line 29 column 27 - Warning: inserting implicit <font> line 29 column 27 - Warning: inserting implicit <font> line 30 column 1 - Warning: discarding unexpected </font> line 32 column 1 - Warning: inserting implicit <font> line 34 column 1 - Warning: missing </font> before <p> line 35 column 1 - Warning: inserting implicit <font> line 41 column 53 - Warning: discarding unexpected </i> line 43 column 2 - Warning: missing </i> before <p> line 43 column 2 - Warning: missing </font> before <p> line 43 column 2 - Warning: missing </font> before <p> line 47 column 3 - Warning: <img> lacks "alt" attribute line 57 column 2 - Warning: missing </font> before <p> line 57 column 2 - Warning: missing </h3> before <p> line 58 column 1 - Warning: inserting implicit <font> line 60 column 1 - Warning: discarding unexpected </h3> line 60 column 6 - Warning: discarding unexpected </font> line 60 column 13 - Warning: discarding unexpected </font> line 60 column 20 - Warning: discarding unexpected </font> line 72 column 1 - Warning: trimming empty <p> line 72 column 55 - Warning: missing </font> before </h3> line 72 column 60 - Warning: discarding unexpected </font> line 72 column 67 - Warning: replacing element</p> by <br> line 72 column 67 - Warning: inserting implicit <br> line 99 column 1 - Warning: trimming empty <p> line 99 column 45 - Warning: missing </font> before </h3> line 99 column 50 - Warning: discarding unexpected </font> line 99 column 57 - Warning: replacing element</p> by <br> line 99 column 57 - Warning: inserting implicit <br> line 105 column 281 - Warning: trimming empty <p> line 106 column 646 - Warning: trimming empty <p> line 107 column 129 - Warning: trimming empty <p> line 108 column 356 - Warning: trimming empty <p> line 109 column 115 - Warning: trimming empty <p> line 115 column 5 - Warning: missing </em> before <dl> line 115 column 5 - Warning: trimming empty <em> line 116 column 5 - Warning: inserting implicit <em> line 117 column 1 - Warning: missing <dd> line 117 column 1 - Warning: discarding unexpected </em> DHCP211-162.CS.CORNELL.EDU% cat out DHCP211-162.CS.CORNELL.EDU% DHCP211-162.CS.CORNELL.EDU% exit exit Script done on Mon Jun 5 13:46:39 2000 .............................................................. Here is the java code that invoked the parse: // bergmark - may 2000 - Code example of how to use the Tidy Java Bean // Code copied from Java HTML Tidy (13 May 2000) document // CLASSPATH: must include path to JTidy: // /home/bergmark/public/src/tools/JTidy/src/30apr2000 import java.io.IOException; import java.net.URL; import java.io.BufferedInputStream; import java.io.FileOutputStream; import java.io.PrintWriter; import java.io.FileWriter; import org.w3c.tidy.Tidy; /** * This program shows how HTML could be tidied directly from * a URL stream, and running on separate threads. Note the use * of the 'parse' method to parse from an InputStream, and send * the pretty-printed result to an OutputStream. * In this example thread th1 outputs XML, and thread th2 outputs * HTML. This shows that properties are per instance of Tidy. */ public class Test16 implements Runnable { private String url; private String outFileName; private String errOutFileName; public Test16(String url, String outFileName, String errOutFileName) { this.url = url; this.outFileName = outFileName; this.errOutFileName = errOutFileName; } public void run() { URL u; BufferedInputStream in; FileOutputStream out; Tidy tidy = new Tidy(); tidy.setXHTML(true); try { tidy.setErrout(new PrintWriter(new FileWriter(errOutFileName), true)); u = new URL(url); in = new BufferedInputStream(u.openStream()); out = new FileOutputStream(outFileName); tidy.parse(in, out); } catch ( IOException e ) { System.out.println ( this.toString() + e.toString() ); } } public static void main( String[] args ) { Test16 t = new Test16(args[0], args[1], args[2] ); Thread th1 = new Thread(t); th1.start(); } }
Received on Monday, 5 June 2000 14:08:35 UTC