- From: Donna Bergmark <bergmark@CS.Cornell.EDU>
- Date: Mon, 05 Jun 2000 14:08:33 -0400
- To: html-tidy@w3.org
JTidy still seems to throw an ArrayIndexOutOfBoundsException on some input.
I am running Tidy version 30 April 2000 and JTidy version 3 June 2000,
using a variant of the Tidy Java Bean example. Here is the
URL that won't parse:
http://www.dlib.org/dlib/september98/millman/09millman.html
Here is the typescript of the run:
Script started on Mon Jun 5 13:44:12 2000
------------------------------------------------------------------
Latest version of Tidy, JTidy still bombs on ArrayIndexOutOfBounds
------------------------------------------------------------------
(1) Put 3 June 2000 JTidy into my path
DHCP211-162.CS.CORNELL.EDU% setenv CLASSPATH \
? /home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:.
DHCP211-162.CS.CORNELL.EDU% echo $CLASSPATH
/home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:.
(2) Get original version of the JTidy bean (copied from the Java HTML Tidy
document). Compile it.
DHCP211-162.CS.CORNELL.EDU% co -r1.1 Test16.java
RCS/Test16.java,v --> Test16.java
revision 1.1
writable Test16.java exists; remove it? [ny](n): y
done
DHCP211-162.CS.CORNELL.EDU% javac Test16.java
(3) Run it on the URL that causes Tidy/JTidy to crash
DHCP211-162.CS.CORNELL.EDU% cat millman.notidy
http://www.dlib.org/dlib/september98/millman/09millman.html
DHCP211-162.CS.CORNELL.EDU% java Test16 \
? http://www.dlib.org/dlib/september98/millman/09millman.html \
? out error
java.lang.ArrayIndexOutOfBoundsException: 0 >= 0
at java.util.Vector.elementAt(Vector.java:405)
at org.w3c.tidy.Lexer.insertedToken(Lexer.java:2738)
at org.w3c.tidy.Lexer.getToken(Lexer.java:1185)
at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:1672)
at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
at org.w3c.tidy.ParserImpl$ParseDefList.parse(ParserImpl.java:1438)
at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002)
at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002)
at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
at org.w3c.tidy.ParserImpl$ParseBody.parse(ParserImpl.java:652)
at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
at org.w3c.tidy.ParserImpl$ParseHTML.parse(ParserImpl.java:258)
at org.w3c.tidy.ParserImpl.parseDocument(ParserImpl.java:2917)
at org.w3c.tidy.Tidy.parse(Tidy.java:1055)
at Test16.run(Test16.java:50)
at java.lang.Thread.run(Thread.java:475)
(4) Here is what was written into the output files
DHCP211-162.CS.CORNELL.EDU% cat error
Tidy (vers 30th April 2000) Parsing "InputStream"
line 10 column 61 - Warning: discarding unexpected </a>
line 16 column 23 - Warning: missing </font> before <h3>
line 16 column 27 - Warning: inserting implicit <font>
line 20 column 1 - Warning: inserting implicit <font>
line 23 column 2 - Warning: missing </font> before <h6>
line 23 column 5 - Warning: inserting implicit <font>
line 26 column 1 - Warning: inserting implicit <font>
line 29 column 23 - Warning: missing </font> before <h3>
line 29 column 23 - Warning: missing </font> before <h3>
line 29 column 27 - Warning: inserting implicit <font>
line 29 column 27 - Warning: inserting implicit <font>
line 30 column 1 - Warning: discarding unexpected </font>
line 32 column 1 - Warning: inserting implicit <font>
line 34 column 1 - Warning: missing </font> before <p>
line 35 column 1 - Warning: inserting implicit <font>
line 41 column 53 - Warning: discarding unexpected </i>
line 43 column 2 - Warning: missing </i> before <p>
line 43 column 2 - Warning: missing </font> before <p>
line 43 column 2 - Warning: missing </font> before <p>
line 47 column 3 - Warning: <img> lacks "alt" attribute
line 57 column 2 - Warning: missing </font> before <p>
line 57 column 2 - Warning: missing </h3> before <p>
line 58 column 1 - Warning: inserting implicit <font>
line 60 column 1 - Warning: discarding unexpected </h3>
line 60 column 6 - Warning: discarding unexpected </font>
line 60 column 13 - Warning: discarding unexpected </font>
line 60 column 20 - Warning: discarding unexpected </font>
line 72 column 1 - Warning: trimming empty <p>
line 72 column 55 - Warning: missing </font> before </h3>
line 72 column 60 - Warning: discarding unexpected </font>
line 72 column 67 - Warning: replacing element</p> by <br>
line 72 column 67 - Warning: inserting implicit <br>
line 99 column 1 - Warning: trimming empty <p>
line 99 column 45 - Warning: missing </font> before </h3>
line 99 column 50 - Warning: discarding unexpected </font>
line 99 column 57 - Warning: replacing element</p> by <br>
line 99 column 57 - Warning: inserting implicit <br>
line 105 column 281 - Warning: trimming empty <p>
line 106 column 646 - Warning: trimming empty <p>
line 107 column 129 - Warning: trimming empty <p>
line 108 column 356 - Warning: trimming empty <p>
line 109 column 115 - Warning: trimming empty <p>
line 115 column 5 - Warning: missing </em> before <dl>
line 115 column 5 - Warning: trimming empty <em>
line 116 column 5 - Warning: inserting implicit <em>
line 117 column 1 - Warning: missing <dd>
line 117 column 1 - Warning: discarding unexpected </em>
DHCP211-162.CS.CORNELL.EDU% cat out
DHCP211-162.CS.CORNELL.EDU%
DHCP211-162.CS.CORNELL.EDU% exit
exit
Script done on Mon Jun 5 13:46:39 2000
..............................................................
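The "0 >= 0" text in the trace has the form that java.util.Vector.elementAt uses for its exception message (requested index, then ">=", then the current element count), which suggests that index 0 was requested from an empty Vector inside Lexer.insertedToken. The following is a minimal sketch, not taken from the JTidy source, that reproduces the same kind of exception with a plain Vector and shows one possible guarded read. The class and method names (EmptyVectorDemo, messageForEmptyRead, elementOrNull) are hypothetical and exist only for illustration.

```java
import java.util.Vector;

public class EmptyVectorDemo {

    /** Reads index 0 of an empty Vector and returns the resulting exception message. */
    public static String messageForEmptyRead() {
        Vector<String> empty = new Vector<String>();
        try {
            empty.elementAt(0);   // no elements, so this throws
            return null;          // not reached
        } catch (ArrayIndexOutOfBoundsException e) {
            return e.getMessage();
        }
    }

    /** Hypothetical guarded read: returns null instead of throwing when index is out of range. */
    public static String elementOrNull(Vector<String> v, int index) {
        return (index >= 0 && index < v.size()) ? v.elementAt(index) : null;
    }

    public static void main(String[] args) {
        System.out.println(messageForEmptyRead());
        System.out.println(elementOrNull(new Vector<String>(), 0));
    }
}
```

Whether a bounds check like elementOrNull is the right fix inside the lexer depends on why the Vector is empty at that point, which the trace alone does not show.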
Here is the Java code that invoked the parse:
// bergmark - may 2000 - Code example of how to use the Tidy Java Bean
// Code copied from Java HTML Tidy (13 May 2000) document
// CLASSPATH: must include path to JTidy:
// /home/bergmark/public/src/tools/JTidy/src/30apr2000
import java.io.IOException;
import java.net.URL;
import java.io.BufferedInputStream;
import java.io.FileOutputStream;
import java.io.PrintWriter;
import java.io.FileWriter;
import org.w3c.tidy.Tidy;
/**
 * This program shows how HTML can be tidied directly from
 * a URL stream on a separate thread. Note the use
 * of the 'parse' method to parse from an InputStream, and send
 * the pretty-printed result to an OutputStream.
 * In this variant a single thread, th1, is started, and its Tidy
 * instance is configured for XHTML output via setXHTML(true).
 */
public class Test16 implements Runnable {

    private String url;
    private String outFileName;
    private String errOutFileName;

    public Test16(String url, String outFileName,
                  String errOutFileName) {
        this.url = url;
        this.outFileName = outFileName;
        this.errOutFileName = errOutFileName;
    }

    public void run() {
        URL u;
        BufferedInputStream in;
        FileOutputStream out;
        Tidy tidy = new Tidy();
        tidy.setXHTML(true);
        try {
            tidy.setErrout(new PrintWriter(new FileWriter(errOutFileName), true));
            u = new URL(url);
            in = new BufferedInputStream(u.openStream());
            out = new FileOutputStream(outFileName);
            tidy.parse(in, out);
        } catch ( IOException e ) {
            System.out.println ( this.toString() + e.toString() );
        }
    }

    public static void main( String[] args ) {
        Test16 t = new Test16(args[0], args[1], args[2] );
        Thread th1 = new Thread(t);
        th1.start();
    }
}
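Note that ArrayIndexOutOfBoundsException is a RuntimeException, so the catch for IOException in run() does not intercept it; the exception propagates out of the thread and the output file is left empty, as the typescript shows. A minimal sketch of a more defensive wrapper follows. It assumes a hypothetical ParseStep interface standing in for a call such as tidy.parse(in, out), so that the example is self-contained and does not depend on the JTidy classes; GuardedParse, runGuarded and demo are likewise illustrative names, not part of any library.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.util.Vector;

public class GuardedParse {

    /** Hypothetical stand-in for a call such as tidy.parse(in, out). */
    public interface ParseStep {
        void parse(InputStream in, OutputStream out) throws IOException;
    }

    /**
     * Runs the step and always closes both streams.
     * Returns null on success, or a one-line description of the failure.
     */
    public static String runGuarded(ParseStep step, InputStream in, OutputStream out) {
        try {
            step.parse(in, out);
            return null;
        } catch (IOException e) {
            return "I/O error: " + e;
        } catch (RuntimeException e) {
            // Covers unchecked failures such as ArrayIndexOutOfBoundsException.
            return "parser failure: " + e;
        } finally {
            try { in.close(); } catch (IOException ignored) { }
            try { out.close(); } catch (IOException ignored) { }
        }
    }

    /** Simulates the reported crash with an empty Vector and returns the description. */
    public static String demo() {
        ParseStep failing = new ParseStep() {
            public void parse(InputStream in, OutputStream out) {
                new Vector<Object>().elementAt(0); // throws, like the trace above
            }
        };
        return runGuarded(failing,
                          new ByteArrayInputStream(new byte[0]),
                          new ByteArrayOutputStream());
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

This only makes the failure visible to the caller and releases the streams; it does not address the underlying parser error.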
Received on Monday, 5 June 2000 14:08:35 UTC