- From: Andrea Sparling <asparling@adrelevance.com>
- Date: Tue, 06 Jun 2000 09:13:00 -0700
- To: html-tidy@w3.org
The stack pointer was decremented to 0 before the insertedToken call.
This does
not explain why. But if you change
// this will only be null if inode != null
if (this.insert == -1) {
node = this.inode;
this.inode = null;
return node;
to
// this will only be null if inode != null
if ((this.insert == -1) || ( this.istack.size() < 1) {
node = this.inode;
this.inode = null;
return node;
This may help.
Further inspection of why this code decrements the stack pointer is in
order.
http://www.dlib.org/dlib/september98/millman/09millman.htm
>
> From: Donna Bergmark (bergmark@CS.Cornell.EDU)
> Date: Mon, Jun 05 2000
>
> *Next message: Jelks Cabaniss: "W3C validator (was: Strict tables)"
>
> * Previous message: Bertilo Wennergren: "Strict tables"
> * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> * Other mail archives: [this mailing list] [other W3C mailing lists]
> * Mail actions: [ respond to this message ] [ mail a new topic ]
>
> ------------------------------------------------------------------------
>
> Message-Id: <200006051808.OAA07367@elgin.cs.cornell.edu>
> To: html-tidy@w3.org
> Date: Mon, 05 Jun 2000 14:08:33 -0400
> From: Donna Bergmark <bergmark@CS.Cornell.EDU>
> Subject: [Tidy/Jtidy bug report]ArrayIndexOutOfBoundsException
>
> There still seems to be a problem with ArrayIndexOutOfBounds.
> I am running Tidy Version 30 April 2000 and JTidy Version 3 June.
> A variant of the Tidy Java Bean example was used. Here is the
> URL that won't parse:
> http://www.dlib.org/dlib/september98/millman/09millman.html
>
> Here is the typescript of the run:
> Script started on Mon Jun 5 13:44:12 2000
> ------------------------------------------------------------------
> Latest version of Tidy, JTidy still bombs on ArrayIndexOutOfBounds
> ------------------------------------------------------------------
> (1) Put 3 June 2000 JTidy into my path
>
> DHCP211-162.CS.CORNELL.EDU% setenv CLASSPATH \
> ? /home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:.
>
> DHCP211-162.CS.CORNELL.EDU% echo $CLASSPATH
> /home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:.
>
> (2) Get original version of the JTidy bean (copied from the Java HTML Tidy
> document). Compile it.
>
> DHCP211-162.CS.CORNELL.EDU% co -r1.1 Test16.java
> RCS/Test16.java,v --> Test16.java
> revision 1.1
> writable Test16.java exists; remove it? [ny](n): y
> done
>
> DHCP211-162.CS.CORNELL.EDU% javac Test16.java
>
> (3) Run it on the URL that causes Tidy/JTidy to crash
>
> DHCP211-162.CS.CORNELL.EDU% cat millman.notidy
> http://www.dlib.org/dlib/september98/millman/09millman.html
>
> DHCP211-162.CS.CORNELL.EDU% java Test16 \
> ? http://www.dlib.org/dlib/september98/millman/09millman.html \
> ? out error
> java.lang.ArrayIndexOutOfBoundsException: 0 >= 0
> at java.util.Vector.elementAt(Vector.java:405)
> at org.w3c.tidy.Lexer.insertedToken(Lexer.java:2738)
> at org.w3c.tidy.Lexer.getToken(Lexer.java:1185)
> at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:1672)
> at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
> at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
> at org.w3c.tidy.ParserImpl$ParseDefList.parse(ParserImpl.java:1438)
> at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
> at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
> at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002)
> at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
> at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
> at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002)
> at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
> at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
> at org.w3c.tidy.ParserImpl$ParseBody.parse(ParserImpl.java:652)
> at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
> at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
> at org.w3c.tidy.ParserImpl$ParseHTML.parse(ParserImpl.java:258)
> at org.w3c.tidy.ParserImpl.parseDocument(ParserImpl.java:2917)
> at org.w3c.tidy.Tidy.parse(Tidy.java:1055)
> at Test16.run(Test16.java:50)
> at java.lang.Thread.run(Thread.java:475)
>
> (4) Here is what was written into the output files
>
> DHCP211-162.CS.CORNELL.EDU% cat error
>
> Tidy (vers 30th April 2000) Parsing "InputStream"
> line 10 column 61 - Warning: discarding unexpected </a>
> line 16 column 23 - Warning: missing </font> before <h3>
> line 16 column 27 - Warning: inserting implicit <font>
> line 20 column 1 - Warning: inserting implicit <font>
> line 23 column 2 - Warning: missing </font> before <h6>
> line 23 column 5 - Warning: inserting implicit <font>
> line 26 column 1 - Warning: inserting implicit <font>
> line 29 column 23 - Warning: missing </font> before <h3>
> line 29 column 23 - Warning: missing </font> before <h3>
> line 29 column 27 - Warning: inserting implicit <font>
> line 29 column 27 - Warning: inserting implicit <font>
> line 30 column 1 - Warning: discarding unexpected </font>
> line 32 column 1 - Warning: inserting implicit <font>
> line 34 column 1 - Warning: missing </font> before <p>
> line 35 column 1 - Warning: inserting implicit <font>
> line 41 column 53 - Warning: discarding unexpected </i>
> line 43 column 2 - Warning: missing </i> before <p>
> line 43 column 2 - Warning: missing </font> before <p>
> line 43 column 2 - Warning: missing </font> before <p>
> line 47 column 3 - Warning: <img> lacks "alt" attribute
> line 57 column 2 - Warning: missing </font> before <p>
> line 57 column 2 - Warning: missing </h3> before <p>
> line 58 column 1 - Warning: inserting implicit <font>
> line 60 column 1 - Warning: discarding unexpected </h3>
> line 60 column 6 - Warning: discarding unexpected </font>
> line 60 column 13 - Warning: discarding unexpected </font>
> line 60 column 20 - Warning: discarding unexpected </font>
> line 72 column 1 - Warning: trimming empty <p>
> line 72 column 55 - Warning: missing </font> before </h3>
> line 72 column 60 - Warning: discarding unexpected </font>
> line 72 column 67 - Warning: replacing element</p> by <br>
> line 72 column 67 - Warning: inserting implicit <br>
> line 99 column 1 - Warning: trimming empty <p>
> line 99 column 45 - Warning: missing </font> before </h3>
> line 99 column 50 - Warning: discarding unexpected </font>
> line 99 column 57 - Warning: replacing element</p> by <br>
> line 99 column 57 - Warning: inserting implicit <br>
> line 105 column 281 - Warning: trimming empty <p>
> line 106 column 646 - Warning: trimming empty <p>
> line 107 column 129 - Warning: trimming empty <p>
> line 108 column 356 - Warning: trimming empty <p>
> line 109 column 115 - Warning: trimming empty <p>
> line 115 column 5 - Warning: missing </em> before <dl>
> line 115 column 5 - Warning: trimming empty <em>
> line 116 column 5 - Warning: inserting implicit <em>
> line 117 column 1 - Warning: missing <dd>
> line 117 column 1 - Warning: discarding unexpected </em>
>
> DHCP211-162.CS.CORNELL.EDU% cat out
> DHCP211-162.CS.CORNELL.EDU%
> DHCP211-162.CS.CORNELL.EDU% exit
> exit
>
> Script done on Mon Jun 5 13:46:39 2000
> ..............................................................
> Here is the java code that invoked the parse:
> // bergmark - may 2000 - Code example of how to use the Tidy Java Bean
>
> // Code copied from Java HTML Tidy (13 May 2000) document
>
> // CLASSPATH: must include path to JTidy:
> // /home/bergmark/public/src/tools/JTidy/src/30apr2000
>
> import java.io.IOException;
> import java.net.URL;
> import java.io.BufferedInputStream;
> import java.io.FileOutputStream;
> import java.io.PrintWriter;
> import java.io.FileWriter;
> import org.w3c.tidy.Tidy;
>
> /**
> * This program shows how HTML could be tidied directly from
> * a URL stream, and running on separate threads. Note the use
> * of the 'parse' method to parse from an InputStream, and send
> * the pretty-printed result to an OutputStream.
> * In this example thread th1 outputs XML, and thread th2 outputs
> * HTML. This shows that properties are per instance of Tidy.
> */
>
> public class Test16 implements Runnable {
>
> private String url;
> private String outFileName;
> private String errOutFileName;
>
> public Test16(String url, String outFileName,
> String errOutFileName) {
> this.url = url;
> this.outFileName = outFileName;
> this.errOutFileName = errOutFileName;
> }
>
> public void run() {
> URL u;
> BufferedInputStream in;
> FileOutputStream out;
> Tidy tidy = new Tidy();
>
> tidy.setXHTML(true);
> try {
> tidy.setErrout(new PrintWriter(new FileWriter(errOutFileName), true));
> u = new URL(url);
> in = new BufferedInputStream(u.openStream());
> out = new FileOutputStream(outFileName);
> tidy.parse(in, out);
> } catch ( IOException e ) {
> System.out.println ( this.toString() + e.toString() );
> }
> }
>
> public static void main( String[] args ) {
> Test16 t = new Test16(args[0], args[1], args[2] );
> Thread th1 = new Thread(t);
> th1.start();
> }
> }
>
> ------------------------------------------------------------------------
>
> * Next message: Jelks Cabaniss: "W3C validator (was: Strict tables)"
> * Previous message: Bertilo Wennergren: "Strict tables"
> * Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
> * Other mail archives: [this mailing list] [other W3C mailing lists]
> * Mail actions: [ respond to this message ] [ mail a new topic ]
--
Andrea Sparling
206.576.3557
AdRelevance, a division of Media Metrix, Inc.
Received on Tuesday, 6 June 2000 12:13:37 UTC