
Re: [Tidy/Jtidy bug report]ArrayIndexOutOfBoundsException

From: Andrea Sparling <asparling@adrelevance.com>
Date: Tue, 06 Jun 2000 09:13:00 -0700
Message-ID: <393D230C.F2C8D662@adrelevance.com>
To: html-tidy@w3.org
The stack pointer had already been decremented to 0 before the call to
insertedToken. I have not worked out why, but if you change
        // this will only be null if inode != null
        if (this.insert == -1) {
            node = this.inode;
            this.inode = null;
            return node;
to

        // this will only be null if inode != null
        if ((this.insert == -1) || (this.istack.size() < 1)) {
            node = this.inode;
            this.inode = null;
            return node;

This may help.
It still needs further inspection to find out why this code decrements
the stack pointer.
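
For anyone who wants to see the effect of the extra check in isolation, here
is a minimal sketch. It is not JTidy's actual Lexer: the class
InsertGuardSketch, its two methods, and the "<dd>" placeholder node are
hypothetical stand-ins, and only the field names insert, inode and istack are
taken from the snippet above. It assumes the failure state is an empty istack
with insert still set to 0, which is what the "0 >= 0" in the trace suggests.

```java
import java.util.Vector;

// Hypothetical stand-in for the lexer state discussed above; not JTidy's Lexer.
class InsertGuardSketch {
    final Vector<String> istack = new Vector<>(); // inline stack (assumed element type)
    int insert = -1;   // index into istack of the next token to re-insert, or -1
    String inode;      // pending node handed back once nothing is left to insert

    // Original check: trusts that insert always points at a live stack entry.
    String insertedTokenUnguarded() {
        if (insert == -1) {
            String node = inode;
            inode = null;
            return node;
        }
        return istack.elementAt(insert); // throws if the stack has been emptied
    }

    // Proposed check: also take the early return when the stack is empty.
    String insertedTokenGuarded() {
        if ((insert == -1) || (istack.size() < 1)) {
            String node = inode;
            inode = null;
            return node;
        }
        return istack.elementAt(insert);
    }

    public static void main(String[] args) {
        InsertGuardSketch lexer = new InsertGuardSketch();
        lexer.insert = 0;      // stale index: the stack is already empty
        lexer.inode = "<dd>";  // placeholder for the pending node

        try {
            lexer.insertedTokenUnguarded();
        } catch (ArrayIndexOutOfBoundsException e) {
            System.out.println("unguarded: " + e.getMessage());
        }
        System.out.println("guarded: " + lexer.insertedTokenGuarded());
    }
}
```

Note that the extra test only covers an empty stack. If insert could ever be
stale while the stack is non-empty, a check such as
insert >= istack.size() would be needed instead, but that goes beyond what
the trace shows.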

http://www.dlib.org/dlib/september98/millman/09millman.htm

> 
> From: Donna Bergmark (bergmark@CS.Cornell.EDU)
> Date: Mon, Jun 05 2000
> 
> Message-Id: <200006051808.OAA07367@elgin.cs.cornell.edu>
> To: html-tidy@w3.org
> Date: Mon, 05 Jun 2000 14:08:33 -0400
> From: Donna Bergmark <bergmark@CS.Cornell.EDU>
> Subject: [Tidy/Jtidy bug report]ArrayIndexOutOfBoundsException
> 
> There still seems to be a problem with ArrayIndexOutOfBounds.
> I am running Tidy Version 30 April 2000 and JTidy Version 3 June.
> A variant of the Tidy Java Bean example was used.  Here is the
> URL that won't parse:
> http://www.dlib.org/dlib/september98/millman/09millman.html
> 
> Here is the typescript of the run:
> Script started on Mon Jun  5 13:44:12 2000
> ------------------------------------------------------------------
> Latest version of Tidy, JTidy still bombs on ArrayIndexOutOfBounds
> ------------------------------------------------------------------
> (1) Put 3 June 2000 JTidy into my path
> 
> DHCP211-162.CS.CORNELL.EDU% setenv CLASSPATH \
> ? /home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:.
> 
> DHCP211-162.CS.CORNELL.EDU% echo $CLASSPATH
> /home/bergmark/public/src/tools/JTidy/src/30apr2000:/usr/local/jdk1.2.2/lib:.
> 
> (2) Get original version of the JTidy bean (copied from the Java HTML Tidy
> document).  Compile it.
> 
> DHCP211-162.CS.CORNELL.EDU% co -r1.1 Test16.java
> RCS/Test16.java,v  -->  Test16.java
> revision 1.1
> writable Test16.java exists; remove it? [ny](n): y
> done
> 
> DHCP211-162.CS.CORNELL.EDU% javac Test16.java
> 
> (3) Run it on the URL that causes Tidy/JTidy to crash
> 
> DHCP211-162.CS.CORNELL.EDU% cat millman.notidy
> http://www.dlib.org/dlib/september98/millman/09millman.html
> 
> DHCP211-162.CS.CORNELL.EDU% java Test16 \
> ? http://www.dlib.org/dlib/september98/millman/09millman.html \
> ? out error
> java.lang.ArrayIndexOutOfBoundsException: 0 >= 0
>         at java.util.Vector.elementAt(Vector.java:405)
>         at org.w3c.tidy.Lexer.insertedToken(Lexer.java:2738)
>         at org.w3c.tidy.Lexer.getToken(Lexer.java:1185)
>         at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:1672)
>         at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
>         at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
>         at org.w3c.tidy.ParserImpl$ParseDefList.parse(ParserImpl.java:1438)
>         at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
>         at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
>         at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002)
>         at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
>         at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
>         at org.w3c.tidy.ParserImpl$ParseBlock.parse(ParserImpl.java:2002)
>         at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
>         at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
>         at org.w3c.tidy.ParserImpl$ParseBody.parse(ParserImpl.java:652)
>         at org.w3c.tidy.ParserImpl.parseTag(ParserImpl.java:49)
>         at org.w3c.tidy.ParserImpl.access$0(ParserImpl.java:36)
>         at org.w3c.tidy.ParserImpl$ParseHTML.parse(ParserImpl.java:258)
>         at org.w3c.tidy.ParserImpl.parseDocument(ParserImpl.java:2917)
>         at org.w3c.tidy.Tidy.parse(Tidy.java:1055)
>         at Test16.run(Test16.java:50)
>         at java.lang.Thread.run(Thread.java:475)
> 
> (4) Here is what was written into the output files
> 
> DHCP211-162.CS.CORNELL.EDU% cat error
> 
> Tidy (vers 30th April 2000) Parsing "InputStream"
> line 10 column 61 - Warning: discarding unexpected </a>
> line 16 column 23 - Warning: missing </font> before <h3>
> line 16 column 27 - Warning: inserting implicit <font>
> line 20 column 1 - Warning: inserting implicit <font>
> line 23 column 2 - Warning: missing </font> before <h6>
> line 23 column 5 - Warning: inserting implicit <font>
> line 26 column 1 - Warning: inserting implicit <font>
> line 29 column 23 - Warning: missing </font> before <h3>
> line 29 column 23 - Warning: missing </font> before <h3>
> line 29 column 27 - Warning: inserting implicit <font>
> line 29 column 27 - Warning: inserting implicit <font>
> line 30 column 1 - Warning: discarding unexpected </font>
> line 32 column 1 - Warning: inserting implicit <font>
> line 34 column 1 - Warning: missing </font> before <p>
> line 35 column 1 - Warning: inserting implicit <font>
> line 41 column 53 - Warning: discarding unexpected </i>
> line 43 column 2 - Warning: missing </i> before <p>
> line 43 column 2 - Warning: missing </font> before <p>
> line 43 column 2 - Warning: missing </font> before <p>
> line 47 column 3 - Warning: <img> lacks "alt" attribute
> line 57 column 2 - Warning: missing </font> before <p>
> line 57 column 2 - Warning: missing </h3> before <p>
> line 58 column 1 - Warning: inserting implicit <font>
> line 60 column 1 - Warning: discarding unexpected </h3>
> line 60 column 6 - Warning: discarding unexpected </font>
> line 60 column 13 - Warning: discarding unexpected </font>
> line 60 column 20 - Warning: discarding unexpected </font>
> line 72 column 1 - Warning: trimming empty <p>
> line 72 column 55 - Warning: missing </font> before </h3>
> line 72 column 60 - Warning: discarding unexpected </font>
> line 72 column 67 - Warning: replacing element</p> by <br>
> line 72 column 67 - Warning: inserting implicit <br>
> line 99 column 1 - Warning: trimming empty <p>
> line 99 column 45 - Warning: missing </font> before </h3>
> line 99 column 50 - Warning: discarding unexpected </font>
> line 99 column 57 - Warning: replacing element</p> by <br>
> line 99 column 57 - Warning: inserting implicit <br>
> line 105 column 281 - Warning: trimming empty <p>
> line 106 column 646 - Warning: trimming empty <p>
> line 107 column 129 - Warning: trimming empty <p>
> line 108 column 356 - Warning: trimming empty <p>
> line 109 column 115 - Warning: trimming empty <p>
> line 115 column 5 - Warning: missing </em> before <dl>
> line 115 column 5 - Warning: trimming empty <em>
> line 116 column 5 - Warning: inserting implicit <em>
> line 117 column 1 - Warning: missing <dd>
> line 117 column 1 - Warning: discarding unexpected </em>
> 
> DHCP211-162.CS.CORNELL.EDU% cat out
> DHCP211-162.CS.CORNELL.EDU%
> DHCP211-162.CS.CORNELL.EDU% exit
> exit
> 
> Script done on Mon Jun  5 13:46:39 2000
> ..............................................................
> Here is the java code that invoked the parse:
> // bergmark - may 2000 - Code example of how to use the Tidy Java Bean
> 
> // Code copied from Java HTML Tidy (13 May 2000) document
> 
> // CLASSPATH:  must include path to JTidy:
> // /home/bergmark/public/src/tools/JTidy/src/30apr2000
> 
> import java.io.IOException;
> import java.net.URL;
> import java.io.BufferedInputStream;
> import java.io.FileOutputStream;
> import java.io.PrintWriter;
> import java.io.FileWriter;
> import org.w3c.tidy.Tidy;
> 
> /**
>  * This program shows how HTML could be tidied directly from
>  * a URL stream, and running on separate threads.  Note the use
>  * of the 'parse' method to parse from an InputStream, and send
>  * the pretty-printed result to an OutputStream.
>  * In this example thread th1 outputs XML, and thread th2 outputs
>  * HTML.  This shows that properties are per instance of Tidy.
>  */
> 
> public class Test16 implements Runnable {
> 
>    private String url;
>    private String outFileName;
>    private String errOutFileName;
> 
>    public Test16(String url, String outFileName,
>       String errOutFileName) {
>       this.url = url;
>       this.outFileName = outFileName;
>       this.errOutFileName = errOutFileName;
>    }
> 
>    public void run() {
>       URL u;
>       BufferedInputStream in;
>       FileOutputStream out;
>       Tidy tidy = new Tidy();
> 
>       tidy.setXHTML(true);
>       try {
>          tidy.setErrout(new PrintWriter(new FileWriter(errOutFileName), true));
>          u = new URL(url);
>          in = new BufferedInputStream(u.openStream());
>          out = new FileOutputStream(outFileName);
>          tidy.parse(in, out);
>       } catch ( IOException e ) {
>          System.out.println ( this.toString() + e.toString() );
>       }
>    }
> 
>    public static void main( String[] args ) {
>       Test16 t = new Test16(args[0], args[1], args[2] );
>       Thread th1 = new Thread(t);
>       th1.start();
>    }
> }
> 

-- 
Andrea Sparling
206.576.3557
AdRelevance, a division of Media Metrix, Inc.
Received on Tuesday, 6 June 2000 12:13:37 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:43 GMT