Re: Java version of HTML Tidy

Jacob Nordfalk wrote:

> Sami Lempinen wrote:
>
> > You're right about the lack of documentation, and you're welcome to
> > write some ;)
>
> OK, I'll try to write some documentation.

Here is my proposal - a README file.

HOW TO USE JTIDY AS A PROGRAM
=============================

If you are not a developer, you only need the file Tidy.jar
(found in the subdirectory build/).

Start the program by typing: java -jar Tidy.jar <parameters>

For example, to tidy up a file, type: java -jar Tidy.jar file.html

To get help, type: java -jar Tidy.jar -h

COMMAND LINE PARAMETERS
=======================

Tidy: file1 file2 ...
Utility to clean up & pretty print html files
see http://www.w3.org/People/Raggett/tidy/

options for tidy released on 4th August 2000

Processing directives
--------------------
-indent or -i   indent element content
-omit   or -o   omit optional endtags
-wrap 72        wrap text at column 72 (default is 68)
-upper  or -u   force tags to upper case (default is lower)
-clean  or -c   replace font, nobr & center tags by CSS
-numeric or -n  output numeric rather than named entities
-errors or -e   only show errors
-quiet or -q    suppress nonessential output
-xml            use this when input is wellformed xml
-asxml          to convert html to wellformed xml
-slides         to burst into slides on h2 elements
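The -numeric option controls how characters that are written as entities come out: as numeric references such as &#160; rather than named entities such as &nbsp;. The Java sketch below is plain JDK code, not JTidy's implementation, and the class name NumericEntities is made up for illustration. It assumes ASCII output (as with -ascii), where every character above 127 has to be escaped, and shows the numeric form:

```java
// Illustration only, NOT JTidy's code: shows the numeric-reference form
// that -numeric asks for. Assumes ASCII output, so every code point above
// 127 is written as a decimal reference (U+00A0 -> &#160;).
// Markup characters such as '<' are left alone here on purpose.
public class NumericEntities {

    // Replace every code point above 127 with a decimal reference &#N;
    public static String escape(String text) {
        StringBuilder out = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 127) {
                out.append("&#").append(cp).append(';');
            } else {
                out.appendCodePoint(cp);
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        // "caf" + e-acute + no-break space + "bar"
        System.out.println(escape("caf\u00e9\u00a0bar")); // prints caf&#233;&#160;bar
    }
}
```

A named-entity variant would need a lookup table (160 -> nbsp, 233 -> eacute, ...); the numeric form needs none, which is one reason it is the safer choice for consumers that do not know the HTML entity names.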

Character encodings
------------------
-raw            leave chars > 128 unchanged upon output
-ascii          use ASCII for output, Latin-1 for input
-latin1         use Latin-1 for both input and output
-iso2022        use ISO2022 for both input and output
-utf8           use UTF-8 for both input and output
-mac            use the Apple MacRoman character set
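The encoding options matter because the same character is stored as different bytes in different encodings. This small sketch uses only java.nio.charset from the JDK (EncodingDemo is a made-up name and has nothing to do with JTidy). It shows that "é" takes one byte in Latin-1 but two in UTF-8, and what happens when UTF-8 input is read as if it were Latin-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Plain JDK illustration (not JTidy code) of why the input and output
// encodings have to be chosen correctly.
public class EncodingDemo {

    // Number of bytes needed to store text in the given encoding.
    public static int byteCount(String text, Charset cs) {
        return text.getBytes(cs).length;
    }

    // Encode as UTF-8 but decode as Latin-1: what happens when a UTF-8
    // file is read with the wrong input encoding.
    public static String misdecode(String text) {
        byte[] utf8 = text.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String eAcute = "\u00e9"; // e with acute accent
        System.out.println(Arrays.toString(eAcute.getBytes(StandardCharsets.ISO_8859_1))); // prints [-23]
        System.out.println(Arrays.toString(eAcute.getBytes(StandardCharsets.UTF_8)));      // prints [-61, -87]
        // One character became two: A with tilde followed by the copyright sign
        System.out.println(misdecode(eAcute).length()); // prints 2
    }
}
```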

File manipulation
---------------
-config <file>  set options from config file
-f <file>       write errors to named <file>
-modify or -m   to modify original files

Miscellaneous
------------
-version or -v  show version
-help   or -h   list command line options
You can also use --blah for any config file option blah
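To illustrate how a config file maps to --option arguments, here is a hypothetical helper (ConfigToArgs is not part of JTidy) that turns "option: value" lines into the equivalent --option value arguments. It assumes one "name: value" pair per line and reads the text with java.util.Properties; whether JTidy itself parses its config file exactly this way is an assumption, not something stated here.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;
import java.util.TreeSet;

// Hypothetical helper (not part of JTidy): turns "option: value" lines
// from a Tidy-style config file into --option value arguments.
public class ConfigToArgs {

    public static List<String> toArgs(String configText) {
        Properties props = new Properties();
        try {
            // Properties accepts "key: value" as well as "key=value" lines
            // and skips lines starting with '#' or '!'.
            props.load(new StringReader(configText));
        } catch (IOException e) {
            // A StringReader does no real I/O, so this is not expected.
            throw new UncheckedIOException(e);
        }
        List<String> args = new ArrayList<>();
        // Sort option names so the output order is predictable.
        for (String name : new TreeSet<>(props.stringPropertyNames())) {
            args.add("--" + name);
            args.add(props.getProperty(name));
        }
        return args;
    }

    public static void main(String[] args) {
        String config = "# sample config\nwrap: 72\nindent: auto\n";
        System.out.println(toArgs(config)); // prints [--indent, auto, --wrap, 72]
    }
}
```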

Input/Output default to stdin/stdout respectively
Single letter options apart from -f may be combined
as in:  tidy -f errs.txt -imu foo.html
For further info on HTML see http://www.w3.org/MarkUp
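The rule about combining single-letter options can be sketched as follows. FlagExpander is hypothetical and not JTidy's actual argument parser. An argument such as -imu is split into -i -m -u only when every letter after the dash is one of the single-letter flags listed above (minus -f, which takes a file name). Every word option in the list (-indent, -wrap, -xml, ...) contains at least one other letter, so those pass through unchanged.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch (not JTidy's parser) of splitting combined
// single-letter options such as -imu into -i -m -u.
public class FlagExpander {

    // Single-letter flags from the help text that may be combined.
    // -f is deliberately absent: it takes a file name argument.
    private static final Set<Character> COMBINABLE =
        new HashSet<>(Arrays.asList('i', 'o', 'u', 'c', 'n', 'e', 'q', 'm', 'v', 'h'));

    public static List<String> expand(List<String> args) {
        List<String> out = new ArrayList<>();
        for (String arg : args) {
            if (isCombinedFlags(arg)) {
                for (char c : arg.substring(1).toCharArray()) {
                    out.add("-" + c);
                }
            } else {
                out.add(arg); // word options, -f, values, file names: unchanged
            }
        }
        return out;
    }

    // True for "-imu"-style arguments made only of combinable letters.
    private static boolean isCombinedFlags(String arg) {
        if (arg.length() < 3 || arg.charAt(0) != '-' || arg.charAt(1) == '-') {
            return false;
        }
        for (char c : arg.substring(1).toCharArray()) {
            if (!COMBINABLE.contains(c)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<String> cmd = Arrays.asList("-f", "errs.txt", "-imu", "foo.html");
        System.out.println(expand(cmd)); // prints [-f, errs.txt, -i, -m, -u, foo.html]
    }
}
```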


HOW TO USE AS A BEAN
====================

JTidy can also be used as a JavaBean from your own program.
Below is a simple example program that shows how to fetch a
URL and apply Tidy to it.

/**
 * Example of how to use the Tidy bean
 * Courtesy Chris Raber. Modified by Jacob Nordfalk
 */

// core Java stuff.
import java.io.*;
import java.net.*;

// JTidy stuff.
import org.w3c.tidy.Tidy;

public class TidyURL {
  public static void main(String[] args) {
    if (args.length != 1) {
      System.out.println("Usage: TidyURL <url>");
      System.out.println("Full example (on Windows, use ; instead of : in the class path):\n"
        + "java -cp .:Tidy.jar TidyURL http://www.esperanto.net");
      return;
    }

    String url = args[0];

    // try-with-resources closes the network stream when we are done
    try (InputStream sourceIn = new BufferedInputStream(new URL(url).openStream())) {
      ByteArrayOutputStream tidyOutStream = new ByteArrayOutputStream();

      // Create the Tidy bean
      Tidy tidy = new Tidy();

      // Set bean properties
      tidy.setQuiet(false);
      tidy.setShowWarnings(true);
      tidy.setIndentContent(true);
      tidy.setSmartIndent(true);
      tidy.setIndentAttributes(false);
      tidy.setWraplen(1024);
      //tidy.setXHTML(true);   // uncomment to output XHTML
      //tidy.setXmlOut(true);  // uncomment to output XML

      // Report warnings and errors on stdout; autoflush (true) flushes the
      // PrintWriter on every println, so messages are not left in its buffer
      tidy.setErrout(new PrintWriter(System.out, true));

      // Parse the page and write the tidied markup to tidyOutStream
      tidy.parse(sourceIn, tidyOutStream);
      System.out.println(tidyOutStream.toString());

    } catch (Exception ex) {
      ex.printStackTrace();
    }
  }
}

--
Jacob Nordfalk

Received on Thursday, 8 November 2001 19:04:07 UTC