
documentation for JTidy?

From: Peter O Sigurdson/Markham/IBM <sigpete@ca.ibm.com>
Date: Fri, 23 Nov 2001 08:20:27 -0500
To: Dave Raggett <dsr@w3.org>, html-tidy@w3.org
Message-ID: <OF33AC4661.A81F4888-ON05256B0D.0048BAD3@mkm.can.ibm.com>

I am really happy to have found JTidy and I am very excited to start using it,
but:

Is there a source of good-quality documentation?

I looked at the JavaDoc documentation, but it didn't provide enough detail.
Is there a body of examples available?

Here is my specific issue:

I am working on a project that requires migrating a large number of static
HTML files into a database container.
These HTML files have been developed and maintained over the years by a
number of different developers, using a variety of tools.  I hand-coded the
start of many of them back in '96 and '97.  The point is: they are NOT
well-formed HTML!

I want to use JTidy to create XML instances from the files, then use JTidy
methods to clean up the HTML and then emit only a portion of the HTML
instance (i.e., I want to use XSLT to filter out all but the "content" portion
of the page, since the new database container will provide the header,
footer, and left navigation bar content).
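
To make the XSLT filtering step concrete, here is a minimal sketch that uses only the JDK's javax.xml.transform classes. It assumes the page has already been converted to well-formed XML with no namespace and no DOCTYPE, and that the content portion sits in a div with id="content". That id, the class name ContentFilter, and the method name extractContent are hypothetical, chosen only for illustration; the real pages would need their own match pattern.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class ContentFilter {

    // Stylesheet that copies only the element assumed to hold the page content.
    // The id 'content' is an assumption; adjust the XPath to the real markup.
    private static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"xml\" omit-xml-declaration=\"yes\"/>"
        + "<xsl:template match=\"/\">"
        + "<xsl:copy-of select=\"//div[@id='content']\"/>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    // Applies the stylesheet to a well-formed XML string and returns the result.
    public static String extractContent(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter out = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)),
                                  new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            // Input that is not well-formed XML ends up here.
            throw new IllegalStateException("XSLT filtering failed", e);
        }
    }

    public static void main(String[] args) {
        String page =
              "<html><body>"
            + "<div id=\"header\">Header</div>"
            + "<div id=\"content\"><p>Hello</p></div>"
            + "<div id=\"footer\">Footer</div>"
            + "</body></html>";
        // Prints the content div only; header and footer are dropped.
        System.out.println(extractContent(page));
    }
}
```

The transform only works on well-formed input, which is why the tidy-up step has to come first. If the cleaned output is XHTML with a default namespace, the match pattern would need a namespace prefix as well.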

Here is my code:

// Needs: java.io.*, org.w3c.tidy.Tidy, org.w3c.tidy.Node, org.w3c.tidy.TagTable
private String get_static_HTML_fileContents(File inputFile) {
    String fileContents = "";
    String outputFile = static_html_directory + "\\mac.html";
    // recode this method to use JTidy
    // how can I make an XML instance?
    // try-with-resources closes both streams even if parsing fails
    try (InputStream inStream = new FileInputStream(inputFile);
         OutputStream outStream = new FileOutputStream(outputFile)) {
        Tidy jtidy = new Tidy();
        // parse() writes the tidied markup to outStream and returns JTidy's own Node tree
        Node xmlDocument = jtidy.parse(inStream, outStream);
        TagTable tt = new TagTable();
        xmlDocument.findBody(tt);
    } catch (Exception e) {
        e.printStackTrace();
    }
    // fileContents is never assigned above, so this still returns ""
    return fileContents;
}
Received on Friday, 23 November 2001 08:21:50 GMT
