Re: HTML tidying package for Java

From: Sean Owen <srowen@google.com>
Date: Wed, 21 Mar 2007 09:53:57 -0400
Message-ID: <e920a71c0703210653h66e32309ye8b84752e6a5edd7@mail.gmail.com>
To: "Ruadhan O'Donoghue" <rodonoghue@mtld.mobi>
Cc: public-mobileok-checker@w3.org

I'm open to TagSoup if it in fact works in Java 5, and has proved
useful in practice. HtmlCleaner *looked* better but only at first
glance and with a little experimentation.

On 3/21/07, Ruadhan O'Donoghue <rodonoghue@mtld.mobi> wrote:
> FWIW, I came across a few of these when scoping ready.mobi.
>
> We are using the TagSoup parser in ready.mobi, and it is extremely
> robust. I've used it with Java 4 & 5, but not 6.
>
> I considered JTidy also, but it worried me too much that it was not
> maintained.
>
> Ruadhan
>
> > -----Original Message-----
> > From: public-mobileok-checker-request@w3.org
> > [mailto:public-mobileok-checker-request@w3.org] On Behalf Of Sean Owen
> > Sent: 21 March 2007 01:25
> > To: public-mobileok-checker@w3.org
> > Subject: HTML tidying package for Java
> >
> >
> > Per my action, I did a little digging on HTML-tidying packages for
> > Java. My pick:
> >
> > HtmlCleaner - http://htmlcleaner.sourceforge.net/
> > This worked pretty well in my informal testing and looks well
> > maintained
> >
> > I could be talked into something else -- this just looks best
> > initially.
> >
> >
> > Other possibilities I considered:
> >
> > TagSoup - http://home.ccil.org/~cowan/XML/tagsoup/
> > Also looks good, though not Java 5 / 6 compatible??
> >
> > NekoHTML - http://people.apache.org/~andyc/neko/doc/html/
> > Looks OK, if a bit more out of date and less full-featured
> >
> > Java Mozilla HTML Parser -
> > http://sourceforge.net/projects/mozillaparser
> > Looks like it's in development
> >
> > JTidy - http://sourceforge.net/projects/jtidy
> > A port of the W3C's HTML Tidy code to Java, but, hasn't been updated
> > in 7 years.
> >
> > Sean
>
>
Received on Wednesday, 21 March 2007 13:54:09 GMT
