RE: HTML tidying package for Java

From: Ruadhan O'Donoghue <rodonoghue@mtld.mobi>
Date: Wed, 21 Mar 2007 06:31:11 -0400
Message-ID: <815E07C915F39742A29E5587B3A7FA192A621E3F@lk0-cs0.int.link2exchange.com>
To: "Sean Owen" <srowen@google.com>, <public-mobileok-checker@w3.org>

FWIW, I came across a few of these when scoping ready.mobi.

We are using the TagSoup parser in ready.mobi, and it is extremely
robust. I've used it with Java 4 & 5, but not 6.

I considered JTidy also, but it worried me too much that it was not
maintained.

Ruadhan
 
> -----Original Message-----
> From: public-mobileok-checker-request@w3.org
> [mailto:public-mobileok-checker-request@w3.org] On Behalf Of Sean Owen
> Sent: 21 March 2007 01:25
> To: public-mobileok-checker@w3.org
> Subject: HTML tidying package for Java
> 
> 
> Per my action, I did a little digging on HTML-tidying packages for
> Java. My pick:
> 
> HtmlCleaner - http://htmlcleaner.sourceforge.net/
> This worked pretty well in my informal testing and looks well
> maintained.
> 
> I could be talked into something else -- this just looks best
> initially.
> 
> 
> Other possibilities I considered:
> 
> TagSoup - http://home.ccil.org/~cowan/XML/tagsoup/
> Also looks good, though not Java 5 / 6 compatible??
> 
> NekoHTML - http://people.apache.org/~andyc/neko/doc/html/
> Looks OK, if a bit more out of date and less full-featured
> 
> Java Mozilla HTML Parser - http://sourceforge.net/projects/mozillaparser
> Looks like it's in development
> 
> JTidy - http://sourceforge.net/projects/jtidy
> A port of the W3C's HTML Tidy code to Java, but it hasn't been updated
> in 7 years.
> 
> Sean
Received on Wednesday, 21 March 2007 10:33:16 GMT
