I'm trying to use JTidy to convert HTML pages to XML. The HTML has several 'un-tagged' entries. For example:

<A name=Hit3>3.</A> <A href="http://www.matrixscience.com/cgi/protein_view.pl?file=../data/20020130/FaioSfs.dat&hit=4">gi|11528046</A> Mass: 74711 Score: 43
(AF197556) coat protein [Beet necrotic yellow vein virus]
 Observed Mr(expt) Mr(calc) Delta Start End Miss Peptide
 564.70 564.70 565.25 -0.55 168 - 171 0 FEDR
 828.00 828.00 828.51 -0.51 44 - 51 0 AANLSIIK
1032.30 1032.30 1032.56 -0.26 509 - 519 0 AAVAMTALASK
2271.60 2271.60 2271.16 0.44 556 - 578 0 YVHTGIQGGAQLAGAMAVGAMLR
No match to: 1021.10, 3511.70

Is there an easy way to get JTidy to 'tag' the un-tagged text? For example, the text between the 's? I'd rather not write a Java program to tag these lines before sending them to JTidy.

I'm setting the following params on JTidy:

tidy.setMakeClean(true);
tidy.setBreakBeforeBR(true);
tidy.setShowWarnings(false);
tidy.setOnlyErrors(true);
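
For reference, the pre-tagging fallback I'd like to avoid would look roughly like the sketch below. It is only a minimal illustration, not something JTidy provides: the class name `PreTagger`, the method `tagLines`, and the choice of `<p>` as the wrapper element are all my own assumptions. It wraps each non-blank line that does not already begin with a tag, and escapes `&`, `<` and `>` inside the wrapped text.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of JTidy): wraps un-tagged lines before tidying.
public class PreTagger {

    // Escape characters that would otherwise be read as markup.
    // '&' is replaced first so the entities added afterwards are not re-escaped.
    static String escape(String text) {
        return text.replace("&", "&amp;")
                   .replace("<", "&lt;")
                   .replace(">", "&gt;");
    }

    // Wrap every non-blank line that does not already start with a tag in <p>...</p>.
    // The <p> wrapper is an assumption; any element the later XML step expects would do.
    static String tagLines(String html) {
        List<String> out = new ArrayList<>();
        for (String line : html.split("\n", -1)) {
            String trimmed = line.trim();
            if (trimmed.isEmpty() || trimmed.startsWith("<")) {
                out.add(line); // blank, or already begins with a tag: leave untouched
            } else {
                out.add("<p>" + escape(trimmed) + "</p>");
            }
        }
        return String.join("\n", out);
    }

    public static void main(String[] args) {
        String sample = "<A name=Hit3>3.</A>\n"
                + " 564.70 564.70 565.25 -0.55 168 - 171 0 FEDR\n"
                + "No match to: 1021.10, 3511.70";
        System.out.println(tagLines(sample));
    }
}
```

Note that this simple line-based check leaves trailing un-tagged text on a line that starts with a tag (such as the "Mass: ... Score: ..." part of the hit line) untouched, which is part of why I'd prefer JTidy to handle it.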