- From: Greg James <jamesgc21@attbi.com>
- Date: Mon, 5 Aug 2002 21:32:04 -0600
- To: <html-tidy@w3.org>
- Message-ID: <001701c23cf9$d6591200$0201a8c0@GAUSS>
I'm trying to use JTidy to convert HTML pages to XML. The HTML has several 'un-tagged' entries. For example: <P><A name=Hit3><B>3.</B></A> <A href="http://www.matrixscience.com/cgi/protein_view.pl?file=../data/20020130/FaioSfs.dat&hit=4">gi|11528046</A> <B>Mass:</B> 74711 <B>Score:</B> 43 (AF197556) coat protein [Beet necrotic yellow vein virus] <B> Observed Mr(expt) Mr(calc) Delta Start End Miss Peptide</B> 564.70 564.70 565.25 -0.55 168 - 171 0 FEDR 828.00 828.00 828.51 -0.51 44 - 51 0 AANLSIIK 1032.30 1032.30 1032.56 -0.26 509 - 519 0 AAVAMTALASK 2271.60 2271.60 2271.16 0.44 556 - 578 0 YVHTGIQGGAQLAGAMAVGAMLR <B>No match to:</B> 1021.10, 3511.70 Is there an easy way to get JTidy to 'tag' the un-tagged text? For example, the text between the <B>'s? I'd rather not right a java program to tag these lines prior to sending it to JTidy. I'm setting the following params on JTidy: tidy.setMakeClean(true); tidy.setBreakBeforeBR(true); tidy.setShowWarnings(false); tidy.setOnlyErrors(true); Thanks.
Received on Monday, 5 August 2002 23:33:57 UTC