JTidy & Un-Tagged Text in HTML Doc from Greg James on 2002-08-06 (html-tidy@w3.org from July to September 2002)

From: Greg James <jamesgc21@attbi.com>
Date: Mon, 5 Aug 2002 21:32:04 -0600
To: <html-tidy@w3.org>
Message-ID: <001701c23cf9$d6591200$0201a8c0@GAUSS>

I'm trying to use JTidy to convert HTML pages to XML. The HTML contains several 'un-tagged' entries, i.e. runs of text that aren't wrapped in any element. For example:

<P><A name=Hit3><B>3.</B></A> <A href="http://www.matrixscience.com/cgi/protein_view.pl?file=../data/20020130/FaioSfs.dat&hit=4">gi|11528046</A>  <B>Mass:</B> 74711  <B>Score:</B> 43     
(AF197556) coat protein [Beet necrotic yellow vein virus]
<B> Observed    Mr(expt)   Mr(calc)    Delta   Start     End  Miss  Peptide</B>
   564.70     564.70     565.25     -0.55     168 -   171    0   FEDR
   828.00     828.00     828.51     -0.51      44 -    51    0   AANLSIIK
  1032.30    1032.30    1032.56     -0.26     509 -   519    0   AAVAMTALASK
  2271.60    2271.60    2271.16      0.44     556 -   578    0   YVHTGIQGGAQLAGAMAVGAMLR
<B>No match to:</B> 1021.10, 3511.70

Is there an easy way to get JTidy to 'tag' the un-tagged text, for example the text that sits between the <B> elements? I'd rather not write a Java program to tag these lines before sending them to JTidy.
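For comparison, the pre-pass the question hopes to avoid might look roughly like the following. This is a minimal sketch in plain Java, not a JTidy feature. It assumes every data row has exactly the eight whitespace-separated columns shown in the sample (Observed, Mr(expt), Mr(calc), Delta, Start - End, Miss, Peptide), and the class name `PeptideRowTagger` and the CSS class names are hypothetical. Each matching row is wrapped in ordinary `<span class="...">` elements, chosen because they are valid HTML (unlike invented element names), and every other line is passed through unchanged.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/**
 * Hypothetical pre-pass (not part of JTidy): wraps each whitespace-aligned
 * peptide row in <span> elements so the text is tagged before JTidy sees it.
 */
public class PeptideRowTagger {

    // Matches a row such as:
    // "   564.70     564.70     565.25     -0.55     168 -   171    0   FEDR"
    // Groups: observed, Mr(expt), Mr(calc), delta, start, end, missed cleavages, peptide.
    private static final Pattern ROW = Pattern.compile(
        "^\\s*(\\d+\\.\\d+)\\s+(\\d+\\.\\d+)\\s+(\\d+\\.\\d+)\\s+(-?\\d+\\.\\d+)"
        + "\\s+(\\d+)\\s*-\\s*(\\d+)\\s+(\\d+)\\s+([A-Z]+)\\s*$");

    // Illustrative class names (an assumption, not any standard schema).
    private static final String[] CLASSES = {
        "observed", "mr-expt", "mr-calc", "delta", "start", "end", "miss", "peptide"
    };

    /** Returns the input with every matching row wrapped in spans; other lines are unchanged. */
    public static String tagPeptideRows(String html) {
        StringBuilder out = new StringBuilder();
        // The -1 limit keeps trailing empty segments, so line structure is preserved.
        for (String line : html.split("\n", -1)) {
            Matcher m = ROW.matcher(line);
            if (m.matches()) {
                out.append("<span class=\"row\">");
                for (int i = 0; i < CLASSES.length; i++) {
                    out.append("<span class=\"").append(CLASSES[i]).append("\">")
                       .append(m.group(i + 1))
                       .append("</span>");
                }
                out.append("</span>");
            } else {
                out.append(line);
            }
            out.append('\n');
        }
        // Remove the newline added after the last segment (split always yields at least one).
        out.setLength(out.length() - 1);
        return out.toString();
    }

    public static void main(String[] args) {
        String sample = "<B>No match to:</B> 1021.10, 3511.70\n"
            + "   564.70     564.70     565.25     -0.55     168 -   171    0   FEDR";
        System.out.println(tagPeptideRows(sample));
    }
}
```

The tagged string would then be handed to JTidy in place of the original page. A line-oriented regex is used here only because the sample rows are column-aligned plain text; rows with a different column layout would simply pass through untagged.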

I'm setting the following params on JTidy:

tidy.setMakeClean(true);
tidy.setBreakBeforeBR(true);
tidy.setShowWarnings(false);
tidy.setOnlyErrors(true);

Thanks.

Received on Monday, 5 August 2002 23:33:57 UTC