- From: Kim <kime@atlantic.net>
- Date: Tue, 6 Aug 2002 05:38:34 -0400
- To: html-tidy@w3.org
- Message-ID: <1028626714.3d4f991a83f40@webmail.atlantic.net>
Somehow this ended up in my mailbox? It is from jamesgc21@attbi.com
----- Forwarded message from Greg James <jamesgc21@attbi.com> -----
Date: Mon, 5 Aug 2002 21:32:04 -0600
From: Greg James <jamesgc21@attbi.com>
Reply-To: Greg James <jamesgc21@attbi.com>
Subject: JTidy & Un-Tagged Text in HTML Doc
To: html-tidy@w3.org
I'm trying to use JTidy to convert HTML pages to XML. The HTML has several
'un-tagged' entries. For example:
<P><A name=Hit3><B>3.</B></A> <A
href="http://www.matrixscience.com/cgi/protein_view.pl?
file=../data/20020130/FaioSfs.dat&hit=4">gi|11528046</A> <B>Mass:</B> 74711
<B>Score:</B> 43
(AF197556) coat protein [Beet necrotic yellow vein virus]
<B> Observed Mr(expt) Mr(calc) Delta Start End Miss Peptide</B>
564.70 564.70 565.25 -0.55 168 - 171 0 FEDR
828.00 828.00 828.51 -0.51 44 - 51 0 AANLSIIK
1032.30 1032.30 1032.56 -0.26 509 - 519 0 AAVAMTALASK
2271.60 2271.60 2271.16 0.44 556 - 578 0
YVHTGIQGGAQLAGAMAVGAMLR
<B>No match to:</B> 1021.10, 3511.70
Is there an easy way to get JTidy to 'tag' the un-tagged text? For example,
the text between the <B>'s? I'd rather not write a Java program to tag these
lines prior to sending it to JTidy.
I'm setting the following params on JTidy:
tidy.setMakeClean(true);
tidy.setBreakBeforeBR(true);
tidy.setShowWarnings(false);
tidy.setOnlyErrors(true);
Thanks.
----- End forwarded message -----
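For what it's worth, here is a rough sketch of one way the "tagging" could be done after JTidy has run rather than before. It assumes JTidy has already been asked for XML output (for example via tidy.setXmlOut(true)) and that the result is re-parsed with the standard JDK XML parser. The class name TextNodeTagger, the helper names, and the wrapper element name "text" are made up for illustration and are not part of JTidy. The idea is simply to wrap every non-blank text node that sits directly under a given parent element in an element of its own.

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JTidy: wraps bare text in elements.
class TextNodeTagger {

    /**
     * Wraps each non-blank text node that is a direct child of a
     * parentName element in a new wrapperName element.
     * Surrounding whitespace of the wrapped text is trimmed.
     */
    static void wrapBareText(Document doc, String parentName, String wrapperName) {
        NodeList parents = doc.getElementsByTagName(parentName);
        for (int i = 0; i < parents.getLength(); i++) {
            Node parent = parents.item(i);
            // Snapshot the children first, because we edit the tree below.
            List<Node> children = new ArrayList<>();
            for (Node c = parent.getFirstChild(); c != null; c = c.getNextSibling()) {
                children.add(c);
            }
            for (Node child : children) {
                if (child.getNodeType() != Node.TEXT_NODE) {
                    continue; // elements such as <b> are left alone
                }
                String text = child.getNodeValue().trim();
                if (text.isEmpty()) {
                    continue; // skip whitespace-only text
                }
                Element wrapper = doc.createElement(wrapperName);
                wrapper.appendChild(doc.createTextNode(text));
                parent.replaceChild(wrapper, child);
            }
        }
    }

    /** Convenience wrapper: parse an XML string, tag bare text, serialize it back. */
    static String tagXml(String xml, String parentName, String wrapperName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            wrapBareText(doc, parentName, wrapperName);
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new DOMSource(doc), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException("could not tag XML", e);
        }
    }
}
```

With this sketch, TextNodeTagger.tagXml("<p><b>Mass:</b> 74711 <b>Score:</b> 43</p>", "p", "text") should give <p><b>Mass:</b><text>74711</text><b>Score:</b><text>43</text></p>. The text inside <b> is left alone because it is not a direct child of <p>. Whether the multi-line peptide table should become one element or one element per row is a separate decision that this sketch does not make.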
Attachments
- text/html attachment: unnamed
Received on Tuesday, 6 August 2002 05:38:36 UTC