RE: JTidy and java.lang.NullPointerException

I used JTidy on NT4.0 with the following config settings:

EncloseText = false
FixBackslash = true
FixComments = true
HideEndTags = false
IndentAttributes = false
IndentContent = false
KeepFileTimes = true
LogicalEmphasis = false
MakeClean = false
NumEntities = false
OnlyErrors = false
Quiet = false
QuoteAmpersand = true
QuoteMarks = false
QuoteNbsp = flase
RawOut = false
ShowWarnings = true
Slidestyle = null
SmartIndent = false
Spaces = 2
Tabsize = 4
TidyMark = true
UpperCaseAttrs = false
UpperCaseTags = false
Word2000 = false
WrapAsp = true
WrapAttVals = false
WrapJste = true
Wraplen = 68
WrapPhp = true
WrapScriptlets = false
WrapSection = true
Writeback = false
XHTML = true
XmlOut = false
XmlPi = false
XmlPIs = false
XmlTags = false

Most of these are standard. The output worked (attached) but because of the
char set, JTidy really messed up. The formatting is hairy and the chars
displayed are intelligible. I have come across this problem with all char
sets except the 4 supported by JTidy. This is the main problem facing my use
of JTidy to pull apart web pages for translation and summarization. To be
honest it's absolutely key to the project I'm working on. Any suggestions by
yourself or anybody on the list will be much appreciated.

p.s. to get file output from JTidy:
				File f_html = new File("anyfilename" + ".html");   //creats file to
output to
				FileOutputStream FOS_html = new FileOutputStream(f_html);   //creats
output stream associated with file
				InputStream input_html = new URL("http://" + URLtoParse).openStream();
//grabs web page and pipes through an inputstream
				Tidy t = new Tidy();   //instantiates jtidy
				t.parse(input_html, FOS_html); //calls jtidy
Using java.io and java.net

Again any help with the char set problem would defiantly help.

David Hinshelwood
CRL NMSU
Tel: (505) 646 3342 (office)
       (505) 645 5537 (home)

-----Original Message-----
From: html-tidy-request@w3.org [mailto:html-tidy-request@w3.org]On Behalf Of
Georgopoulos N
Sent: Wednesday, July 19, 2000 12:55 PM
To: html-tidy@w3.org
Subject: JTidy and java.lang.NullPointerException


Hi,

I have tried to parse the following site and create a DOM tree:

http://www.in.gr/

with just the following settings:
tidy.setMakeClean(true);
tidy.setXmlTags(false);
tidy.setBreakBeforeBR(true);

and I get the following message:

Tidy (vers 30th April 2000) Parsing "InputStream"
line 11 column 1 - Warning: <link> lacks "type" attribute
line 13 column 1 - Warning: <script> lacks "type" attribute
line 205 column 140 - Warning: <img> lacks "alt" attribute
line 209 column 2 - Warning: <table> lacks "summary" attribute
line 215 column 9 - Warning: <table> lacks "summary" attribute
line 221 column 64 - Warning: <img> lacks "alt" attribute
line 231 column 45 - Warning: <img> lacks "alt" attribute
line 235 column 94 - Warning: <img> lacks "alt" attribute
line 251 column 7 - Warning: <table> lacks "summary" attribute
line 255 column 38 - Warning: <img> lacks "alt" attribute
line 257 column 93 - Warning: <img> lacks "alt" attribute
line 267 column 13 - Warning: <table> lacks "summary" attribute
line 289 column 29 - Warning: <img> lacks "alt" attribute
line 299 column 20 - Warning: missing <td>
line 317 column 7 - Warning: <table> lacks "summary" attribute
line 323 column 13 - Warning: <table> lacks "summary" attribute
line 327 column 56 - Warning: <img> lacks "alt" attribute
line 331 column 15 - Warning: missing <tr>
line 331 column 58 - Warning: <img> lacks "alt" attribute
line 333 column 60 - Warning: <img> lacks "alt" attribute
line 355 column 19 - Warning: <img> lacks "alt" attribute
java.lang.NullPointerException
        at java.text.MessageFormat.format(Compiled Code)
        at java.text.MessageFormat.format(Compiled Code)
        at java.text.Format.format(Compiled Code)
        at java.text.MessageFormat.format(Compiled Code)
        at org.w3c.tidy.Report.attrError(Compiled Code)
        at org.w3c.tidy.Lexer.parseAttrs(Compiled Code)
        at org.w3c.tidy.Lexer.getToken(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseInline.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseBlock.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseRow.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseTableTag.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseBlock.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseRow.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseTableTag.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseBlock.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseRow.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseTableTag.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseBody.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseTag(Compiled Code)
        at org.w3c.tidy.ParserImpl.access$071(Compiled Code)
        at org.w3c.tidy.ParserImpl$ParseHTML.parse(Compiled Code)
        at org.w3c.tidy.ParserImpl.parseDocument(Compiled Code)
        at org.w3c.tidy.Tidy.parse(Compiled Code)
        at org.w3c.tidy.Tidy.parseDOM(Tidy.java:1152)

The same message java.lang.NullPointerException appeared after parsing
other sites too.
Is there anyway to avoid it in general (and not only to catch it)?
Should I change any setting?

And a second question that I would be grateful if someone could help:
How can I redirect the output of JTidy to save it for example in a
StringBuffer?
Is there any particular I/O stream that I can use?

Thank you very much
Nikos

Received on Thursday, 20 July 2000 01:15:16 UTC