- From: Charles Reitzel <creitzel@rcn.com>
- Date: Wed, 22 Jan 2003 15:20:21 -0500
- To: Michael Goldberg <MGoldberg@yet2.com>
- Cc: "'html-tidy@w3.org'" <html-tidy@w3.org>
This is an OK place for JTidy questions.

One thing to keep in mind, even when parsing generic XML, Tidy (including, AFAIK, JTidy) will still recognize HTML elements and treat them accordingly. There have been some improvements in the whitespace handling for inline elements in C Tidy.

JTidy development has been inactive for some time. I believe they are waiting for an "official" release from us (C Tidy). But we don't do those. Besides internals of parsing, the biggest change/difference is in option handling. JTidy is still more advanced in terms of I/O abstraction.

take it easy,
Charlie

At 08:31 AM 1/22/2003 -0800, Michael Goldberg wrote:
>Reposting, as I still have this issue, and would love to see a response. Is
>there a separate forum to discuss JTidy? I apologize in advance if this is
>not the correct place to discuss JTidy matters.
>
>Sincerely,
>Michael S. Goldberg
>
>
>All,
>
>I am using JTidy to parse an input string into an Element. Later, when I
>serialize the Element back into a String, I lose a space. Is there an
>option I can specify so as not to lose the space?
>
>In the sample code provided below, I lose the space just prior to the letter
>"C". I'm pretty sure the problem is introduced in the first part (the
>parsing) rather than the latter part (the serializing).
>
>Interestingly, if I change my input to
>"<html><title/><body>A <b>B</b> C</body></html>" and use
>setXmlTags( false ), the space is not lost. Why?
>
>Sincerely,
>Michael S. Goldberg
>
>Sample Code:
>
>Tidy tidyInstance = new Tidy();
>tidyInstance.setXmlTags( true );
>
>String inString = "<temp>A <b>B</b> C</temp>";
>ByteArrayInputStream in = new ByteArrayInputStream( inString.getBytes() );
>
>// Use JTidy to parse the input into a Document
>Document document = tidyInstance.parseDOM( in, null );
>Element documentElement = document.getDocumentElement();
>
>// Setup an output format for the makeSerializer() call below
>OutputFormat outputFormat = new OutputFormat();
>outputFormat.setOmitXMLDeclaration( true );
>
>// Set up a byteArrayOutputStream, the output of the makeSerializer() call below
>ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();
>
>// Get a SerializerFactory for the makeSerializer() call below
>SerializerFactory xmlSerializerFactory =
>    SerializerFactory.getSerializerFactory( Method.XML );
>
>// Use serializer to convert the element into a String
>XMLSerializer xmlSerializer = (XMLSerializer)
>    xmlSerializerFactory.makeSerializer(
>        new OutputStreamWriter( byteArrayOutputStream ), outputFormat );
>xmlSerializer.serialize( documentElement );
>
>System.out.println( byteArrayOutputStream );
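
One way to check whether the space is already gone after parsing, before any serializer is involved, is to list the text nodes of the parsed tree. The following is only a minimal sketch: it uses the JDK's built-in XML parser as a stand-in (JTidy itself is not called here), and the class and method names (TextNodeDump, parseXml, collectText) are hypothetical. Since the walk relies only on org.w3c.dom, the same collectText call could presumably be pointed at the Document returned by parseDOM in the sample code to compare the two results.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

// Hypothetical helper, not part of JTidy: lists the text nodes of a DOM tree
// so that leading/trailing spaces can be inspected directly.
public class TextNodeDump {

    // Parse a string with the JDK's default XML parser (a stand-in for JTidy).
    static Document parseXml(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
        } catch (Exception e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    // Collect the value of every text node under 'node', in document order.
    static List<String> collectText(Node node) {
        List<String> out = new ArrayList<>();
        collect(node, out);
        return out;
    }

    private static void collect(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add(node.getNodeValue());
        }
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            collect(c, out);
        }
    }

    public static void main(String[] args) {
        Document doc = parseXml("<temp>A <b>B</b> C</temp>");
        // Brackets make leading and trailing spaces visible.
        for (String text : collectText(doc.getDocumentElement())) {
            System.out.println("[" + text + "]");
        }
    }
}
```

If the text node following the b element shows up without its leading space when the tree comes from JTidy, the space was dropped during parsing and not by the serializer.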
Received on Wednesday, 22 January 2003 15:13:11 UTC