
Why is space removed?

From: Michael Goldberg <MGoldberg@yet2.com>
Date: Tue, 14 Jan 2003 10:03:50 -0800
Message-ID: <D090FE9586C9D4119E4E00A02493157AB5BD5E@ferris.tahoe.yet2.com>
To: "'html-tidy@w3.org'" <html-tidy@w3.org>

All,

I am using JTidy to parse an input string into an Element.  Later, when I
serialize the Element back into a String, I lose a space.  Is there an
option I can specify so as not to lose the space?

In the sample code provided below, I lose the space just prior to the letter
"C".  I'm pretty sure the problem is introduced in the first part (the
parsing) rather than the latter part (the serializing).
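
One way to confirm which half drops the space is to dump the text nodes of the parsed tree before any serializer touches it. The following is only a minimal sketch: it uses the JDK's built-in DocumentBuilder as a stand-in parser (not JTidy), and the names TextNodeDump, collectText and textNodes are hypothetical. The same collectText helper could be pointed at the Element returned by parseDOM() to see what JTidy actually produced.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.xml.sax.SAXException;

// Hypothetical helper class, not part of JTidy or Xerces.
class TextNodeDump {

    // Append the value of every text node under 'node', in document order.
    static void collectText(Node node, List<String> out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.add(node.getNodeValue());
        }
        for (Node child = node.getFirstChild(); child != null; child = child.getNextSibling()) {
            collectText(child, out);
        }
    }

    // Parse 'xml' with the JDK parser (an assumption: a stand-in for JTidy)
    // and return the text nodes found under the document element.
    static List<String> textNodes(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            List<String> out = new ArrayList<>();
            collectText(doc.getDocumentElement(), out);
            return out;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalStateException("could not parse input", e);
        }
    }

    public static void main(String[] args) {
        // Brackets make leading and trailing spaces visible.
        for (String text : textNodes("<temp>A <b>B</b> C</temp>")) {
            System.out.println("[" + text + "]");
        }
    }
}
```

With the JDK parser the three text nodes come back as "A ", "B" and " C", so the space before "C" survives parsing. If the same dump over the JTidy Element shows "C" without the leading space, the loss happens in the parse step rather than in the serializer.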

Interestingly, if I change my input to
"<html><title/><body>A <b>B</b> C</body></html>"
and use setXmlTags( false ), the space is not lost.  Why?

Sincerely,
Michael S. Goldberg

Sample Code:

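// Imports assumed from the class names used below: JTidy, the W3C DOM,
// and the Xerces serializer package.
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import org.apache.xml.serialize.Method;
import org.apache.xml.serialize.OutputFormat;
import org.apache.xml.serialize.SerializerFactory;
import org.apache.xml.serialize.XMLSerializer;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.tidy.Tidy;

// The class name is a placeholder so that the snippet compiles on its own
public class SpaceLossSample {
    public static void main( String[] args ) throws Exception {
        Tidy tidyInstance = new Tidy();
        tidyInstance.setXmlTags( true );

        String inString = "<temp>A <b>B</b> C</temp>";
        ByteArrayInputStream in = new ByteArrayInputStream( inString.getBytes() );

        // Use JTidy to parse the input into a Document
        Document document = tidyInstance.parseDOM( in, null );
        Element documentElement = document.getDocumentElement();

        // Set up an output format for the makeSerializer() call below
        OutputFormat outputFormat = new OutputFormat();
        outputFormat.setOmitXMLDeclaration( true );

        // Set up a byteArrayOutputStream, the output of the makeSerializer() call below
        ByteArrayOutputStream byteArrayOutputStream = new ByteArrayOutputStream();

        // Get a SerializerFactory for the makeSerializer() call below
        SerializerFactory xmlSerializerFactory =
            SerializerFactory.getSerializerFactory( Method.XML );

        // Use the serializer to convert the element into a String
        XMLSerializer xmlSerializer = (XMLSerializer)
            xmlSerializerFactory.makeSerializer(
                new OutputStreamWriter( byteArrayOutputStream ), outputFormat );
        xmlSerializer.serialize( documentElement );

        System.out.println( byteArrayOutputStream );
    }
}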
Received on Tuesday, 14 January 2003 13:10:41 GMT
