Trying to understand XML line-wrapping behavior

I'm using: HTML Tidy for Mac OS X released on 1st November 2003

I'm assessing the suitability of tidy for performing XML pretty
printing.  I'll illustrate with a short fragment of XML.

If my input document is:

<para>
This
is
a
short
paragraph.
</para>

Then I see this result from the following command:

% tidy -q -xml test.xml
<para>This is a short paragraph.</para>

Within the <para> element tidy has discarded the initial
and trailing whitespace, and joined each line with a space between.

If instead my input document contains inline markup (here
shown by two <emphasis> elements):

<para>
This
<emphasis>is</emphasis>
a
<emphasis>short</emphasis>
paragraph.
</para>

Then the result is somewhat different:

% tidy -q -xml test2.xml
<para>This
<emphasis>is</emphasis>a
<emphasis>short</emphasis>paragraph.</para>

Again tidy has discarded the initial and trailing whitespace.  But it
has also:
- Not converted the newline to space when it occurs before <emphasis>
(what tidy has actually done is add a space at the end of the preceding
line, and then left the newline in place)
- Failed to insert either whitespace *or* a newline after the </emphasis>
tag.

I don't care whether or not tidy tosses the leading/trailing whitespace.
But if it joins lines, I would like it to consistently join them with
a space -- not sometimes with a space and sometimes with no character at all.

Can tidy do this?


Any of the following would be acceptable for the second document:

<para>This
<emphasis>is</emphasis> a
<emphasis>short</emphasis> paragraph.</para>

<para>This <emphasis>is</emphasis> a
<emphasis>short</emphasis> paragraph.</para>

<para> This <emphasis>is</emphasis> a <emphasis>short</emphasis> 
paragraph. </para>

But I do wish to avoid having lines joined with no space between.

Received on Thursday, 20 November 2003 21:38:21 UTC