
Re: Feature or support request? prevent tidy from stripping the single whitespace that follows an endtag

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 20 May 2003 02:56:34 +0200
To: Ivo Pletikosic <ivo@benetech.org>
Cc: html-tidy@w3.org
Message-ID: <3ecb5992.4302176@smtp.bjoern.hoehrmann.de>

* Ivo Pletikosic wrote:
>is output as:
>--
><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
><dtbook3>
>  <book>
>    <bodymatter>hello, 
>    <em>how</em>are you 
>    <strong>doing</strong>today.</bodymatter>
>  </book>
></dtbook3>
>--

Tidy must not remove the whitespace between "how" and "are you" and
between "doing" and "today." TidyClassic got it right... In the unbroken
version (or Tidy 04 August 2000) you would get

  <dtbook>
    <book>
      <bodymatter>hello,
      <em>how</em>
  
      are you
      <strong>doing</strong>
  
      today.</bodymatter>
    </book>
  </dtbook>

Not nice, but it does no harm to most documents. Is that what you are
asking for? If not, what should the output look like? IMHO, the most
reasonable behaviour for Tidy would be to consider elements that have
text node siblings, or ancestors with text node siblings, to be inline
elements and all other elements block-level elements. The output would
then look similar to

  <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
  <dtbook>
    <book>
      <bodymatter>
        hello, <em>how</em> are you <strong>doing</strong> today.
      </bodymatter>
    </book>
  </dtbook>

or

  ...
      <bodymatter>hello, <em>how</em> are you <strong>doing</strong>
      today.</bodymatter>
  ...

depending on whether Tidy wraps after/before <bodymatter>. For more
sophisticated solutions Tidy would need to know what elements constitute
blocks and what elements are to be considered inline elements.
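
The sibling/ancestor heuristic described above can be sketched in a few
lines of Python. This is only an illustration of the idea, not Tidy code:
the function names are made up for this example, it uses the standard
xml.etree.ElementTree module, and it assumes that whitespace-only text
nodes do not count as text siblings (otherwise indentation in the input
would make every element inline).

```python
import xml.etree.ElementTree as ET


def has_text_children(parent):
    """True if parent directly contains a non-whitespace text node.

    Assumption: whitespace-only text is treated as indentation and ignored.
    """
    if parent.text and parent.text.strip():
        return True
    return any(child.tail and child.tail.strip() for child in parent)


def classify(root):
    """Map each element to 'inline' or 'block'.

    An element is inline if it has text node siblings, or if any of its
    ancestors does; every other element is block-level.
    """
    kinds = {}

    def walk(elem, parent_inline, siblings_have_text):
        inline = parent_inline or siblings_have_text
        kinds[elem] = "inline" if inline else "block"
        mixed = has_text_children(elem)
        for child in elem:
            walk(child, inline, mixed)

    # The root has no siblings and no ancestors, so it is always block-level.
    walk(root, False, False)
    return kinds


doc = ET.fromstring(
    "<dtbook><book><bodymatter>hello, <em>how</em> are you "
    "<strong>doing</strong> today.</bodymatter></book></dtbook>"
)
for elem, kind in classify(doc).items():
    print(elem.tag, kind)
```

For the sample document this marks <em> and <strong> as inline and
<dtbook>, <book> and <bodymatter> as block-level, so a pretty printer
built on it would indent around the latter and leave the whitespace next
to the former untouched.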

>Is there a way to configure tidy to treat the XML tags as if it were HTML?
>or is this spacing behavior limited to files identified and markedup as
>HTML?

Depends on what spacing behaviour you are asking for. It would also be
possible to introduce a config option that identifies certain elements as
those from the XHTML namespace. However, yes, the XML pretty printer is
pretty simplistic and, as noted above, obviously broken. It would be
possible to improve it, but I don't think I will do so. Patches welcome :-)

In general, if you already have well-formed XML, there are a number of
better tools for you out there.
Received on Monday, 19 May 2003 20:56:52 GMT
