W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2003

RE: Feature or support request? prevent tidy from stripping the s ingl e whitespace that follows an endtag

From: Ivo Pletikosic <ivo@benetech.org>
Date: Tue, 20 May 2003 10:47:29 -0700
Message-ID: <61F3DB5930AADF49A71965ECD6A62DD805198B@benexchng.benetech.org>
To: "'Bjoern Hoehrmann'" <derhoermi@gmx.net>
Cc: html-tidy@w3.org

Thanks for the clarification/suggestion.

Sounds like the current build of Tidy classic has enough for what I need.
It's a useful tool for plenty things that I will look into the source and
see how I can patch the behavior like you suggest as well.

Thank you and regards,

Ivo

-----Original Message-----
From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net]
Sent: Monday, May 19, 2003 5:57 PM
To: Ivo Pletikosic
Cc: html-tidy@w3.org
Subject: Re: Feature or support request? prevent tidy from stripping the
s ingl e whitespace that follows an endtag


* Ivo Pletikosic wrote:
>is output as:
>--
><?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
><dtbook3>
>  <book>
>    <bodymatter>hello, 
>    <em>how</em>are you 
>    <strong>doing</strong>today.</bodymatter>
>  </book>
></dtbook3>
>--

Tidy must not remove the whitespace between "how" and "are you" and
between "doing" and "today." TidyClassic got it right... In the unbroken
version (or Tidy 04 August 2000) you would get

  <dtbook>
    <book>
      <bodymatter>hello,
      <em>how</em>
  
      are you
      <strong>doing</strong>
  
      today.</bodymatter>
    </book>
  </dtbook>

Not nice but doesn't do harm to most documents. Is that what you are
asking for? If not, how should the output look like? IMHO, the most
reasonable behaivour for Tidy would be to consider elements having text
node siblings or ancestors with text nodes siblings to be inline
elements and all other elements blocklevel elements. The output would
then look similar to

  <?xml version="1.0" encoding="iso-8859-1" standalone="yes"?>
  <dtbook>
    <book>
      <bodymatter>
        hello, <em>how</em> are you <strong>doing</strong> today.
      </bodymatter>
    </book>
  </dtbook>

or

  ...
      <bodymatter>hello, <em>how</em> are you <strong>doing</strong>
      today.</bodymatter>
  ...

depending on whether Tidy wraps after/before <bodymatter>. For more
sophisticated solutions Tidy would need to know what elements constitute
blocks and what elements are to be considered inline elements.

>Is there a way to configure tidy to treat the XML tags as if it were HTML?
>or is this spacing behavior limited to files identified and markedup as
>HTML?

Depends on what spacing behaivour you ask for. It would also be possible
to introduce a config option that identifies certain elements as those
from the XHTML namespace. However, yes, the XML pretty printer is pretty
simplistic and as noted above obviously broken. It would be possible to
improve it, but I don't think I will do so. Patches welcome :-)

In general, if you already have well-formed XML there is a number of
better tools for you out there.
Received on Tuesday, 20 May 2003 13:47:32 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:54 GMT