Re: XML pretty-printing and entities from Bjoern Hoehrmann on 2004-01-19 (html-tidy@w3.org from January to March 2004)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Mon, 19 Jan 2004 02:20:47 +0100
To: Paul DuBois <paul@kitebird.com>
Cc: html-tidy@w3.org
Message-ID: <402a2dae.177413897@smtp.bjoern.hoehrmann.de>

* Paul DuBois wrote:
>I also wanted an XML reformatter that wouldn't attempt to resolve entities.
>What with that and other requirements, I ended up writing one myself.
>If you want to try it, it's available here:
>
>http://www.kitebird.com/software/xmlformat/
>
>The parsing code is based on Robert Cameron's REX, which performs a purely
>lexical parse with no entity resolution involved.

What I would love, possibly integrated into HTML Tidy, would be a
more sophisticated configuration file for both tree modification and
pretty printing. I thought about using an XPath- and CSS-like syntax, such as

  @namespace url(http://www.w3.org/1999/xhtml);

  pre                      { white-space: pre }
  pre                      { break: break-before break-after }
  a[@name]                 { trim: no }
  p                        { trim: if-empty }
  p[preceding-sibling::h1] { indent: 2 }
  @*                       { trim: if-proprietary }
  ...

I think you get the idea. With a good set of properties, it should be
quite simple to get exactly your favourite coding style; it would really
be a general-purpose, highly configurable XML tool. I also thought about
integrating libxml with HTML Tidy, to use its DOM instead of Tidy's. This
would ease writing APIs, and we would get an XPath engine, DTD validation,
at some point XML Schema validation, ... lots of cool stuff. I don't
have the time, unfortunately. It should be quite simple, though.
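
To make the rule idea a bit more concrete, here is a rough sketch in
Python. It is not Tidy, libxml, or xmlformat code. It uses the standard
xml.etree.ElementTree module, whose limited XPath subset stands in for a
real XPath engine. The rule table, the function names (collect_properties,
trim_empty), and the reading of "trim: if-empty" as "remove the element if
it has no children and only whitespace text" are all assumptions made for
illustration.

```python
import xml.etree.ElementTree as ET

XHTML = "http://www.w3.org/1999/xhtml"
NS = {"h": XHTML}  # plays the role of the @namespace line

# Hypothetical rule table: (ElementTree path selector, property, value).
RULES = [
    (".//h:pre", "white-space", "pre"),
    (".//h:p", "trim", "if-empty"),
    (".//h:a[@name]", "trim", "no"),
]


def collect_properties(root, rules=RULES, namespaces=NS):
    """Map each matched element to its properties; later rules win."""
    props = {}
    for selector, name, value in rules:
        for element in root.findall(selector, namespaces):
            props.setdefault(element, {})[name] = value
    return props


def trim_empty(root, props):
    """Remove elements marked 'trim: if-empty' that are empty.

    Assumption: 'empty' means no child elements and only whitespace text.
    Returns the number of elements removed.
    """
    removed = 0
    for parent in list(root.iter()):
        for child in list(parent):
            wants_trim = props.get(child, {}).get("trim") == "if-empty"
            is_empty = len(child) == 0 and not (child.text or "").strip()
            if not (wants_trim and is_empty):
                continue
            # Keep any text that followed the removed element.
            if child.tail:
                index = list(parent).index(child)
                if index > 0:
                    previous = parent[index - 1]
                    previous.tail = (previous.tail or "") + child.tail
                else:
                    parent.text = (parent.text or "") + child.tail
            parent.remove(child)
            removed += 1
    return removed


if __name__ == "__main__":
    doc = ET.fromstring(
        '<html xmlns="%s"><body><p> </p><p>text</p>'
        '<a name="x"></a></body></html>' % XHTML
    )
    trim_empty(doc, collect_properties(doc))
    print(ET.tostring(doc, encoding="unicode"))
```

Selectors such as p[preceding-sibling::h1] are outside what ElementTree
supports, which is one reason a full XPath engine would be needed for the
syntax sketched above.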

Received on Sunday, 18 January 2004 20:20:55 UTC