Making HTML Tidy a supported library from Bjoern Hoehrmann on 2001-04-13 (html-tidy@w3.org from April to June 2001)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Fri, 13 Apr 2001 19:30:54 +0200
To: html-tidy@w3.org
Cc: dsr@w3.org
Message-ID: <8d9edtcnm7rniv6aiu42dihrbbsjb77bvn@4ax.com>

Hi Dave,
Hi List,

   As [1] states: "You can also incorporate Tidy as part of a larger
program, for instance in HTML editors or HTML transformation tools used
for import filters, or for when you want to customize Web content to get
the best out of different kinds of browsers. [...]", Software Developers
are free or even encouraged to use HTML Tidy in their applications. This
is a great thing in general, but the current codebase makes it hard to
do so, incorporating Tidy into other applications requires hacking the
provided sources. I'm at most a rookie C programmer, but I think it
might be a good idea to split Tidy into three modules:

  * parser and lexer
      parsing and cleaning the source, building in-memory DOM tree

  * pretty-printer
      takes the DOM tree and puts it into a "prettier" form

  * command line utility
      parse configuration files, parse input from the command
      line, printing the prettier version, etc.

I think several people, including myself, use HTML Tidy just to get a
clean parse tree, e.g. for using XML tools with HTML documents, they
only need the first module. HTML Editors who incorporate HTML Tidy often
provide the option to beautify the written and/or produced code, they
additionally use the second module. The third module is of no interest
for other people, so the library should consist only of the first two
modules. Just like it is done with Expat (James Clarks XML parser),
users and developers can just get the current library and link it into
their programs, instead of hacking the source again and again and
redistributing it.

Along with this m12n several other changes could and possibly should be
made, like

  * event-handlers for warnings and errors
  * killing global configuration variables, introduce a parser and a
    pretty-printer object that store relevant informations or store that
    information in the lexer object

I like to hear comments by other list members regarding these changes.

I don't know what the current state of HTML Tidy is, the current
codebase is however nearly a year old, the pending page is very longish,
there are several additional bugs reported to this list and some
additional feature requests have been made. So I like to ask, whether
there are volunteers as requested in [2] and how you think development
and maintainship should look like in the future. Why isn't HTML Tidy in
the W3C CVS repository?

[1] http://www.w3.org/People/Raggett/tidy/
[2] http://www.w3.org/People/Raggett/tidy/pending.html

best regards,
-- 
Björn Höhrmann > mailto:bjoern@hoehrmann.de > http://www.bjoernsworld.de
am Badedeich 7 # Telefon: +49(0)4667/981028 # http://bjoern.hoehrmann.de
25899 Dagebüll < PGP Pub. KeyID: 0xA4357E78 < http://www.learn.to/quote/

Received on Friday, 13 April 2001 13:29:25 UTC