W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2001

FW: Making HTML Tidy a supported library

From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
Date: Mon, 7 May 2001 13:54:24 -0400
Message-ID: <B5C79DDBC655D311B6BD0008C7E64D76013C14E0@exchange.arrakisplanet.com>
To: "'dsr@w3.org'" <dsr@w3.org>
Cc: "'Bjoern Hoehrmann'" <derhoermi@gmx.net>, "'Andy Quick'" <ac.quick@sympatico.ca>, "'Sebastian Lange'" <info@sl-chat.de>, 'André Blavier' <ablavier@wanadoo.fr>, "'html-tidy@w3.org'" <html-tidy@w3.org>
Hi Dave,

Any plans for these types of changes or even a maintenance release for all
the patches since last summer?  I didn't see a response to Bjoern's message
on the list.

Tidy has been a tremendous help, and I'm looking to do more work with it.
Expat might be a good organizational model to follow.  I'd also like to
contribute more updates to the Word 2000 conversion (I mailed in one bug fix
patch to the mailing list).  I'd also like to get all the other patches that
people have mailed in.  A Source Forge project would make it much easier to
collect bug reports and patches and apply them.

How do other developers of Tidy add-ons feel about this issue?

Thanks for any info,
Charlie

-----Original Message-----
From: Bjoern Hoehrmann [mailto:derhoermi@gmx.net]
Sent: Friday, April 13, 2001 1:31 PM
To: html-tidy@w3.org
Cc: dsr@w3.org
Subject: Making HTML Tidy a supported library


Hi Dave,
Hi List,

   As [1] states: "You can also incorporate Tidy as part of a larger
program, for instance in HTML editors or HTML transformation tools used for
import filters, or for when you want to customize Web content to get the
best out of different kinds of browsers. [...]", Software Developers are
free or even encouraged to use HTML Tidy in their applications. This is a
great thing in general, but the current codebase makes it hard to do so,
incorporating Tidy into other applications requires hacking the provided
sources. I'm at most a rookie C programmer, but I think it might be a good
idea to split Tidy into three modules:

  * parser and lexer
      parsing and cleaning the source, building in-memory DOM tree

  * pretty-printer
      takes the DOM tree and puts it into a "prettier" form

  * command line utility
      parse configuration files, parse input from the command
      line, printing the prettier version, etc.

I think several people, including myself, use HTML Tidy just to get a clean
parse tree, e.g. for using XML tools with HTML documents, they only need the
first module. HTML Editors who incorporate HTML Tidy often provide the
option to beautify the written and/or produced code, they additionally use
the second module. The third module is of no interest for other people, so
the library should consist only of the first two modules. Just like it is
done with Expat (James Clarks XML parser), users and developers can just get
the current library and link it into their programs, instead of hacking the
source again and again and redistributing it.

Along with this m12n several other changes could and possibly should be
made, like

  * event-handlers for warnings and errors
  * killing global configuration variables, introduce a parser and a
    pretty-printer object that store relevant informations or store that
    information in the lexer object

I like to hear comments by other list members regarding these changes.

I don't know what the current state of HTML Tidy is, the current
codebase is however nearly a year old, the pending page is very longish,
there are several additional bugs reported to this list and some additional
feature requests have been made. So I like to ask, whether there are
volunteers as requested in [2] and how you think development and
maintainship should look like in the future. Why isn't HTML Tidy in the W3C
CVS repository?

[1] http://www.w3.org/People/Raggett/tidy/
[2] http://www.w3.org/People/Raggett/tidy/pending.html

best regards,
-- 
Björn Höhrmann > mailto:bjoern@hoehrmann.de > http://www.bjoernsworld.de
am Badedeich 7 # Telefon: +49(0)4667/981028 # http://bjoern.hoehrmann.de
25899 Dagebüll < PGP Pub. KeyID: 0xA4357E78 < http://www.learn.to/quote/
Received on Monday, 7 May 2001 17:08:53 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:45 GMT