W3C home > Mailing lists > Public > html-tidy@w3.org > April to June 2001

RE: FW: Making HTML Tidy a supported library

From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
Date: Mon, 14 May 2001 15:44:10 -0400
Message-ID: <B5C79DDBC655D311B6BD0008C7E64D76013C14FA@exchange.arrakisplanet.com>
To: "'Terry Teague'" <teague@mailandnews.com>
Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, ac.quick@sympatico.ca, info@sl-chat.de, ablavier@wanadoo.fr, html-tidy@w3.org, dforcier@macromedia.com
Hi Terry,

Thanks for jumping in.   It looks like you have a (yet another) textbook
requirement for Tidy in library form.  You also are throwing a MacOS port
into the mix.  I am agnostic about the design approach to the library,
except that I don't want to wait a long time for it.  

My reading of the Tidy code is that much of the cleanup occurs during parse.
Which makes sense as the data has to be clean enough to build the internal
tree.  We might be able to get some mileage from a pluggable reporting
mechanism.  I.e. re-implement ReportWarning() to call a user supplied
callback/handler.

To generalize, I think we need to get creative in how we update the code to
the get flexibility we need while perturbing the sources as little as
possible.  Also, as much as I would like to make it look pretty in C++, I
think we can forestall a great many integration and porting issues by
keeping the core functionality in pure ANSI C.

[Side issue: Tidy is DOM-like in the sense that it builds a tree
representation of the entire document.  The similarity may end there,
however.  Has anyone done a compare and contrast between a Tidy tree and a
DOM tree?  My experience w/ DOM and HTML is limited to a bit of Javascript
hacking.  Because of it's role in cleaning up mal-formed HTML, however, I
cannot see how Tidy would be useful layered _over_ a DOM.  I can see how it
could be used to _produce_ a clean DOM.  But this is only a first-glance
impression.]

It looks like you want forge ahead with the job at hand.  I guess we can
only wait so long to get a reply from Dave.  Bummer about your hard-drive.
I think your bug list will save us a tremendous amount of work, so here's
luck to your data recovery!

Charlie



-----Original Message-----
From: Terry Teague [mailto:teague@mailandnews.com]
Sent: Tuesday, May 08, 2001 3:01 AM
To: Bjoern Hoehrmann; Reitzel, Charlie
Cc: ac.quick@sympatico.ca; info@sl-chat.de; ablavier@wanadoo.fr;
html-tidy@w3.org; dforcier@macromedia.com
Subject: Re: FW: Making HTML Tidy a supported library


At 12:42 AM +0200 5/8/01, Bjoern Hoehrmann wrote:
>* Reitzel, Charlie wrote:
>>Any plans for these types of changes or even a maintenance release for all
>>the patches since last summer?  I didn't see a response to Bjoern's
message
>>on the list.
>
>There were yours... :-)

I didn't respond to the earlier messages as I wanted to see what other
people's responses were first.

>>Tidy has been a tremendous help, and I'm looking to do more work with it.
>>Expat might be a good organizational model to follow.
>
>Yes, definitly (with the exception, that Expat is an event-based XML
>parser, HTML Tidy could be something alike, but that wouldn't be usable,
>since HTML Tidys power comes from it's ability to clean up a given tree.
>I've written, merly for an educational purpose (since I'm just a C
>rookie and an XS novice :-), an unreleased Perl module
>HTML::Parser::Tidy that uses Tidy to build a tree and then fires SAX
>events to a Perl object to rebuild the tree for XML::DOM or XML::XPath.
>While writing this module (and doing more and more ugly hacks in the
>Tidy sources), I came to the conclusion, that one could get far more
>features out of Tidy, so I've written the referenced posting to ease the
>incorporation of Tidy in other envoirments...)

Tidy in its present form causes me extra work (which I don't feel has to be
necessary - although I would say the majority of the code is very portable
to different platforms) to port to Mac OS. I was waiting for a future DOM
based Tidy to make major architectural changes to my code. Specifically I
am looking to have a single Tidy engine that I could use with multiple Mac
OS user interfaces, communicating via Apple Events (which would allow
external scripting of the engine by 3rd parties). Apple's recommended
scripting implementation is known as the Object Model - while this is not
strictly object orientated, having the guts of Tidy being separate from the
UI (which it is not now) with a defined API would help. So perhaps an
event-driven model like expat would work for me (although I haven't looked
at the expat code).

>>I'd also like to
>>contribute more updates to the Word 2000 conversion (I mailed in one bug
fix
>>patch to the mailing list).  I'd also like to get all the other patches
that
>>people have mailed in.
>
>Same for all of us.
>
>>A Source Forge project would make it much easier to
>>collect bug reports and patches and apply them.

Yes, I would agree. And a proper bug database would make the following
unnecessary.

>Terry Teague already collected a lot of reported bugs, see
>
> * http://www.geocities.com/SiliconValley/1057/tidybugs.html and
> * http://www.geocities.com/SiliconValley/1057/tidybugs2.html
>
>There are actually more, since Terry stopped to collect them. Anybody
>else who kept track of bug reports and/or feature requests?

I stopped after I had released my updated versions based on the 04 Aug 2000
source, and was waiting for the mythical Mar 2001 update before updating
the bug report collection again (i.e. the pages were getting long, and I
was hoping that I could retire most of the bug reports once they had been
fixed in a maintenance release).

However... I lost my hard drive to a crash last week, including all my
html-w3 EMails. I hope to get my data recovered by a drive recovery
company. Yes, I know I can go through the online archives, but I would
rather spend my time on other things right at the moment. So the bug report
update will have to wait.

>>How do other developers of Tidy add-ons feel about this issue?
>
>The HTML Tidy licence allows us to modify and redistribute the given
>code, so it would be possible to start a Sourceforge project with the
>latest codebase and fix the given bugs. This would be what James Clark
>did with Expat. Dave seems to be too busy to work on Tidy, if we want to
>continue HTML Tidys development, that'd be the best thing to do, I'd
>like to hear what Dave think about that, but since he didn't reply the
>last months to my mails and mails on this list, I have small hope to
>hear from him, so I'd suggest to start the project anyways.

I agree. I would like to get involved (to help make the port more portable
and/or add Mac OS support into the base source code), but I know I am busy
too. I am just coming up to speed with cvs, but I have a lot of experience
with multi-person development teams/project on various platforms (including
open source).


Anyway, I am listening - keep the discussion going.

Regards, Terry
Received on Monday, 14 May 2001 15:43:59 GMT

This archive was generated by hypermail 2.2.0+W3C-0.50 : Tuesday, 3 April 2012 06:13:45 GMT