- From: Reitzel, Charlie <CReitzel@arrakisplanet.com>
- Date: Mon, 14 May 2001 15:44:10 -0400
- To: "'Terry Teague'" <teague@mailandnews.com>
- Cc: Bjoern Hoehrmann <derhoermi@gmx.net>, ac.quick@sympatico.ca, info@sl-chat.de, ablavier@wanadoo.fr, html-tidy@w3.org, dforcier@macromedia.com
Hi Terry, Thanks for jumping in. It looks like you have a (yet another) textbook requirement for Tidy in library form. You also are throwing a MacOS port into the mix. I am agnostic about the design approach to the library, except that I don't want to wait a long time for it. My reading of the Tidy code is that much of the cleanup occurs during parse. Which makes sense as the data has to be clean enough to build the internal tree. We might be able to get some mileage from a pluggable reporting mechanism. I.e. re-implement ReportWarning() to call a user supplied callback/handler. To generalize, I think we need to get creative in how we update the code to the get flexibility we need while perturbing the sources as little as possible. Also, as much as I would like to make it look pretty in C++, I think we can forestall a great many integration and porting issues by keeping the core functionality in pure ANSI C. [Side issue: Tidy is DOM-like in the sense that it builds a tree representation of the entire document. The similarity may end there, however. Has anyone done a compare and contrast between a Tidy tree and a DOM tree? My experience w/ DOM and HTML is limited to a bit of Javascript hacking. Because of it's role in cleaning up mal-formed HTML, however, I cannot see how Tidy would be useful layered _over_ a DOM. I can see how it could be used to _produce_ a clean DOM. But this is only a first-glance impression.] It looks like you want forge ahead with the job at hand. I guess we can only wait so long to get a reply from Dave. Bummer about your hard-drive. I think your bug list will save us a tremendous amount of work, so here's luck to your data recovery! Charlie -----Original Message----- From: Terry Teague [mailto:teague@mailandnews.com] Sent: Tuesday, May 08, 2001 3:01 AM To: Bjoern Hoehrmann; Reitzel, Charlie Cc: ac.quick@sympatico.ca; info@sl-chat.de; ablavier@wanadoo.fr; html-tidy@w3.org; dforcier@macromedia.com Subject: Re: FW: Making HTML Tidy a supported library At 12:42 AM +0200 5/8/01, Bjoern Hoehrmann wrote: >* Reitzel, Charlie wrote: >>Any plans for these types of changes or even a maintenance release for all >>the patches since last summer? I didn't see a response to Bjoern's message >>on the list. > >There were yours... :-) I didn't respond to the earlier messages as I wanted to see what other people's responses were first. >>Tidy has been a tremendous help, and I'm looking to do more work with it. >>Expat might be a good organizational model to follow. > >Yes, definitly (with the exception, that Expat is an event-based XML >parser, HTML Tidy could be something alike, but that wouldn't be usable, >since HTML Tidys power comes from it's ability to clean up a given tree. >I've written, merly for an educational purpose (since I'm just a C >rookie and an XS novice :-), an unreleased Perl module >HTML::Parser::Tidy that uses Tidy to build a tree and then fires SAX >events to a Perl object to rebuild the tree for XML::DOM or XML::XPath. >While writing this module (and doing more and more ugly hacks in the >Tidy sources), I came to the conclusion, that one could get far more >features out of Tidy, so I've written the referenced posting to ease the >incorporation of Tidy in other envoirments...) Tidy in its present form causes me extra work (which I don't feel has to be necessary - although I would say the majority of the code is very portable to different platforms) to port to Mac OS. I was waiting for a future DOM based Tidy to make major architectural changes to my code. Specifically I am looking to have a single Tidy engine that I could use with multiple Mac OS user interfaces, communicating via Apple Events (which would allow external scripting of the engine by 3rd parties). Apple's recommended scripting implementation is known as the Object Model - while this is not strictly object orientated, having the guts of Tidy being separate from the UI (which it is not now) with a defined API would help. So perhaps an event-driven model like expat would work for me (although I haven't looked at the expat code). >>I'd also like to >>contribute more updates to the Word 2000 conversion (I mailed in one bug fix >>patch to the mailing list). I'd also like to get all the other patches that >>people have mailed in. > >Same for all of us. > >>A Source Forge project would make it much easier to >>collect bug reports and patches and apply them. Yes, I would agree. And a proper bug database would make the following unnecessary. >Terry Teague already collected a lot of reported bugs, see > > * http://www.geocities.com/SiliconValley/1057/tidybugs.html and > * http://www.geocities.com/SiliconValley/1057/tidybugs2.html > >There are actually more, since Terry stopped to collect them. Anybody >else who kept track of bug reports and/or feature requests? I stopped after I had released my updated versions based on the 04 Aug 2000 source, and was waiting for the mythical Mar 2001 update before updating the bug report collection again (i.e. the pages were getting long, and I was hoping that I could retire most of the bug reports once they had been fixed in a maintenance release). However... I lost my hard drive to a crash last week, including all my html-w3 EMails. I hope to get my data recovered by a drive recovery company. Yes, I know I can go through the online archives, but I would rather spend my time on other things right at the moment. So the bug report update will have to wait. >>How do other developers of Tidy add-ons feel about this issue? > >The HTML Tidy licence allows us to modify and redistribute the given >code, so it would be possible to start a Sourceforge project with the >latest codebase and fix the given bugs. This would be what James Clark >did with Expat. Dave seems to be too busy to work on Tidy, if we want to >continue HTML Tidys development, that'd be the best thing to do, I'd >like to hear what Dave think about that, but since he didn't reply the >last months to my mails and mails on this list, I have small hope to >hear from him, so I'd suggest to start the project anyways. I agree. I would like to get involved (to help make the port more portable and/or add Mac OS support into the base source code), but I know I am busy too. I am just coming up to speed with cvs, but I have a lot of experience with multi-person development teams/project on various platforms (including open source). Anyway, I am listening - keep the discussion going. Regards, Terry
Received on Monday, 14 May 2001 15:43:59 UTC