- From: Dave Raggett <dsr@w3.org>
- Date: Wed, 10 Aug 2005 11:06:40 +0100 (BST)
- To: David Wilczynski <dwilczyn@usc.edu>
- Cc: html-tidy@w3.org, Tom Lipkis <tal@pss.com>
The support in Tidy for cleaning up the HTML exported by Microsoft Office is pretty dated. Office 97 proved much easier to clean up than Office 2000, and I don't know if anyone has looked into improving Tidy's current support or studying what is required for Office 2003 and beyond.

This is a splendid opportunity for volunteers to study what is needed and to identify techniques for addressing the resulting requirements. It may prove easier to work from the .doc format than from the mess exported as HTML/XML. OpenOffice includes a pretty good import mechanism for Office documents and could be leveraged for HTML Tidy.

One of the problems is how to identify what the author intended, as this is well hidden within the document model used by Word. In essence, we need an expert system that can construct a plausible reconstruction without a mess of styles on each paragraph or run of inline text. The current code in Tidy strips a lot of this out, and a much better job could be done at inferring the stylesheet rules.

I no longer have the time to work on this, so this is a call for volunteers to assist the current developers working on the SourceForge site for Tidy, see http://tidy.sourceforge.net/

P.S. It may be possible to gradually wean people off Word if there were effective and free alternatives that would run in the web browser without the need to install any software. I am looking into how to achieve this using the design mode feature in IE 5.5+ and in Mozilla-based browsers since 1.3 (including Firefox, Galeon, Epiphany, etc.). The promise has been demonstrated by HTMLArea, FCK text editor and widgEditor (use Google for the links).

--
Dave Raggett <dsr@w3.org>  W3C lead for multimodal interaction
http://www.w3.org/People/Raggett  +44 1225 866240 (or 867351)
Received on Wednesday, 10 August 2005 10:06:36 UTC