- From: Sherman Mohler <smohler@ciscolearning.org>
- Date: Fri, 28 Dec 2001 10:56:50 -0800
- To: Bjoern Hoehrmann <derhoermi@gmx.net>
- CC: html-tidy@w3.org
- Message-ID: <3C2CC072.D833E572@ciscolearning.org>
Dear Bjorn et al, I greatly appreciate your help! At the end of this email, I'll go into a brief explanation of the business goal I am shooting for, as you may find it interesting. To answer you initial question, when I type "tidy -v" I get the response "HTML Tidy release date: 30th April 2000". I am getting this copy from http://www.w3.org/People/Raggett/tidy/ , and I pulled it down just yesterday. The web page, however, implies there may be a 4th August 200 version somewhere.. perhaps the download points to an older copy? I am attaching a jar file with an example of what I did. The original hello.doc is there as created by Office 200. Also hello.htm from "save as web page". The hello_out.html and errs.txt were creating when running tidy with the "word-2000" option. If you can help me figure out what I'm doing wrong so I don't lose the references to the graphics, it would be greatly appreciated! As to the business goal, I work for "Cisco Learning", which is a non-for-profit that is using the eLearning technology created by Cisco to benefit charitable causes. A lot of people have great content trapped in Word and Powerpoint documents, and we need to "suck it" into the eLearning delivery engine. My best guess currently as to how to do this is to: 1) Allow the user to "mark up" their documents with some instructions to the delivery engine (ala a simple meta-language) 2) User outputs to HMTL from Word or Powerpoint 3) We covert the meta-language to special HTML markup tags (for later use with tidy's "new tag" capability) 4) Use tidy to scrub the HTML output, along with the "new tags". 5) Grab chunks of HTML, using the "new tags" as markers. 6) Load the chunks of HTML into the appropriate XML for input into the delivery engine Six easy steps to success! Right? :-0 Again, many thanks, and kudos for a great tool! -- Sherman Mohler, E-Learning Systems Architect Cisco Learning Institute Bjoern Hoehrmann wrote: > * Sherman Mohler wrote: > >This is probably a "newbie" question, but I discovered tidy while trying > >to figre out how to clean up HTML output from Word-200. Tidy does a > >great job, except for the fact that the links to images are completely > >removed. Am I doing something wrong, or is there an option I haven't set > >properly? > > Using what version of HTML Tidy? Could you please send sample code? > -- > Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de > am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de > 25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/ -- Sherman Mohler, E-Learning Systems Architect Cisco Learning Institute 2375 East Camelback Road, Suite 220 Phoenix, Az 85016-3417 "I see children cry, and I watch them grow they'll learn more than I'll ever know and I think to myself, what a wonderful world..." - Louis Armstrong
Attachments
- application/x-zip-compressed attachment: hello.zip
Received on Friday, 28 December 2001 12:57:13 UTC