Re: images removed from Word-200 documents?

Dear Bjorn et al,

I greatly appreciate your help! At the end of this email, I'll go into a
brief explanation of the business goal I am shooting for, as you may find it
interesting.

To answer you initial question, when I type "tidy -v" I get the response
"HTML Tidy release date: 30th April 2000". I am getting this copy from
http://www.w3.org/People/Raggett/tidy/ , and I pulled it down just yesterday.
The web page, however, implies there may be a 4th August 200 version
somewhere.. perhaps the download points to an older copy?

I am attaching a jar file with an example of what I did. The original
hello.doc is there as created by Office 200. Also hello.htm from "save as web
page". The hello_out.html and errs.txt were creating when running tidy with
the "word-2000" option.

If you can help me figure out what I'm doing wrong so I don't lose the
references to the graphics, it would be greatly appreciated!

As to the business goal, I work for "Cisco Learning", which is a
non-for-profit that is using the eLearning technology created by Cisco to
benefit charitable causes. A lot of people have great content trapped in Word
and Powerpoint documents, and we need to "suck it" into the eLearning
delivery engine. My best guess currently as to how to do this is to:

1) Allow the user to "mark up" their documents with some instructions to the
delivery engine (ala a simple meta-language)
2) User outputs to HMTL from Word or Powerpoint
3) We covert the meta-language to special HTML markup tags (for later use
with tidy's "new tag" capability)
4) Use tidy to scrub the HTML output, along with the "new tags".
5) Grab chunks of HTML, using the "new tags" as markers.
6) Load the chunks of HTML into the appropriate XML for input into the
delivery engine

Six easy steps to success! Right?  :-0

Again, many thanks, and kudos for a great tool!

-- Sherman Mohler, E-Learning Systems Architect
    Cisco Learning Institute



Bjoern Hoehrmann wrote:

> * Sherman Mohler wrote:
> >This is probably a "newbie" question, but I discovered tidy while trying
> >to figre out how to clean up HTML output from Word-200. Tidy does a
> >great job, except for the fact that the links to images are completely
> >removed. Am I doing something wrong, or is there an option I haven't set
> >properly?
>
> Using what version of HTML Tidy? Could you please send sample code?
> --
> Björn Höhrmann { mailto:bjoern@hoehrmann.de } http://www.bjoernsworld.de
> am Badedeich 7 } Telefon: +49(0)4667/981028 { http://bjoern.hoehrmann.de
> 25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 } http://www.learn.to/quote/

--

    Sherman Mohler, E-Learning Systems Architect
    Cisco Learning Institute
    2375 East Camelback Road, Suite 220
    Phoenix, Az 85016-3417

   "I see children cry, and I watch them grow
    they'll learn more than I'll ever know
    and I think to myself, what a wonderful world..."
    - Louis Armstrong

Received on Friday, 28 December 2001 12:57:13 UTC