
Re: images removed from Word-200 documents?

From: Steinar Kjærnsrød <steinar@manamind.com>
Date: Fri, 28 Dec 2001 15:43:09 -0500 (EST)
Message-ID: <001001c18fdd$9bf42e20$e909e1c3@pcsteinar2>
To: "Sherman Mohler" <smohler@ciscolearning.org>
Cc: <html-tidy@w3.org>

----- Original Message -----
From: "Sherman Mohler" <smohler@ciscolearning.org>
To: "Bjoern Hoehrmann" <derhoermi@gmx.net>
Cc: <html-tidy@w3.org>
Sent: Friday, December 28, 2001 7:56 PM
Subject: Re: images removed from Word-200 documents?

| ......
| 1) Allow the user to "mark up" their documents with some instructions
|    to the delivery engine (a la a simple meta-language)
| 2) User outputs to HTML from Word or PowerPoint
| 3) We convert the meta-language to special HTML markup tags (for later
|    use with tidy's "new tag" capability)
| 4) Use tidy to scrub the HTML output, along with the "new tags".
| 5) Grab chunks of HTML, using the "new tags" as markers.
| 6) Load the chunks of HTML into the appropriate XML for input into
|    delivery engine
| Six easy steps to success! Right?  :-0
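
Step 5 of the quoted workflow (grabbing chunks of HTML between marker
tags) could look roughly like the sketch below. It is only an
illustration under assumptions: the tag name "chunk" and its "id"
attribute are hypothetical, the markers are assumed not to nest, and
tidy is assumed to have been told about the tag beforehand (for example
through its new-blocklevel-tags option) so that it leaves the markers in
place.

```python
import re

# Hypothetical marker tag; the thread does not name the actual tags.
MARKER = "chunk"

def extract_chunks(html, tag=MARKER):
    """Return {id: inner_html} for each <tag id="...">...</tag> block.

    Assumes the marker tags are well-formed and never nested.
    """
    pattern = re.compile(
        r'<{0}\s+id="([^"]+)"\s*>(.*?)</{0}>'.format(re.escape(tag)),
        re.DOTALL | re.IGNORECASE,
    )
    return {m.group(1): m.group(2).strip() for m in pattern.finditer(html)}

# Made-up sample standing in for tidied Word output.
sample = """
<html><body>
<chunk id="intro"><p>Welcome to the course.</p></chunk>
<p>Text outside any marker is ignored.</p>
<chunk id="lesson1"><p>First lesson.</p><img src="fig1.gif" alt="Figure 1"></chunk>
</body></html>
"""

chunks = extract_chunks(sample)
for name, body in chunks.items():
    print(name, "->", body)
```

Each extracted chunk can then be wrapped in whatever XML the delivery
engine expects (step 6). A regular expression is enough here only
because the markers are flat; nested markers would call for a real
parser.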

Just a few comments on steps #1 to #3 in your suggested workflow:

If your users are willing to use some constructed tag-set to mark up
their Word documents, you might instead consider this alternative:

- Create a set of Word paragraph styles which represents the
structure of your documents
- Save the styles to a Word .dot file which you tell your users to
install in their Office installation
- Tell them how to use your styleset and Word's own embedded styles
to mark up the structure of their documents
- Tell them to save the documents to RTF, NOT to HTML!
- Get a copy of the excellent rtf2html tool (for instance
http://www.logictran.com/) and configure it to output the HTML you
need
- Proceed with your step 4 if you want to clean the HTML code any
further
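
To illustrate the idea of driving the HTML from paragraph styles, here
is a rough sketch of a style-to-element mapping. It is not how any
particular RTF converter is configured; the style names, the mapping,
and the render() helper are all hypothetical, and the input is assumed
to be (style name, text) pairs already extracted from the RTF.

```python
from html import escape

# Hypothetical paragraph styles mapped to HTML elements.
STYLE_TO_TAG = {
    "Heading 1": "h1",
    "Heading 2": "h2",
    "Normal": "p",
    "LessonSummary": "div",
}

def render(paragraphs):
    """Turn (style, text) pairs into HTML, one element per paragraph.

    Unknown styles fall back to <p>; the style name is kept as a class
    so later processing can still find the original structure.
    """
    out = []
    for style, text in paragraphs:
        tag = STYLE_TO_TAG.get(style, "p")
        css_class = escape(style.replace(" ", "-").lower(), quote=True)
        out.append('<{0} class="{1}">{2}</{0}>'.format(tag, css_class, escape(text)))
    return "\n".join(out)

print(render([("Heading 1", "Intro"), ("LessonSummary", "A & B")]))
```

Keeping the style name as a class attribute means the chunking in your
step 5 could key off the classes instead of extra marker tags.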

Some benefits of this approach:

 + the RTF creation inside Word is pretty stable and good across
versions and platforms, HTML creation _is not_
 + the HTML creation is done centrally by you, not by the users
 + you get far better control over all parts of the documents,
including the images

Possible downsides:

 + users must install your styles
 + Word paragraph styles are simple, and not capable of
replacing/mapping advanced tag-sets

| Again, many thanks, and kudos for a great tool!
| -- Sherman Mohler, E-Learning Systems Architect
|     Cisco Learning Institute
| Bjoern Hoehrmann wrote:
| > * Sherman Mohler wrote:
| > >This is probably a "newbie" question, but I discovered tidy while
| > >trying to figure out how to clean up HTML output from Word-200.
| > >Tidy does a great job, except for the fact that the links to
| > >images are removed. Am I doing something wrong, or is there an
| > >option I haven't set properly?
| >
| > Using what version of HTML Tidy? Could you please send sample
| > --
| > Björn Höhrmann { mailto:bjoern@hoehrmann.de }
| > am Badedeich 7 } Telefon: +49(0)4667/981028
| > 25899 Dagebüll { PGP Pub. KeyID: 0xA4357E78 }
| --
|     Sherman Mohler, E-Learning Systems Architect
|     Cisco Learning Institute
|     2375 East Camelback Road, Suite 220
|     Phoenix, Az 85016-3417
|    "I see children cry, and I watch them grow
|     they'll learn more than I'll ever know
|     and I think to myself, what a wonderful world..."
|     - Louis Armstrong

Steinar Kjærnsrød <steinar@manamind.com>
Manamind AS
Received on Monday, 31 December 2001 16:16:49 UTC
