Re: [Tidy-dev] MakeBare and hyphens

From: Lee Passey <lee@www.dysfunctionals.org>
Date: Wed, 20 Feb 2002 14:42:26 -0700
Message-ID: <3C741842.2000608@www.dysfunctionals.org>
To: "'Tidy Development'" <tidy-develop@lists.sourceforge.net>
CC: Dave Raggett <dave.raggett@openwave.com>, html-tidy <html-tidy@w3.org>
I've spent a fair amount of time on and off during the last couple of
weeks looking at "mshtm" output from Word 2000.  While it has many, many
flaws, an over-abundance of non-breaking spaces is not one of them.  I
suspect that the --bare option was designed more to allow people to
gracefully downgrade from the most advanced named entities than to
clean up Micro$oft's mess -- that's what I use it for.
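
To make the "graceful downgrade" concrete, here is a minimal sketch of the mapping described in the quoted comment further down (dashes to hyphen, curly quotes to plain quotes). It is not Tidy's actual code: the function names bare_ascii_for and bare_downgrade are hypothetical, and the text is assumed to be already decoded into an array of Unicode code points. Non-breaking spaces are deliberately left alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not Tidy's real implementation: returns the
   ASCII stand-in for a "smart" punctuation code point, or 0 if the
   code point has no stand-in in this sketch. */
char bare_ascii_for(unsigned int c)
{
    switch (c) {
    case 0x2013: /* en dash */
    case 0x2014: /* em dash */
        return '-';
    case 0x2018: /* left single quote */
    case 0x2019: /* right single quote */
        return '\'';
    case 0x201C: /* left double quote */
    case 0x201D: /* right double quote */
        return '"';
    default:
        return 0;
    }
}

/* Downgrades an array of code points in place and returns how many
   were changed. Everything without a stand-in (including U+00A0,
   the non-breaking space) is left untouched. */
size_t bare_downgrade(unsigned int *text, size_t len)
{
    size_t i, changed = 0;

    assert(text != NULL || len == 0);
    for (i = 0; i < len; ++i) {
        char repl = bare_ascii_for(text[i]);
        if (repl != 0) {
            text[i] = (unsigned int)(unsigned char)repl;
            ++changed;
        }
    }
    return changed;
}
```

A table lookup would work just as well; the switch simply keeps each mapping next to its comment.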

Along these same lines, another recent named entity that Word 2000 seems 
enamoured with, and which many of my older browsers support only 
half-heartedly, is &hellip;.  So I would like to add some code to the
MakeBare option which will convert that entity to &#160;.&#160;.&#160;.
(that's three periods, each preceded by a non-breaking space).
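
A rough sketch of that conversion, working on a plain character string rather than on Tidy's internal node tree. This is an assumption for illustration only: the function name bare_expand_hellip, the two macros, and the buffer-based interface are all hypothetical and are not the change to be checked in.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define HELLIP_ENTITY "&hellip;"
#define HELLIP_BARE   "&#160;.&#160;.&#160;."

/* Copies src to dst, replacing each "&hellip;" with HELLIP_BARE.
   Returns the length of the result (not counting the terminator),
   or (size_t)-1 if dst, which holds cap bytes, is too small. */
size_t bare_expand_hellip(const char *src, char *dst, size_t cap)
{
    const size_t ent_len = strlen(HELLIP_ENTITY);
    const size_t rep_len = strlen(HELLIP_BARE);
    size_t used = 0;

    assert(src != NULL && dst != NULL);
    if (cap == 0)
        return (size_t)-1;

    while (*src != '\0') {
        const char *piece = src;   /* default: copy one character */
        size_t piece_len = 1;

        if (strncmp(src, HELLIP_ENTITY, ent_len) == 0) {
            piece = HELLIP_BARE;   /* swap in the replacement */
            piece_len = rep_len;
            src += ent_len;
        } else {
            src += 1;
        }

        if (used + piece_len >= cap)   /* keep room for '\0' */
            return (size_t)-1;
        memcpy(dst + used, piece, piece_len);
        used += piece_len;
    }

    dst[used] = '\0';
    return used;
}
```

Called with "Wait&hellip; what" and a large enough buffer, this yields "Wait&#160;.&#160;.&#160;. what". The replacement is three times the length of the entity, so the caller has to size the output buffer accordingly.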

Unless I am faced with vehement objections, I'll check in these changes 
about this time next week.  In the meantime, I'm going to continue to 
explore how we can deal with mshtm more effectively.


Dave Raggett wrote:

>If I remember rightly, Word 2000 was creating way too many
>non-breaking spaces, and it was going to make the user's
>work much easier to just get rid of them. Remember the -bare
>option is a draconian cleanup of the messy HTML exported by
>Word 2000. Ideally, some kind soul would volunteer to write
>an expert system to figure out the author's intentions from
>amongst the complexity of the Word output. This for instance
>would involve understanding the various uses of tab stops
>and symbols for lists.
>
>--bare is probably the wrong name. I think I added it to
>avoid polluting the work I had done on the word2000 option.
>It would be good for someone to investigate the Office 2000
>issue in more detail and come up with a recommendation.
>
>Regards,
>
>-- Dave Raggett <dave.raggett@openwave.com> or <dsr@w3.org>
>W3C Visiting Fellow, see http://www.w3.org/People/Raggett 
>tel/fax: +44 1225 866-240 (or 867-351) +44 771 213 7629 (gsm)
>
>-----Original Message-----
>From: tidy-develop-admin@lists.sourceforge.net
>[mailto:tidy-develop-admin@lists.sourceforge.net]On Behalf Of Jelks
>Cabaniss
>Sent: Saturday, February 02, 2002 3:34 PM
>To: 'Tidy Development'
>Subject: RE: [Tidy-dev] MakeBare and hyphens
>
>
>Lee Passey wrote:
>
>>     Filters from Word and PowerPoint often use smart
>>     quotes resulting in character codes between 128
>>     and 159. Unfortunately, the corresponding HTML 4.0
>>     entities for these are not widely supported. The
>>     following converts dashes and quotation marks to
>>     the nearest ASCII equivalent.
>>
>
>>The code maps em- and en-dashes to hyphens, right and left
>>double quotes (which never seem to look good on the screen)
>>to " and right and left single quotes to '.
>>
>
>Ah, so -bare is kind of a "downgrade Unicode characters to their ASCII
>equivalents"?  In that case, it could indeed be useful at times.
>
>>the -word-2000
>>option correctly maps these codes to their Unicode
>>equivalents, which are in the range 0x2013 to 0x201E, but
>>only the most recent UAs know how to display them correctly.
>>
>
>Last I looked, NS4 would display these correctly if the &#nnn;
>representation were used (at least the dashes, don't recall about the
>curly quotes).  So if the author wanted, he could invoke
>-numeric-entities.
>
>>Unfortunately, it also maps &nbsp; to &#x20; (the regular
>>ASCII space) which _is_ fairly universally recognized.  If I
>>ruled the world, I would remove that "feature."
>>
>
>Agreed.  Getting rid of &nbsp;'s is an assumption sometimes warranted,
>but sometimes otherwise.
>
>
>/Jelks
>
>
>_______________________________________________
>Tidy-develop mailing list
>Tidy-develop@lists.sourceforge.net
>https://lists.sourceforge.net/lists/listinfo/tidy-develop
>
Received on Wednesday, 20 February 2002 16:45:09 GMT
