- From: Advocate <wotw@inebraska.com>
- Date: Fri, 20 Aug 1999 17:58:11 -0500
- To: <html-tidy@w3.org>
In working on changes to HTML Tidy, I encountered a problem stemming
from the fact that tag names are now stored in lowercase form within
HTML Tidy whereas before they were uppercase. I had assumed that
tidy was using wstrcasecmp for comparisons of tag names and attribute
names, particularly in the lookup() functions for tag and attribute
names.
Apparently this is not the case. At first I thought this could be
remedied by patching the relevant lookup() functions to use
wstrcasecmp instead of wstrcmp, but now doing a grep of the source I
see many many cases where wstrcmp is being used where wstrcasecmp is
much more appropriate and less error prone. Affected files include
attrs.c, clean.c, entities.c, lexer.c, parser.c, and tags.c. (Clean
has many tests for a "style" attribute using wstrcmp.)
Though HTML Tidy is good about making sure it stores all tags and
attribute names in a single case, other software interfacing with
HTML Tidy on a code level may not be as careful (especially since
having them in uppercase makes them easy to differentiate them from
other text where the case cannot be reversed, many would be tempted
to use uppercase).
I will put together a diff applied to a virgin 26Jul99 that will
address these later tonight and post it to the list.
--
,=<#)-=# <http://www.war-of-the-worlds.org/>
,_--//--_,
_-~_-(####)-_~-_ "Did you see that Parkins boy's body in the tunnels?" "Just
(#>_--'~--~`--_<#) the photos. Worst thing I've ever seen; kid had no face."
Received on Friday, 20 August 1999 19:04:19 UTC