[PATCH] Outline & Parse Tree Fixes & Features

Well, it's not really a patch, as the changes are somewhat messy and it's
easier to send the whole thing than to generate a patch that will apply
cleanly. Besides, it's easy to do a local diff to see the changes.

I started out trying to track down some problems in the parse tree code,
but got carried away. Hopefully the changes are readable. :-)

Detailed change list below, but the highlights are:
    * Eliminated a gazillion warnings (literally thousands of warnings per
      validated file; all of which ended up in your error logs). Down to
      about 10 per file now. :-)
    * Made Parse Trees more readable by stripping unnecessary whitespace,
      STRONGly emphasizing elements and wrapping text to 76 columns.
    * Show IMG ALT text from headings in the Outlines.
    * Reduced memory consumption.


Details:

* Silenced Parse Tree code under -w and use strict. The parse tree code was
  generating a gazillion warnings when running under -w and use strict.
  Literally thousands of warnings per validated file! Fixed.
* Made the Parse Tree code use Text::Wrap to wrap long lines.
* Made some mainly frivolous changes to a few warnings to make them more
  consistent and informative.
* Stripped newlines from nsgmls output as it's read. This saves a bit over
  one byte of memory per line in the output. As an example, with the output
  for the validator.w3.org homepage this saves you about 2.5KB of memory.
  This also makes more sense in later code.
* Demoted errors encountered while close()ing filehandles. They used to be
  fatal die(), but are now non-fatal warn()s.
* Used CGI::Carp to timestamp error messages from warn() and die() which
  end up in the web server's error log. This should make the error log
  entries a *lot* more understandable and useful.
* Made some changes to the output HTML in preparation for making the output
  HTML 4.0 Strict with CSS. Also made some stylistic changes to make the
  output HTML easier to read. These were supposed to be local
  modifications, but were committed by mistake. I left them in as it's a
  pain to back out of them and it won't do any harm (quite the contrary) if
  it trickles in on w3.org.
* Made Outline code more namespace friendly.
* Stripped some dead code from the Outlines.
  It had 3 conditionals that looked for specific strings to ignore followed
  by a conditional ignoring all but a specific string. No point using 4 ops
  to achieve the results of 1. Actually, since these were ifs with a regex
  match, they translate into a significant performance hit; far more than
  the 4 vs. 1 ratio leads you to believe.
* Replaced a substr() call with a pattern match in the Outline code. We
  already do the pattern match so why not use it for something?
* Changed confusing use of "while" into a slightly less confusing "until".
* Added support for extracting the value of alt attributes inside Hn
  elements. If a heading contains an IMG with an alt attribute, the alt
  attribute is now used as substitute text.
* Simplified HTML output and the code to generate it in the Parse Tree
  code.
* Made Parse Tree code more namespace friendly.
* Made elements in Parse Trees STRONGly emphasized. This improves
  readability quite a bit IMO. Also stripped away unnecessary whitespace
  from parse trees. While whitespace is often good, in this case it just
  made things confusing. IMO anyway. YMMV.
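
For the curious, the whitespace stripping and wrapping in the Parse Tree
output boils down to something like the sketch below. This is only an
illustration, not the code in the attached file: wrap_tree_text() and the
sample text are made up for the example. Note that Text::Wrap keeps every
line shorter than $Text::Wrap::columns characters.

```perl
use strict;
use warnings;
use Text::Wrap qw(wrap);

# Text::Wrap keeps every output line shorter than $columns characters.
$Text::Wrap::columns = 76;

# Hypothetical helper (not from the validator source): collapse runs of
# whitespace, trim the ends, then wrap with the same indent on every line.
sub wrap_tree_text {
    my ($indent, $text) = @_;
    $text =~ s/\s+/ /g;     # newlines, tabs and repeated spaces become one space
    $text =~ s/^ | $//g;    # drop a leading or trailing space
    return wrap($indent, $indent, $text);
}

my $wrapped = wrap_tree_text('  ', "Some   long\n text " x 20);
print "$wrapped\n";
```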
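
The newline stripping and the demoted close() errors follow roughly this
pattern. Again a sketch under assumptions: the in-memory filehandle and its
contents stand in for the real nsgmls output, and the variable names are
invented for the example.

```perl
use strict;
use warnings;

# Stand-in for the parser's output; the real code reads it from nsgmls.
my $fake_output = "(HTML\n(HEAD\n)HEAD\n)HTML\n";
open(my $fh, '<', \$fake_output) or die "open failed: $!";

my @lines;
while (my $line = <$fh>) {
    chomp $line;            # drop the trailing newline as the line is read
    push @lines, $line;
}

# A failed close is reported, but no longer aborts the run.
close($fh) or warn "close failed: $!";

print scalar(@lines), " lines read\n";
```

With CGI::Carp loaded, the same warn() and die() calls get a timestamp in
the error log without any further changes to the calling code.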
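
Finally, a rough idea of how the Outline picks up ALT text. The sketch
assumes ESIS-style lines as nsgmls prints them ("(" opens an element, ")"
closes it, "A" lines carry attributes and come before the element they
belong to, "-" is character data). The outline() function is hypothetical
and much simpler than the real thing; it reuses what the pattern match
captured instead of calling substr().

```perl
use strict;
use warnings;

# Hypothetical sketch: collect heading text from ESIS-style lines, using
# an IMG's ALT value as substitute text. Not the validator's actual code.
sub outline {
    my @esis = @_;
    my (@outline, $level, $text, $alt);
    for my $line (@esis) {
        if ($line =~ /^\(H([1-6])$/) {              # heading opens
            ($level, $text) = ($1, '');
        } elsif ($line =~ /^\)H[1-6]$/) {           # heading closes
            push @outline, "H$level: $text" if defined $level;
            undef $level;
        } elsif ($line =~ /^AALT CDATA (.*)$/) {    # remember ALT for next element
            $alt = $1;
        } elsif ($line =~ /^\(/) {                  # any other element opens
            $text .= $alt
                if defined $level and $line eq '(IMG' and defined $alt;
            undef $alt;
        } elsif (defined $level and $line =~ /^-(.*)$/) {
            $text .= $1;                            # character data in a heading
        }
    }
    return @outline;
}

my @result = outline(
    '(H1', '-Welcome to ', 'AALT CDATA W3C', 'ASRC CDATA logo.gif',
    '(IMG', ')IMG', ')H1',
);
print "$_\n" for @result;
```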

Received on Sunday, 17 October 1999 19:55:09 UTC