| The list newbie in me is curious; why not go ahead and simplify it and 
| instead fully define it to *include* white space inside tags and 
| quotes around attributes?  It would make comparisons of output between 
| different parsers easier.
>Because there's nowhere in any of the common models to store that 
>information. It's always been regarded as insignificant (as have attribute 
>order and a few other things). You simply can't distinguish between <span 
>class="foo"></span> and <span    class='foo'    ></span>.

Ah, that's right, now I remember: a few years ago I was pulling my own
teeth over the lack of that information in the XML DOM. Leaving it out was
a very unfortunate decision, IMO. :(
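
For anyone following along, here is a minimal sketch of that limitation.
It uses Python's standard xml.dom.minidom purely as a stand-in for "the
common models" (my choice for illustration, not anything proposed in this
thread), and the helper name `serialized` is hypothetical. Once both
strings are parsed, the quote style and the extra spaces inside the tag
are gone, so the two trees serialize identically.

```python
from xml.dom import minidom


def serialized(markup: str) -> str:
    """Parse markup into a DOM tree, then write the root element back out."""
    return minidom.parseString(markup).documentElement.toxml()


# Same element, different quoting and in-tag whitespace.
a = '<span class="foo"></span>'
b = "<span    class='foo'    ></span>"

# The DOM has no slot for quote style or in-tag spacing, so after a
# round trip the two inputs cannot be told apart.
same_tree = serialized(a) == serialized(b)
print(same_tree)
```

The comparison only shows that this one parser discards the detail; other
parsers and models would need to be checked separately.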

>> I wonder if John reads this list. John? My guess is that it has to do
>> with rules that XHTML imposes but that aren't easy to deduce from a random 
>> stream of tags, but I could be wrong.

My point was more that, if it was being proposed as a solution, then it
would need to actually be the answer and not just an approximation, that was

>> Download a Java VM and you should be able to run the TagSoup jar 
>> without any trouble. You can get a VM from ... 

Thanks. Heretofore, my world has been Windows, SQL Server, ASP, VBScript,
ASP.NET, VB.NET, et al. (and many things prior to those...)
Mind if I bug you directly if I can't figure out how to get it going (when I
finally get around to trying it)? So many spinning plates... :)

