Re: New TAG issue: TagSoupIntegration-54

/ Mike Schinkel <mikeschinkel@gmail.com> was heard to say:
| The list newbie in me is curious; why not go ahead and simplify it and
| instead fully define it to *include* white space inside tags and quotes
| around attributes?  It would make comparisons of output between different
| parsers easier.

Because there's nowhere in any of the common models to store that
information. It has always been regarded as insignificant (as have
attribute order and a few other things). After parsing, you simply
can't distinguish between <span class="foo"></span> and
<span    class='foo'    ></span>.

Extending the internal models to include this information would
be impractical. And wrong.

Both of those lexical forms represent an empty element called "span"
with a single attribute called "class" with the value "foo". That's
all there is there.
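
To make that concrete, here is a minimal sketch using Python's standard
xml.etree.ElementTree as one example of a "common model" (my choice for
illustration, not something specific to TagSoup). The helper name
`model` is hypothetical; it just collects what the parser keeps.

```python
import xml.etree.ElementTree as ET

# The two lexical forms from the paragraph above.
a = ET.fromstring('<span class="foo"></span>')
b = ET.fromstring("<span    class='foo'    ></span>")

def model(elem):
    """Return what the tree retains: name, attributes, text, child count."""
    return (elem.tag, dict(elem.attrib), elem.text, len(elem))

# Quote style and whitespace inside the tag are gone after parsing,
# so both forms yield the same model.
print(model(a) == model(b))  # True
```

Nothing in the resulting objects records which quote character was used
or how much white space appeared inside the start tag, so there is
nothing left to compare on that front.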

|>> Fortunately we have at least one existence proof of 
|>> such a product and it is called, obviously enough, 
|>> TagSoup: http://home.ccil.org/~cowan/XML/tagsoup/ 
|
| I read this page and have questions.
|
| 	"TagSoup also includes a command-line processor 
| 	that reads HTML files and can generate either 
| 	clean HTML or well-formed XML that is a close 
| 	approximation to XHTML."
|
| 1.) Why "generate ... a close approximation to XHTML"?  Doesn't it need to
| "generate XHTML"?

I wonder if John reads this list. John? My guess is that it has to do
with rules that XHTML imposes but that aren't easy to deduce from a
random stream of tags, but I could be wrong.

| 2.) Secondly (and you may not know this and maybe I shouldn't even be asking
| on the list, but...) how do I use TagSoup on a Windows machine?

Download a Java VM and you should be able to run the TagSoup jar
without any trouble. You can get a VM from
http://java.sun.com/javase/downloads/index.jsp (Note that I'm employed
by Sun Microsystems, so it can hardly be seen as a surprise that I'd
recommend that one; I'm sure there are others.)

                                        Be seeing you,
                                          norm

-- 
Norman Walsh
XML Standards Architect
Sun Microsystems, Inc.

Received on Thursday, 2 November 2006 15:08:37 UTC