Re: Parsing methods (fwd)

Lee Daniel Crocker (lcrocker@calweb.com)
Wed, 10 Jul 1996 12:22:34 -0700 (PDT)


Message-Id: <199607101922.MAA27470@web1.calweb.com>
Subject: Re: Parsing methods (fwd)
To: www-html@w3.org
Date: Wed, 10 Jul 1996 12:22:34 -0700 (PDT)
From: "Lee Daniel Crocker" <lcrocker@calweb.com>

> Well, a simple algorithm to do this: Once you have found a "<"
> character, the name of the element is everything up to the first whitespace
> character or the ">" character. If you hit whitespace, you've got
> attributes coming.

Simple, but not quite correct.  Don't forget that you have to
check for <!, and tag names have a very limited character
set-- <tag-name2> is a tag, but <fake,tag*> is not, and should
just be printed as plain text.  And I won't even mention <B/bold/.
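
A minimal sketch of that check, in Python. It assumes a name starts
with a letter and continues with letters, digits, "." or "-" (the
name characters in the SGML declaration HTML uses). The function name
classify_lt and its return labels are hypothetical, chosen only for
illustration, and it makes no attempt to parse what follows the name.

```python
import re

# Assumption: name = letter, then letters, digits, "." or "-".
_NAME = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")

def classify_lt(text, pos):
    """Classify the "<" found at text[pos] (hypothetical helper).

    Returns a (kind, name) pair where kind is one of
    "declaration", "tag", "net-tag" or "text".
    """
    if text.startswith("<!", pos):
        return ("declaration", None)      # <!DOCTYPE ...>, <!-- ... -->
    start = pos + 1
    is_end_tag = text.startswith("/", start)
    if is_end_tag:
        start += 1                        # skip the "/" of an end tag
    m = _NAME.match(text, start)
    if m is None:
        return ("text", None)             # "<" not followed by a name
    nxt = text[m.end():m.end() + 1]       # character right after the name
    if nxt == ">" or nxt.isspace():
        return ("tag", m.group().lower())
    if nxt == "/" and not is_end_tag:
        return ("net-tag", m.group().lower())  # the <B/bold/ short form
    return ("text", None)                 # e.g. <fake,tag*>
```

With these rules, classify_lt("<tag-name2>", 0) reports a tag, while
classify_lt("<fake,tag*>", 0) reports plain text because "," cannot
follow a name.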

> If in whitespace, scroll forward until you see non-whitespace. Everything
> up to the "=" character, or whitespace or ">", is the attribute name. If
> you hit whitespace or ">", it doesn't have a value. You should then assume
> the value is the same as the attribute's name.

Nope.  If you see an attribute without an "=", that is a _value_,
not a name, and its name is whatever attribute can legally take
that value.  Thus, <dl compact> is equivalent to
<dl compact="compact"> (same result, different rule), but
<img left> is a perfectly legal variant of <img align="left">.
Very few browsers get this correct.
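
A minimal sketch of that lookup, in Python. The ENUMERATED table is
an assumption: a tiny hand-written stand-in for the enumerated
attribute values a real parser would read from the DTD, and
resolve_bare_token is a hypothetical name.

```python
# Assumption: illustrative slice of a DTD, mapping each element to its
# attributes that are declared with an enumerated list of values.
ENUMERATED = {
    "dl": {"compact": ["compact"]},
    "img": {
        "align": ["top", "middle", "bottom", "left", "right"],
        "ismap": ["ismap"],
    },
}

def resolve_bare_token(element, token):
    """Map a bare token in a start tag to an (attribute, value) pair.

    The token is treated as a value; the attribute is whichever one on
    this element lists that value. Returns None if no attribute does.
    """
    token = token.lower()
    for attr, values in ENUMERATED.get(element.lower(), {}).items():
        if token in values:
            return (attr, token)
    return None
```

Here resolve_bare_token("dl", "compact") gives ("compact", "compact")
and resolve_bare_token("img", "left") gives ("align", "left"), which
is the same rule producing both results.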

Check out a good SGML reference.  Never rely on browsers--
especially ones as broken as Netscape--to tell you what valid
HTML looks like.