Message-Id: <199607101922.MAA27470@web1.calweb.com>
Subject: Re: Parsing methods (fwd)
To: email@example.com
Date: Wed, 10 Jul 1996 12:22:34 -0700 (PDT)
From: "Lee Daniel Crocker" <firstname.lastname@example.org>

> Well, a simple algorithm to do this: Once you have found a "<"
> character, the name of the element is everything up to the first whitespace
> character or the ">" character. If you hit whitespace, you've got
> attributes coming.

Simple, but not quite correct. Don't forget that you have to check
for <!, and that element names, like attribute names, have a very
limited character set: <tag-name2> is a tag, but <fake,tag*> is not,
and should just be printed as plain text. And I won't even mention
<B/bold/.

> If in whitespace, scroll forward until you see non-whitespace. Everything
> up to the "=" character, or whitespace or ">", is the attribute name. If
> you hit whitespace or ">", it doesn't have a value. You should then assume
> the value is the same as the attribute's name.

Nope. If you see an attribute without an "=", that is a _value_, not
a name, and its name is whatever attribute can legally take that value.
Thus, <dl compact> is equivalent to <dl compact="compact"> (same result,
different rule), but <img left> is a perfectly legal variant of
<img align="left">. Very few browsers get this correct.

Check out a good SGML reference. Never rely on browsers, especially
ones as broken as Netscape, to tell you what valid HTML looks like.
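
To make the two rules above concrete, here is a rough Python sketch. It is only an illustration, not a real parser: the names NAME_RE, ENUMERATED_VALUES, is_tag_name and resolve_bare_token are made up for this example, the name pattern assumes the usual SGML name characters (a letter, then letters, digits, "." or "-"), and the value table is a tiny hand-written stand-in for what a real parser would read from the DTD.

```python
import re

# Assumed SGML-style name: a letter followed by letters, digits, "." or "-".
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")

# Hypothetical, heavily abbreviated table: element -> attribute -> legal
# enumerated values. A real parser would derive this from the DTD.
ENUMERATED_VALUES = {
    "dl": {"compact": {"compact"}},
    "img": {"align": {"top", "middle", "bottom", "left", "right"}},
}


def is_tag_name(text):
    """Return True if text could be a tag name; otherwise the "<...>"
    run should be passed through as plain text."""
    return NAME_RE.fullmatch(text) is not None


def resolve_bare_token(element, token):
    """Treat a bare token (no "=") as a value and look up which attribute
    of the element can legally take it.

    Returns (attribute_name, value), or None if no attribute accepts it.
    """
    token = token.lower()
    for attr, values in ENUMERATED_VALUES.get(element.lower(), {}).items():
        if token in values:
            return attr, token
    return None


if __name__ == "__main__":
    print(is_tag_name("tag-name2"))
    print(is_tag_name("fake,tag*"))
    print(resolve_bare_token("dl", "compact"))
    print(resolve_bare_token("img", "left"))
```

With this table, resolve_bare_token("img", "left") yields the align attribute, while <dl compact> resolves to compact="compact" by the same lookup rather than by a special "value equals name" rule. The sketch ignores <! declarations and the <B/bold/ short form entirely.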