- From: Arnoud <galactus@stack.urc.tue.nl>
- Date: Wed, 10 Jul 1996 20:11:04 +0200
- To: www-html@w3.org
In article <v0300780eae0923bca181@[205.149.180.135]>,
Walter Ian Kaye <boo@best.com> wrote:
> straightforward -- what I'm looking for is how to parse the contents of a
> tag: <ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>.

Well, a simple algorithm to do this:

Once you have found a "<" character, the name of the element is everything up to the first whitespace character or the ">" character. If you hit whitespace, you've got attributes coming.

If in whitespace, scan forward until you see non-whitespace. Everything up to the "=" character, or whitespace or ">", is the attribute name.

If you hit whitespace or ">", it doesn't have a value. You should then assume the value is the same as the attribute's name.

If you hit "=", and the next character is " or ', everything until the next corresponding " or ' is the attribute's value. If not, everything until the next whitespace character or ">" is the value.

If you hit ">", you have reached the end of the tag.

Note that simply scanning forward for ">" does NOT give you the end of the tag. This is what NS 1.x did. But look at this tag:

    <IMG ALT=" => " SRC=next.gif>

If your parser can correctly parse this, it's a good one. :-)

Galactus

--
To find out more about PGP, send mail with HELP PGP in the SUBJECT line to me.
E-mail: galactus@stack.urc.tue.nl - Please PGP encrypt your mail if you can.
Finger galactus@turtle.stack.urc.tue.nl for public key (key ID 0x416A1A35).
Anonymity and privacy site: <http://www.stack.urc.tue.nl/~galactus/remailers/>
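
The steps described in the message above can be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not code from the original post: the function name parse_tag, its (name, attributes, end_index) return shape, and the choice to raise ValueError on an unterminated tag are all hypothetical. It follows the steps literally, so it does not handle whitespace around "=", comments, or entity references.

```python
def parse_tag(text, start=0):
    """Parse one tag that begins at text[start] == '<'.

    Returns (element_name, attributes, end_index). attributes is a list of
    (name, value) pairs; end_index is the position just past the closing '>'.
    The function name and return shape are illustrative assumptions.
    """
    if text[start] != "<":
        raise ValueError("expected '<' at start of tag")
    n = len(text)
    i = start + 1

    # Element name: everything up to the first whitespace or '>'.
    j = i
    while j < n and not text[j].isspace() and text[j] != ">":
        j += 1
    name = text[i:j]
    i = j

    attrs = []
    while True:
        # In whitespace: skip forward to the next non-whitespace character.
        while i < n and text[i].isspace():
            i += 1
        if i >= n:
            raise ValueError("unterminated tag")
        if text[i] == ">":
            return name, attrs, i + 1  # end of the tag

        # Attribute name: everything up to '=', whitespace, or '>'.
        j = i
        while j < n and not text[j].isspace() and text[j] not in "=>":
            j += 1
        attr = text[i:j]
        i = j

        if i < n and text[i] == "=":
            i += 1
            if i < n and text[i] in "\"'":
                # Quoted value: runs to the next matching quote,
                # so a '>' inside the quotes does not end the tag.
                quote = text[i]
                end = text.find(quote, i + 1)
                if end == -1:
                    raise ValueError("unterminated quoted value")
                value = text[i + 1:end]
                i = end + 1
            else:
                # Unquoted value: runs to the next whitespace or '>'.
                j = i
                while j < n and not text[j].isspace() and text[j] != ">":
                    j += 1
                value = text[i:j]
                i = j
        else:
            # No '=': the value is assumed to equal the attribute's name.
            value = attr
        attrs.append((attr, value))
```

Calling parse_tag('<IMG ALT=" => " SRC=next.gif>') should give the element name IMG with the attributes ALT (value " => ") and SRC (value next.gif), which is the case a plain scan for ">" gets wrong.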
Received on Wednesday, 10 July 1996 14:40:40 UTC