Re: Parsing methods

-----BEGIN PGP SIGNED MESSAGE-----

In article <v0300780eae0923bca181@[205.149.180.135]>,
Walter Ian Kaye <boo@best.com> wrote:
> straightforward -- what I'm looking for is how to parse the contents of a
> tag: <ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>.

Well, here is a simple algorithm to do this: Once you have found a "<"
character, the name of the element is everything up to the first whitespace
character or the ">" character. If you hit whitespace, attributes may
follow.

While in whitespace, skip forward until you see a non-whitespace character.
Everything from there up to the next "=", whitespace or ">" is the attribute
name. If the name ends at whitespace or ">", the attribute has no explicit
value. You should then assume the value is the same as the attribute's name.

If the name ends at "=" and the next character is " or ', the value is
everything up to the next matching " or ' (the quotes themselves are not
part of the value). Otherwise, the value is everything up to the next
whitespace character or ">".

If you hit ">" outside a quoted value, you have reached the end of the tag.
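
The steps above can be sketched in Python roughly as follows. This is only
an illustration of the algorithm as described, not a complete HTML parser;
the function name parse_tag, the (name, value) list it returns, and the way
it handles an unterminated quote are my own assumptions.

```python
WHITESPACE = " \t\r\n"

def parse_tag(text, start=0):
    """Parse one tag that begins with '<' at text[start].

    Returns (element_name, attributes, index_after_tag), where attributes
    is a list of (name, value) pairs in source order.
    """
    if text[start] != "<":
        raise ValueError("expected '<' at position %d" % start)
    n = len(text)
    i = start + 1

    # Element name: everything up to the first whitespace or ">".
    j = i
    while j < n and text[j] not in WHITESPACE and text[j] != ">":
        j += 1
    name = text[i:j]
    i = j

    attrs = []
    while i < n:
        # Skip forward over whitespace between attributes.
        while i < n and text[i] in WHITESPACE:
            i += 1
        if i >= n:
            break
        if text[i] == ">":
            return name, attrs, i + 1  # end of the tag

        # Attribute name: everything up to "=", whitespace or ">".
        j = i
        while j < n and text[j] not in WHITESPACE and text[j] not in "=>":
            j += 1
        attr = text[i:j]
        i = j

        if i < n and text[i] == "=":
            i += 1
            if i < n and text[i] in "\"'":
                # Quoted value: runs to the next matching quote.
                quote = text[i]
                end = text.find(quote, i + 1)
                if end == -1:
                    end = n  # assumption: unterminated quote takes the rest
                value = text[i + 1:end]
                i = min(end + 1, n)
            else:
                # Unquoted value: runs to the next whitespace or ">".
                j = i
                while j < n and text[j] not in WHITESPACE and text[j] != ">":
                    j += 1
                value = text[i:j]
                i = j
        else:
            # No "=": assume the value is the same as the attribute's name.
            value = attr
        attrs.append((attr, value))

    raise ValueError("tag is not closed with '>'")
```

Calling parse_tag('<ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>')
should give the name ELEMENT and the four attribute pairs, with "def ghi"
kept as a single value.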

Note that simply scanning forward for the next ">" does NOT reliably give
you the end of the tag, because a quoted attribute value may itself contain
">". This is what Netscape 1.x did. But look at this tag:

  <IMG ALT=" => " SRC=next.gif>

If your parser can correctly parse this, it's a good one. :-)
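
To make the difference concrete, here is a small Python comparison of the
two ways of finding the end of a tag. Both function names are made up for
this example. The quote-aware version is a simplification: it treats any "
or ' inside the tag as the start of a quoted string, not only one that
directly follows "=".

```python
def naive_tag_end(text, start=0):
    """Index of the first '>' at or after start, ignoring quotes."""
    return text.find(">", start)

def quote_aware_tag_end(text, start=0):
    """Index of the first '>' that is not inside a quoted string."""
    quote = None  # the quote character we are currently inside, if any
    for i in range(start, len(text)):
        ch = text[i]
        if quote:
            if ch == quote:
                quote = None      # matching quote closes the string
        elif ch in "\"'":
            quote = ch            # entering a quoted string
        elif ch == ">":
            return i              # unquoted '>' ends the tag
    return -1                     # no unquoted '>' found

tag = '<IMG ALT=" => " SRC=next.gif>'
print(naive_tag_end(tag))        # 12, the '>' inside the ALT value
print(quote_aware_tag_end(tag))  # 28, the real end of the tag
```

With the naive scan, the tag is cut off in the middle of the ALT value and
the rest ends up as document text.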

Galactus

- -- 
To find out more about PGP, send mail with HELP PGP in the SUBJECT line to me.
E-mail: galactus@stack.urc.tue.nl - Please PGP encrypt your mail if you can.
Finger galactus@turtle.stack.urc.tue.nl for public key (key ID 0x416A1A35).
Anonymity and privacy site: <http://www.stack.urc.tue.nl/~galactus/remailers/>


-----BEGIN PGP SIGNATURE-----
Version: 2.6.3i
Charset: cp850

iQCVAgUBMeP4DjyeOyxBaho1AQEyNQP/e6zta748oAMtsic76sLiEaVm+cHmIgBh
LaQjaiyUXRbOwuTHk1U1GuESLco98P48C3qy+FB9MFXA0J0N/hSmV5/n9PM+zgHe
2u5lEVhBftc/llyxzFjGAuSSbmA6PjQRxMZkSIck9fEQ2mvGFZCcm7tgBskVEeDd
o9wNKIyj58U=
=Z1Mg
-----END PGP SIGNATURE-----

Received on Wednesday, 10 July 1996 14:40:40 UTC