Re: Parsing methods


In article <v0300780eae0923bca181@[]>,
Walter Ian Kaye <> wrote:
> straightforward -- what I'm looking for is how to parse the contents of a
> tag: <ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>.

Well, here is a simple algorithm to do this: Once you have found a "<"
character, the name of the element is everything up to the first whitespace
character or the ">" character. If you hit whitespace, attributes may
follow.

While in whitespace, scan forward until you see non-whitespace. Everything
from there up to the next "=", whitespace or ">" is the attribute name. If
the name ends at whitespace or ">", the attribute has no explicit value.
You should then assume the value is the same as the attribute's name.

If you hit "=", and the next character is " or ', everything until the
next matching " or ' is the attribute's value. If not, everything
until the next whitespace character or ">" is the value.

If you hit ">" anywhere outside a quoted value, you have reached the end
of the tag.
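
The steps above can be sketched in Python roughly as follows. This is only
a minimal sketch, not a full HTML parser: the function name parse_tag, its
arguments and the (name, attributes, end) return shape are my own choices
for illustration. It also assumes no whitespace around "=", and a repeated
attribute name simply overwrites the earlier one.

```python
def parse_tag(text, start=0):
    """Parse one tag that begins at text[start] == "<".

    Returns (element_name, attributes_dict, index_just_past_the_closing_gt).
    """
    if text[start] != "<":
        raise ValueError("expected '<' at position %d" % start)
    n = len(text)
    i = start + 1

    # Element name: everything up to the first whitespace or ">".
    j = i
    while j < n and not text[j].isspace() and text[j] != ">":
        j += 1
    name = text[i:j]
    i = j

    attrs = {}
    while i < n:
        # Skip whitespace until the next non-whitespace character.
        while i < n and text[i].isspace():
            i += 1
        if i >= n:
            break
        if text[i] == ">":
            return name, attrs, i + 1  # end of the tag

        # Attribute name: everything up to "=", whitespace or ">".
        j = i
        while j < n and text[j] not in "=>" and not text[j].isspace():
            j += 1
        attr = text[i:j]
        i = j

        if i < n and text[i] == "=":
            i += 1
            if i < n and text[i] in "\"'":
                # Quoted value: runs to the next matching quote character.
                quote = text[i]
                end = text.find(quote, i + 1)
                if end == -1:
                    raise ValueError("unterminated quoted value")
                value = text[i + 1:end]
                i = end + 1
            else:
                # Unquoted value: runs to the next whitespace or ">".
                j = i
                while j < n and not text[j].isspace() and text[j] != ">":
                    j += 1
                value = text[i:j]
                i = j
        else:
            # No "=": assume the value is the attribute's own name.
            value = attr
        attrs[attr] = value

    raise ValueError("tag is not closed with '>'")
```

Calling parse_tag on the tag from the original question should give the
element name "ELEMENT" and the four attributes, with attr2 keeping its
embedded space.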

Note that simply scanning forward for ">" does NOT reliably give you the
end of the tag, because ">" can appear inside a quoted attribute value.
This is what NS (Netscape) 1.x did. But look at this tag:

  <IMG ALT=" => " SRC=next.gif>

If your parser can correctly parse this, it's a good one. :-)
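
To see the difference on that tag, here is a small Python comparison. Both
function names are made up for this illustration, and the quote-aware
version is a simplification: it only treats a quote as opening a value when
it directly follows "=", and it does not parse attributes at all.

```python
def naive_tag_end(text, start=0):
    """Index of the first ">" at or after start (the naive approach)."""
    return text.find(">", start)


def quote_aware_tag_end(text, start=0):
    """Index of the first ">" that is not inside a quoted attribute value.

    Returns -1 if no such ">" is found.
    """
    quote = None  # the quote character we are currently inside, if any
    for i in range(start, len(text)):
        ch = text[i]
        if quote is not None:
            if ch == quote:
                quote = None  # matching quote closes the value
        elif ch in "\"'" and i > start and text[i - 1] == "=":
            quote = ch  # a quote right after "=" opens a quoted value
        elif ch == ">":
            return i
    return -1


tag = '<IMG ALT=" => " SRC=next.gif>'
print(tag[:naive_tag_end(tag) + 1])        # cut short inside the ALT value
print(tag[:quote_aware_tag_end(tag) + 1])  # the whole tag
```

The naive scan stops at the ">" inside the ALT text, so everything after it
would be treated as document text instead of as part of the tag.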


-- 
To find out more about PGP, send mail with HELP PGP in the SUBJECT line to me.
E-mail: - Please PGP encrypt your mail if you can.
Finger for public key (key ID 0x416A1A35).
Anonymity and privacy site: <>


Received on Wednesday, 10 July 1996 14:40:40 UTC