Re: Parsing methods -Reply

Jim Taylor (
Wed, 10 Jul 1996 12:44:06 -0800

Message-Id: <>
Date: Wed, 10 Jul 1996 12:44:06 -0800
From: Jim Taylor <>
Subject: Re: Parsing methods -Reply

>>> Arnoud "Galactus" Engelfriet <> 07/10/96 10:41am >>>
>In article <v0300780eae0923bca181@[]>,
>Walter Ian Kaye <> wrote:
> straightforward -- what I'm looking for is how to parse the contents of
> tag: <ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>.

>Well, a simple algorithm to do this: Once you have found a "<"
>character, the name of the element is everything up to the first
>whitespace character or the ">" character. If you hit whitespace,
>you've got attributes coming.

Close but no cigar. Element names must begin with a letter and be
followed by letters, digits, periods, or hyphens, so just looking for
whitespace is not enough. In other words, if I have text that reads "3<4
but 4>2" the parser should pass it through unmodified, because "4" is not
a valid element name.
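
A minimal sketch of that name check, in Python. The names TAG_OPEN and
find_element_names are made up for illustration, and the pattern assumes
a start tag is "<" followed immediately by a valid name and then either
whitespace or ">". It ignores end tags, comments, and declarations.

```python
import re

# Assumption: a start tag is "<", then a letter, then any run of letters,
# digits, periods, or hyphens, and the name must be followed by
# whitespace or ">". Anything else is treated as ordinary text.
TAG_OPEN = re.compile(r"<([A-Za-z][A-Za-z0-9.-]*)(?=[\s>])")

def find_element_names(text):
    """Return the element names of everything that looks like a start tag."""
    return TAG_OPEN.findall(text)
```

With this, find_element_names('3<4 but 4>2') finds nothing, so the text
would be left alone, while the ELEMENT example from the original question
is recognized as a tag.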

Also, attribute values inside the <> are parsed for references, meaning
all character and entity references ("&#34;", "&iacute;", etc.) should be
decoded. For example, a tag such as <ELEMENT attr1="&#97;&#98;&#99;"> is
equivalent to <ELEMENT attr1="abc">.
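
The decoding step could be sketched like this, again in Python. The names
ATTR and parse_attributes are hypothetical, the input is assumed to be
the text between the element name and the closing ">", and
html.unescape from the standard library stands in for a proper reference
decoder. It is deliberately more lenient than SGML about unquoted values.

```python
import html
import re

# Assumption: an attribute is a name, optionally followed by "=" and a
# double-quoted, single-quoted, or unquoted value.
ATTR = re.compile(
    r"""([A-Za-z][A-Za-z0-9.-]*)                   # attribute name
        (?:\s*=\s*
           (?:"([^"]*)"|'([^']*)'|([^\s>]+))       # quoted or bare value
        )?""",
    re.VERBOSE,
)

def parse_attributes(attr_text):
    """Map attribute names to values with references decoded."""
    attrs = {}
    for name, dq, sq, bare in ATTR.findall(attr_text):
        raw = dq or sq or bare          # unused groups come back as ""
        attrs[name] = html.unescape(raw)
    return attrs
```

Feeding it the attribute text from the original question,
parse_attributes('attr1=abc attr2="def ghi" attr3="jkl" attr4=mno'),
gives a dictionary with the four names mapped to abc, def ghi, jkl, and
mno, and a value written as &#97;&#98;&#99; comes out as abc.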

There are other things to watch out for. It's not as "straightforward" as
you might hope, and probably no browser other than arena/amaya does
it all right.

Jim "The Frog" Taylor, Director of Information Technology
Videodiscovery, Inc. - Multimedia Education for Science and Math
Seattle, WA, 206-285-5400 <>