Re: Parsing methods -Reply

Erik Aronesty
Wed, 10 Jul 1996 13:20:49 -0700

Message-ID: <>
From: Erik Aronesty <>
To: "'Jim Taylor'" <>
Cc: "''" <>
Subject: RE: Parsing methods -Reply
Date: Wed, 10 Jul 1996 13:20:49 -0700

The character entities should be handled at the "read next character" level of the "tag parser", so I wouldn't worry about them.
The letter rule I forgot about... but waiting until whitespace is
better than letting a % screw up the parse.

I.e.: should the parser see
	<hello%^ myname=foo>
as a TAG that was messed up, or
as plain text?

I say as a messed-up tag.
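
A minimal sketch of that choice, in Python (not code anyone posted to the list). It assumes the name rule quoted further down: a letter, then letters, digits, periods, or hyphens. A "<" followed by a letter is treated as a tag, possibly a messed-up one; a "<" followed by anything else is passed through as text. The names classify and NAME_RE are made up for this illustration, and the sketch ignores end tags, comments, and ">" inside quoted attribute values.

```python
import re
import string

# Assumed name rule: a letter, then letters, digits, periods, or hyphens.
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9.-]*")


def classify(text, pos):
    """Classify the '<' at text[pos].

    Returns (kind, name, end): kind is 'text', 'tag', or 'bad-tag';
    name is the candidate element name (None for plain text);
    end is the index just past the span that was consumed.
    """
    if text[pos] != "<":
        raise ValueError("classify() must be called at a '<'")

    # A '<' that is not followed by a letter is not markup: pass it through.
    nxt = text[pos + 1:pos + 2]
    if not nxt or nxt not in string.ascii_letters:
        return "text", None, pos + 1

    # Candidate name: everything up to whitespace or '>'.
    end = pos + 1
    while end < len(text) and not text[end].isspace() and text[end] != ">":
        end += 1
    candidate = text[pos + 1:end]

    # Consume through the closing '>' (or to end of input if there is none),
    # so a messed-up tag is swallowed whole instead of leaking into the text.
    close = text.find(">", end)
    close = len(text) if close == -1 else close + 1

    kind = "tag" if NAME_RE.fullmatch(candidate) else "bad-tag"
    return kind, candidate, close
```

Under these assumptions, <hello%^ myname=foo> comes back as a 'bad-tag' with the candidate name hello%^, while the "<" in 1<4 comes back as 'text'.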

>From: 	Jim Taylor
>Sent: 	Wednesday, July 10, 1996 4:44 PM
>Subject: 	Re: Parsing methods -Reply
>>>> Arnoud "Galactus" Engelfriet <> 07/10/96 10:41am >>>
>>In article <v0300780eae0923bca181@[]>,
>>Walter Ian Kaye <> wrote:
>> straightforward -- what I'm looking for is how to parse the contents of a
>> tag: <ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>.
>>Well, a simple algorithm to do this: Once you have found a "<"
>>character, the name of the element is everything up to the first
>>whitespace character or the ">" character. If you hit whitespace, you've got
>>attributes coming.
>Close but no cigar. Element names must begin with a letter and be
>followed by letters, digits, periods, or hyphens. Just looking for
>whitespace is a bad thing. In other words, if I have text that reads
>"...<4 but 4>2" the parser should pass it through unmodified, because "4" is
>not a valid element name.
>Also, information inside the <> is "parsed character data," meaning all
>character references ("&#34;", "&iacute;", etc.) should be decoded. For
>example, a tag such as <ELEMENT attr1=&#97;&#98;&#99;> is equivalent
>to <ELEMENT attr1=abc>.
>There are other things to watch out for. It's not as "straightforward" as
>you might hope, and probably no browser other than Arena/Amaya does
>it all right.
>Jim "The Frog" Taylor, Director of Information Technology
>Videodiscovery, Inc. - Multimedia Education for Science and Math
>Seattle, WA, 206-285-5400 <>
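
For the attribute question quoted above, here is a rough Python sketch of splitting a start tag into a name and attributes, then decoding character references in each value. It is an illustration only, not a conforming SGML or HTML parser. The names parse_start_tag and ATTR_RE are invented for this example, and html.unescape from the Python standard library stands in for entity handling (it follows HTML5 rules, which is an assumption relative to what a 1996 parser would do).

```python
import html
import re

# One attribute: a name, optionally followed by "=" and a value that is
# double-quoted, single-quoted, or bare (bare values end at whitespace).
ATTR_RE = re.compile(
    r"""
    ([A-Za-z][A-Za-z0-9.-]*)      # attribute name
    (?: \s*=\s*                   # optional = value
        (?: "([^"]*)"             # double-quoted value
          | '([^']*)'             # single-quoted value
          | ([^\s>]+)             # bare value
        )
    )?
    """,
    re.VERBOSE,
)


def parse_start_tag(tag):
    """Split '<NAME a=b c="d e">' into (name, {attribute: decoded value})."""
    tag = tag.strip()
    if not (tag.startswith("<") and tag.endswith(">")):
        raise ValueError("not a tag: %r" % tag)

    parts = tag[1:-1].split(None, 1)
    if not parts:
        raise ValueError("empty tag: %r" % tag)
    name = parts[0]
    rest = parts[1] if len(parts) > 1 else ""

    attrs = {}
    for m in ATTR_RE.finditer(rest):
        # Attribute names are case-insensitive in HTML, so fold them.
        key = m.group(1).lower()
        raw = next((g for g in m.group(2, 3, 4) if g is not None), None)
        # Decode references such as &#97; or &iacute; in the value;
        # a valueless attribute is recorded as None.
        attrs[key] = html.unescape(raw) if raw is not None else None
    return name, attrs
```

With this sketch, <ELEMENT attr1=&#97;&#98;&#99;> and <ELEMENT attr1=abc> parse to the same result, which is the equivalence described in the quoted message.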