Re: Parsing methods

Daniel W. Connolly (connolly@beach.w3.org)
Wed, 10 Jul 1996 17:40:04 -0400


Message-Id: <m0ue6zM-0002URC@beach.w3.org>
To: galactus@stack.urc.tue.nl (Arnoud "Galactus" Engelfriet)
cc: www-html@w3.org
Subject: Re: Parsing methods 
In-reply-to: Your message of "Wed, 10 Jul 1996 20:11:04 +0200."
             <4I/4x4uYOdXZ089yn@stack.urc.tue.nl> 
Date: Wed, 10 Jul 1996 17:40:04 -0400
From: "Daniel W. Connolly" <connolly@beach.w3.org>

In message <4I/4x4uYOdXZ089yn@stack.urc.tue.nl>, Arnoud "Galactus" Engelfriet
writes:
>-----BEGIN PGP SIGNED MESSAGE-----
>
>In article <v0300780eae0923bca181@[205.149.180.135]>,
>Walter Ian Kaye <boo@best.com> wrote:
>> straightforward -- what I'm looking for is how to parse the contents of a
>> tag: <ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>.
>
>Well, a simple algorithm to do this: Once you have found a "<"
>character, the name of the element is everything up to the first whitespace
>character or the ">" character.

I'm afraid this isn't quite the case: a "<" does not always begin a tag.

For example:

===================
http://www.w3.org/pub/WWW/TR/WD-sgml-lex/

The following examples contain no markup. They illustrate that the <
and </ strings do not always signal markup.

< x > <324 </234>
<==> < b>
<%%%> <---> <...> <--->
===================
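
To make the point concrete, here is a minimal sketch in Python of a
delimiter-in-context check. It is not taken from the WD-sgml-lex draft;
the function name markup_starts and the rule it applies are my own
simplification: "<" counts as a start-tag open only when an ASCII
letter follows it, "</" counts as an end-tag open only when an ASCII
letter follows it, and "<!" and "<?" are always treated as markup
(the real rules for those two are stricter).

```python
import string

# Assumption: names start with an ASCII letter, as in the usual HTML syntax.
NAME_START = set(string.ascii_letters)


def markup_starts(text):
    """Return the indexes of '<' characters that open markup.

    Simplified, hypothetical rule set (not the full SGML lexer):
      '<'  + letter  -> start tag
      '</' + letter  -> end tag
      '<!' or '<?'   -> declaration, comment or processing instruction
    Any other '<' is treated as ordinary data.
    """
    hits = []
    for i, ch in enumerate(text):
        if ch != "<":
            continue
        nxt = text[i + 1:i + 2]          # empty string at end of input
        if nxt in NAME_START:
            hits.append(i)
        elif nxt == "/" and text[i + 2:i + 3] in NAME_START:
            hits.append(i)
        elif nxt in ("!", "?"):
            hits.append(i)
    return hits


if __name__ == "__main__":
    samples = [
        "< x > <324 </234>",
        "<==> < b>",
        "<%%%> <---> <...> <--->",
        '<ELEMENT attr1=abc attr2="def ghi">',
    ]
    for s in samples:
        print(repr(s), markup_starts(s))
```

Under these assumptions the three example lines above yield no markup
positions at all, while the <ELEMENT ...> tag from the original question
is recognized at position 0. Only after that check succeeds does it make
sense to read the element name up to the first whitespace or ">".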

Dan