- From: Jim Taylor <JHTaylor@videodiscovery.com>
- Date: Wed, 10 Jul 1996 12:44:06 -0800
- To: www-html@w3.org
>>> Arnoud "Galactus" Engelfriet <galactus@stack.urc.tue.nl> 07/10/96 10:41am >>>
>In article <v0300780eae0923bca181@[205.149.180.135]>,
>Walter Ian Kaye <boo@best.com> wrote:
> straightforward -- what I'm looking for is how to parse the contents of a
> tag: <ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>.
>Well, a simple algorithm to do this: Once you have found a "<"
>character, the name of the element is everything up to the first whitespace
>character or the ">" character. If you hit whitespace, you've got attributes coming.

Close but no cigar. An element name must begin with a letter, followed by letters, digits, periods, or hyphens. Just looking for whitespace is not enough. In other words, if I have text that reads "3<4 but 4>2" the parser should pass it through unmodified, because "4" is not a valid element name.

Also, information inside the <> is "parsed character data," meaning all character references ("&quot;", "&iacute;", etc.) should be decoded. For example, a tag such as <ELEMENT attr1=&#97;bc> is equivalent to <ELEMENT attr1=abc>.

There are other things to watch out for. It's not as "straightforward" as you might hope, and probably no browser other than arena/amaya does it all right.

______________________________________________
Jim "The Frog" Taylor, Director of Information Technology
<mailto:jhtaylor@videodiscovery.com>
Videodiscovery, Inc. - Multimedia Education for Science and Math
Seattle, WA, 206-285-5400
<http://www.videodiscovery.com/vdyweb>
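
The two rules above (a "<" only opens a tag when a valid element name follows, and character references inside the tag are decoded) can be sketched roughly as follows. This is a minimal illustration in Python, not a conforming SGML parser: the function name parse_start_tag and the regular expressions are hypothetical, html.unescape stands in for proper character-reference decoding, and a ">" inside a quoted attribute value is not handled.

```python
import re
from html import unescape

# Name rule from the message: a letter, then letters, digits, periods, or hyphens.
NAME = r"[A-Za-z][A-Za-z0-9.-]*"

# A start tag: "<", a valid name, an optional attribute area, then ">".
# Simplification (assumption): the attribute area may not contain "<" or ">".
TAG_RE = re.compile(r"<(" + NAME + r")((?:\s+[^<>]*?)?)\s*>")

# One attribute: a name, optionally followed by "=" and a double-quoted,
# single-quoted, or unquoted value.
ATTR_RE = re.compile(
    r"(" + NAME + r")(?:\s*=\s*(?:\"([^\"]*)\"|'([^']*)'|([^\s\"'>]+)))?"
)


def parse_start_tag(text, pos=0):
    """Return (element_name, attrs) if a start tag begins at text[pos], else None."""
    m = TAG_RE.match(text, pos)
    if m is None:
        # "<" was not followed by a valid element name: treat it as plain text.
        return None
    attrs = {}
    for a in ATTR_RE.finditer(m.group(2)):
        # Pick whichever of the three value forms matched (None if no value).
        value = next((g for g in a.groups()[1:] if g is not None), None)
        # Decode character references such as &quot; or &#97; in the value.
        attrs[a.group(1)] = unescape(value) if value is not None else None
    return m.group(1), attrs


# The "<" in "3<4 but 4>2" is followed by a digit, so no tag is recognized.
print(parse_start_tag("3<4 but 4>2", 1))

# The tag from the original question.
print(parse_start_tag('<ELEMENT attr1=abc attr2="def ghi" attr3="jkl" attr4=mno>'))

# A character reference in an attribute value decodes to the plain character.
print(parse_start_tag("<ELEMENT attr1=&#97;bc>") == parse_start_tag("<ELEMENT attr1=abc>"))
```

A caller scanning a document would try parse_start_tag at each "<" and copy the character through unchanged whenever it returns None, which is what lets "3<4 but 4>2" pass through unmodified.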
Received on Wednesday, 10 July 1996 15:40:31 UTC