- From: <S.N.Brodie@ecs.soton.ac.uk>
- Date: Thu, 19 Sep 1996 12:06:10 +0100 (BST)
- To: galactus@htmlhelp.com (Arnoud "Galactus" Engelfriet)
- Cc: www-talk@w3.org
Arnoud "Galactus" Engelfriet wrote:
>
> In article <828.9609180920@strachey.ecs.soton.ac.uk>,
> S.N.Brodie@ecs.soton.ac.uk wrote:
> > My impression is that this is not a correctly terminated comment, since
> > it does not fit the strict definition given in RFC1866. However,
> > that's irrelevant, as all 3 browsers I've tried it on accept it.
>
> I suppose these browsers simply consider "<!--" the comment starting
> tag and "-->" the corresponding closing tag. My favourite way to
> demonstrate that is the following *legal* comment:
>
> <!-- -- --> -->
>
> That's two comments, one of which only contains " " and one contains
> "> ".

Agreed.

> Anyway, as far as I can see RFC 1866 does not discuss the "-" character
> as last character before "--" explicitly. It only states (section 3.2.5)
>
>     Each comment starts with `--' and includes
>     all text up to and including the next occurrence of `--'.
>
> I'm just confused if the sequence "---" _should_ be seen as "-"
> followed by "--" or as "--" followed by "-".

That is the problem. My (very recently modified :-) parser accepts the
following and displays "Body text." as it should:

    Body <!-- comment -- -- > shouldn't see this! --> text.

Netscape gets itself into all kinds of a mess with this. It seems to be
applying the rule that "-->" terminates a comment and, if we don't find
one, go back to the first occurrence of a ">". For example:

    One <!-- hi -- -- > lo --> Two

is displayed as "One Two". However:

    One <!-- hi -- -- > lo -- > Two

is displayed as "One lo -- > Two"

Anybody got any suggestions how --- should be parsed whilst parsing a
comment structure? My inclination is to treat it differently depending
on whether you are inside a comment or not, but it becomes a special
case that way, since behaviour will have to change depending on the
number of consecutive - characters. Obviously you have to keep track of
whether you are "inside" a -- or not.

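The Netscape behaviour guessed at above (take "-->" as the terminator and, failing that, fall back to the first ">") can be sketched roughly as follows. This is only an illustration of the heuristic as described in this message, not Netscape's actual code; the function name `strip_comments_lenient` is hypothetical, and dropping the rest of the input when no ">" is found at all is an assumption.

```python
def strip_comments_lenient(text: str) -> str:
    """Sketch of the lenient heuristic: "-->" ends a comment; if there is
    none, fall back to the first ">" after the opening "<!--"."""
    out = []
    i = 0
    while True:
        start = text.find("<!--", i)
        if start == -1:
            out.append(text[i:])          # no more comments: keep the rest
            break
        out.append(text[i:start])         # keep text before the comment
        end = text.find("-->", start + 4)
        if end != -1:
            i = end + 3                   # normal case: skip past "-->"
            continue
        gt = text.find(">", start + 4)    # fallback: first ">" after "<!--"
        if gt == -1:
            break                         # assumption: drop unterminated rest
        i = gt + 1
    return "".join(out)


print(strip_comments_lenient("One <!-- hi -- -- > lo --> Two"))
print(strip_comments_lenient("One <!-- hi -- -- > lo -- > Two"))
```

Once runs of white space are collapsed for display, the two results correspond to the "One Two" and "One lo -- > Two" renderings reported above.
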
Having seen "<!--", the parser goes into comment parsing mode and sets a
flag "in_double_dash" to true. It then continues until it sees 2 or more
consecutive - characters, discards all of these characters, and sets
"in_double_dash" to false. A '>' is only accepted as the terminator if
in_double_dash is false. Upon seeing a -- whilst in_double_dash is
false, it sets in_double_dash back to true.

This gives well-defined behaviour for 2+ consecutive - symbols, but is
it the desired behaviour?

One of the places this is most likely to crop up (IMHO) is when reading
inlined scripts. I have specifically added recognition for <script> and
</script> to the parser so it can discard that stuff automatically
without relying on the script being commented out. I discard it as I
have no intention of porting Visual Basic or writing a Javascript
interpreter. I just don't have the time.

-- 
Stewart Brodie, Electronics & Computer Science, Southampton University.
http://www.ecs.soton.ac.uk/~snb94r/ http://delenn.ecs.soton.ac.uk/
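
The in_double_dash scheme described in the message can be sketched as a small state machine. This is a minimal illustration, not the author's parser: the function name `strip_comments` is hypothetical, a single "-" is assumed to be ordinary comment text, and an unterminated comment is assumed to swallow the rest of the input.

```python
def strip_comments(text: str) -> str:
    """Remove comment declarations using a single in_double_dash flag."""
    out = []
    i, n = 0, len(text)
    while i < n:
        if not text.startswith("<!--", i):
            out.append(text[i])           # ordinary text is kept
            i += 1
            continue
        i += 4                            # skip "<!--"
        in_double_dash = True             # the opening "--" starts a comment
        while i < n:                      # assumption: unterminated => discard
            ch = text[i]
            if ch == "-":
                run = i
                while run < n and text[run] == "-":
                    run += 1
                # A run of 2+ dashes counts as one delimiter and flips the
                # flag; a lone "-" is assumed to be plain comment text.
                if run - i >= 2:
                    in_double_dash = not in_double_dash
                i = run
            elif ch == ">" and not in_double_dash:
                i += 1                    # ">" outside a "--" pair terminates
                break
            else:
                i += 1                    # anything else is discarded
    return "".join(out)


print(strip_comments("Body <!-- comment -- -- > shouldn't see this! --> text."))
```

Treating every run of two or more dashes as one delimiter sidesteps the question of whether "---" is "-" plus "--" or "--" plus "-", at the cost of making the result depend on the run length being at least two.
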
Received on Thursday, 19 September 1996 07:07:03 UTC