[Bug 1617] how are comments really parsed?

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1617


scott_boag@us.ibm.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED




------- Additional Comments From scott_boag@us.ibm.com  2005-07-19 21:09 -------
(In reply to comment #0)

I agree with your analysis.

Certainly the intent and specific decision of the working groups is that
comments be allowed in so-called long tokens.

I don't think the two-pass approach you suggested (if I understand it) works
very well, because you have to be aware of the context to recognize a comment.
For instance, comment-like text could occur in string or element content, where
it is not a comment at all. So you would have to do at least a partial parse
just to remove the comments.
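To make the context problem concrete, here is a minimal Python sketch, assuming
XQuery-style "(: ... :)" comments that may nest and string literals delimited
by " or '. The example query and both function names (naive_strip,
string_aware_strip) are hypothetical. A context-free regex pass deletes
comment-like text inside a string literal and also mishandles nesting. Tracking
string state fixes those two cases, but the sketch still knows nothing about
direct element content, so a real comment-stripping pass would keep growing
toward a partial parse.

```python
import re

# Hypothetical example query; assumes XQuery-style "(: ... :)" comments
# and string literals delimited by '"' or "'".
QUERY = 'let (: note :) $x := "a (: not a comment :) b" return $x'


def naive_strip(text: str) -> str:
    """Context-free first pass: deletes every (: ... :) span it finds."""
    return re.sub(r"\(:.*?:\)", "", text)


def string_aware_strip(text: str) -> str:
    """Deletes comments only outside string literals, allowing nesting.

    Still not enough for real queries: it does not know about direct
    element content, where "(:" is ordinary text.
    """
    out = []
    i = 0
    depth = 0     # current comment nesting depth
    quote = None  # active string delimiter, or None outside strings
    while i < len(text):
        two = text[i:i + 2]
        if quote is None and two == "(:":
            depth += 1          # open (or nest) a comment
            i += 2
        elif depth > 0 and two == ":)":
            depth -= 1          # close the innermost comment
            i += 2
        elif depth > 0:
            i += 1              # skip characters inside a comment
        else:
            ch = text[i]
            if quote is None and ch in "\"'":
                quote = ch      # entering a string literal
            elif ch == quote:
                quote = None    # leaving the string literal
            out.append(ch)
            i += 1
    return "".join(out)
```

The naive pass turns the string literal "a (: not a comment :) b" into "a  b",
and on nested input such as '1 (: a (: b :) c :) + 2' it stops at the first
":)", leaving ' c :)' behind; the string-aware version keeps the literal intact
and removes the whole nested comment.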

My current thinking is that we drop the term "long token" altogether and
specify <"aa" "bb"> to mean look-ahead, i.e. you only recognize "aa" if it is
followed by "bb".
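One way to read the proposed <"aa" "bb"> notation is sketched below in Python,
again assuming nestable "(: ... :)" comments. skip_ignorable and
match_lookahead are hypothetical helpers, and the "declare"/"function" pair is
only an illustrative example. The tokenizer recognizes `first` only when
`second` comes next after skipping whitespace and comments, but it does not
consume `second`: it returns the position just after `first`, so a comment
between the two never has to be part of a single token.

```python
import re

# Assumption: whitespace is any run of \s characters.
WHITESPACE = re.compile(r"\s+")


def skip_ignorable(text: str, pos: int) -> int:
    """Return the first position at or after pos that is not whitespace
    or part of a (possibly nested) (: ... :) comment."""
    while True:
        m = WHITESPACE.match(text, pos)
        if m:
            pos = m.end()
            continue
        if not text.startswith("(:", pos):
            return pos
        # Consume one whole comment, tracking nesting depth.
        depth = 0
        while pos < len(text):
            if text.startswith("(:", pos):
                depth += 1
                pos += 2
            elif text.startswith(":)", pos):
                depth -= 1
                pos += 2
                if depth == 0:
                    break
            else:
                pos += 1
        else:
            # Reached end of input while still inside a comment.
            raise SyntaxError("unterminated comment")


def match_lookahead(text: str, pos: int, first: str, second: str):
    """Recognize `first` at pos only if `second` follows it.

    Whitespace and comments may separate the two. Returns the position
    just after `first` (so `second` is NOT consumed), or None.
    No word-boundary check is done on either string.
    """
    if not text.startswith(first, pos):
        return None
    end = pos + len(first)
    if text.startswith(second, skip_ignorable(text, end)):
        return end
    return None
```

Because `second` is only peeked at, the tokenizer later emits it as an ordinary
token, and the comment in between is skipped the same way as anywhere else. A
real tokenizer would also need to make sure `first` is not just a prefix of a
longer name.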

I plan to do a lot of work on this over the next three weeks, so I'll follow
up on this issue after that, with a more concrete proposal.

-scott

Received on Tuesday, 19 July 2005 21:09:45 UTC