[Bug 1617] how are comments really parsed? from bugzilla@wiggum.w3.org on 2005-07-19 (public-qt-comments@w3.org from July 2005)

From: <bugzilla@wiggum.w3.org>
Date: Tue, 19 Jul 2005 21:09:42 +0000
To: public-qt-comments@w3.org
Cc:
Message-Id: <E1DuzLS-0008IL-DE@wiggum.w3.org>

http://www.w3.org/Bugs/Public/show_bug.cgi?id=1617


scott_boag@us.ibm.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED




------- Additional Comments From scott_boag@us.ibm.com  2005-07-19 21:09 -------
(In reply to comment #0)

I agree with your analysis.

Certainly the intent and specific decision of the working groups is that
comments be allowed in so-called long tokens.

I don't think the two-pass approach you suggested, if I understand it, works
very well, because you have to be aware of the context to recognize a comment.
For instance, "(:" could occur in string or element content, where it does not
start a comment. So you would have to do at least a partial parse just to
remove the comments.
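To illustrate why context matters, here is a minimal Python sketch. It is not
taken from the working group drafts, and the function names naive_strip and
context_aware_strip are hypothetical. A context-blind pre-pass deletes "(:"
... ":)" even inside a string literal, while a scanner that tracks
string-literal state leaves it alone. Note that even the context-aware version
only knows about string literals and nested comments; it would still mangle
direct element content such as <a>(: text :)</a>, which is why a
comment-stripping pass ends up needing real parse context.

```python
import re


def naive_strip(query: str) -> str:
    # Hypothetical context-blind pre-pass: deletes every "(: ... :)" span
    # (non-greedy, no nesting), without knowing where it is in the query.
    return re.sub(r"\(:.*?:\)", "", query, flags=re.DOTALL)


def context_aware_strip(query: str) -> str:
    """Strip XQuery-style (: ... :) comments, skipping string literals.

    Sketch only: handles '...' and "..." literals (a doubled quote escape
    just re-enters the same literal state) and nested comments, but NOT
    element content or any other lexical state.
    """
    out = []
    i, n = 0, len(query)
    while i < n:
        ch = query[i]
        if ch in "\"'":
            # Copy the whole string literal verbatim; "(:" in it is not a comment.
            j = query.find(ch, i + 1)
            if j == -1:
                j = n - 1  # unterminated literal: copy the rest as-is
            out.append(query[i:j + 1])
            i = j + 1
        elif query.startswith("(:", i):
            # Skip a (possibly nested) comment by tracking nesting depth.
            depth = 1
            i += 2
            while i < n and depth:
                if query.startswith("(:", i):
                    depth += 1
                    i += 2
                elif query.startswith(":)", i):
                    depth -= 1
                    i += 2
                else:
                    i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


q = 'concat("(: not a comment :)", "x") (: real comment :)'
print(repr(naive_strip(q)))          # the string literal is damaged
print(repr(context_aware_strip(q)))  # the string literal survives
```

The naive version returns 'concat("", "x") ', silently changing the string
literal, whereas the context-aware version returns
'concat("(: not a comment :)", "x") '.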

My current thinking is that we don't use the term "long token" at all, and 
specify <"aa" "bb"> to mean look-ahead, i.e. you only recognize "aa" if followed
by "bb".
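A minimal Python sketch of the look-ahead reading of <"aa" "bb">, assuming (as
the working group decision above implies) that whitespace and comments may
appear between the two parts. The helper names skip_ignorable and
match_lookahead are hypothetical, and name-boundary checks are deliberately
omitted (so "bbx" after "aa" would still count as "bb").

```python
from typing import Optional


def skip_ignorable(text: str, pos: int) -> int:
    """Skip whitespace and (possibly nested) (: ... :) comments from pos."""
    n = len(text)
    while pos < n:
        if text[pos].isspace():
            pos += 1
        elif text.startswith("(:", pos):
            # Consume a comment, tracking nesting depth.
            depth, pos = 1, pos + 2
            while pos < n and depth:
                if text.startswith("(:", pos):
                    depth, pos = depth + 1, pos + 2
                elif text.startswith(":)", pos):
                    depth, pos = depth - 1, pos + 2
                else:
                    pos += 1
        else:
            break
    return pos


def match_lookahead(text: str, pos: int, first: str, follow: str) -> Optional[int]:
    """Hypothetical <first follow>: match `first` at pos only if `follow` is next.

    Returns the end offset of `first` (the `follow` part is inspected but
    NOT consumed), or None if the look-ahead fails.
    """
    if not text.startswith(first, pos):
        return None
    end = pos + len(first)
    return end if text.startswith(follow, skip_ignorable(text, end)) else None


print(match_lookahead("aa (: c :) bb", 0, "aa", "bb"))  # 2: "aa" is recognized
print(match_lookahead("aa cc", 0, "aa", "bb"))          # None: not followed by "bb"
```

The design point is that only "aa" is consumed; "bb" is merely inspected, so it
is still lexed as its own token afterward, and a comment between the two parts
no longer breaks recognition the way it would inside a single long token.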

I plan to do a lot of work on this over the next three weeks, so I'll follow
up on this issue after that with a more concrete proposal.

-scott

Received on Tuesday, 19 July 2005 21:09:45 UTC