
RE: iso-2022 HTML -> XML

From: Randy Waki <rwaki144@sun10.whizbanglabs.com>
Date: Thu, 2 Sep 1999 10:55:43 -0600
To: <html-tidy@w3.org>
Message-ID: <000001bef563$fe5a97f0$ce9946a6@whizbanglabs.com>
Tomohisa Yazaki wrote:
>
> For the first problem, I added cleanComment() method to
> Clean.java. This method accepts a Node object and removes all comment
> nodes which include "--". To make this patch work, I fixed a bug in which
> doctype.next.prev is not set in setXHTMLDocType() and fixDocType() in
> Lexer.java. Removing the whole comment is a rough approach; replacing "--"
> with something else may be better.
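
A minimal sketch of the removal approach quoted above, assuming a hypothetical stripped-down Node class with only type, text, prev and next (this is not Tidy's real Node, and not the actual cleanComment() patch). It shows why the prev link matters: unlinking a node patches its predecessor through node.prev, so a missing prev (as with doctype.next.prev) leaves the predecessor pointing at the removed comment.

```java
// Hypothetical sketch only: Tidy's real Node class has many more fields
// (parent, content, attributes, ...) and its own removal helpers.
public class CommentRemoval {

    static final int COMMENT = 1, TEXT = 2;

    static class Node {
        int type;
        String text;
        Node prev, next; // sibling links
        Node(int type, String text) { this.type = type; this.text = text; }
    }

    /** Links the nodes into a sibling list, setting BOTH next and prev. */
    static Node link(Node... nodes) {
        for (int i = 0; i < nodes.length; i++) {
            nodes[i].prev = i > 0 ? nodes[i - 1] : null;
            nodes[i].next = i + 1 < nodes.length ? nodes[i + 1] : null;
        }
        return nodes.length > 0 ? nodes[0] : null;
    }

    /** Unlinks every comment whose text contains "--"; returns the new first sibling. */
    static Node removeBadComments(Node first) {
        Node node = first;
        while (node != null) {
            Node next = node.next;
            if (node.type == COMMENT && node.text.contains("--")) {
                // Relies on node.prev being correct: if it were wrongly null,
                // the real predecessor would keep pointing at this node.
                if (node.prev != null) node.prev.next = next; else first = next;
                if (next != null) next.prev = node.prev;
            }
            node = next;
        }
        return first;
    }

    public static void main(String[] args) {
        Node first = link(new Node(TEXT, "a"), new Node(COMMENT, " x--y "),
                          new Node(TEXT, "b"));
        first = removeBadComments(first);
        for (Node n = first; n != null; n = n.next) System.out.print(n.text);
        System.out.println(); // prints "ab"
    }
}
```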

How about inserting a space between consecutive hyphens?  In other words,
change all occurrences of "--" to "- -".  This is how XSLT fixes up illegal
comments [1].
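
A rough sketch of that fix-up on the comment text, using a hypothetical helper name (separateHyphens is not an existing Tidy method). It works character by character rather than with a single String.replace("--", "- -"), because one replace pass turns "---" into "- --", which still contains "--".

```java
// Hypothetical helper, not part of Tidy: inserts a space between
// consecutive hyphens so the comment text never contains "--".
public class CommentFixup {

    static String separateHyphens(String text) {
        StringBuilder out = new StringBuilder(text.length() + 8);
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            out.append(c);
            // A '-' immediately followed by another '-' gets a space after it,
            // so a run like "---" becomes "- - -" in one pass.
            if (c == '-' && i + 1 < text.length() && text.charAt(i + 1) == '-') {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(separateHyphens("a--b"));      // a- -b
        System.out.println(separateHyphens("x---y"));     // x- - -y
        // A single naive replace leaves "--" behind in a run of three:
        System.out.println("x---y".replace("--", "- -")); // x- --y
    }
}
```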

Also, it's illegal for a comment to end with "-" in XML (it's buried in the
BNF [2]).  This may be illegal in HTML, too.  It depends on how you
interpret the HTML 4.0 spec [3].  Dave probably knows.  For example,

   <!-- Illegal in XML, maybe in HTML, too --->

Note the three hyphens at the end.
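
A small sketch of the XML rule, with hypothetical method names (not a Tidy or JDK API). The XML 1.0 production is Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->', which for the text between "<!--" and "-->" comes down to: no "--" anywhere and no trailing "-". The sketch ignores the separate restriction to legal XML Char values. Padding a trailing hyphen with a space is the same kind of recovery the XSLT 1.0 Recommendation describes for xsl:comment; note it only handles the end of the text, not "--" in the middle.

```java
// Hypothetical checker for the XML 1.0 comment production (Char range not checked).
public class XmlCommentCheck {

    /** True if text may appear between "<!--" and "-->" in XML. */
    static boolean isLegalXmlCommentText(String text) {
        return !text.contains("--") && !text.endsWith("-");
    }

    /** Appends a space after a trailing '-'; leaves other text unchanged. */
    static String padTrailingHyphen(String text) {
        return text.endsWith("-") ? text + " " : text;
    }

    public static void main(String[] args) {
        // Text between "<!--" and "-->" in: <!-- Illegal in XML, maybe in HTML, too --->
        String body = " Illegal in XML, maybe in HTML, too -";
        System.out.println(isLegalXmlCommentText(body));                    // false
        System.out.println(isLegalXmlCommentText(padTrailingHyphen(body))); // true
    }
}
```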

[1] http://www.w3.org/TR/WD-xslt#section-Creating-Comments
[2] http://www.w3.org/TR/1998/REC-xml-19980210.html#sec-comments
[3] http://www.w3.org/TR/REC-html40/intro/sgmltut.html#h-3.2.4

Randy
Received on Thursday, 2 September 1999 12:57:10 GMT
