- From: Randy Waki <rwaki@sunscreen.whizbang.com>
- Date: Sat, 20 Nov 1999 17:10:20 -0700
- To: "Dave Raggett" <dsr@w3.org>
- Cc: <html-tidy@w3.org>
On Sat, 20 Nov 1999, Dave Raggett wrote:
> SGML/XML says:
>
> good <!---->
> bad <!----->
> bad <!------>
> bad <!------->
> good <!-------->
>
> weird isn't it!
>
> I will adjust the parser to trim trailing hyphens to the
> nearest legal number.
I believe this would be insufficient for XML. XML's comment syntax is a
subset of SGML/HTML's. Production 15 in XML 1.0 says:
    Comment ::= '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
and the text says:
    For compatibility, the string "--" (double-hyphen) must not occur
    within comments.
This means that the characters between the opening <!-- and the closing -->
cannot contain two consecutive hyphens. They also cannot end in a hyphen:
the BNF rules this out, even though the text does not mention it.
So for XML (as opposed to SGML/HTML):
    <!---->        good (empty comment)
    <!----->       bad (trailing hyphen)
    <!------>      bad (consecutive hyphens, trailing hyphen)
    <!------->     bad (consecutive hyphens, trailing hyphen)
    <!-------->    bad (consecutive hyphens, trailing hyphen)
    <!--- -->      good
    <!-- - - -->   good
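
The classifications above can be checked mechanically. What follows is only
a rough sketch in Python, not anything taken from Tidy: it transcribes
production 15 into a regular expression, with Char approximated as "any
character" (an assumption; the real Char production excludes some control
characters). The names XML_COMMENT and is_xml_comment are made up for
illustration.

```python
import re

# Production 15: '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
# Assumption: Char is approximated by "any character".
XML_COMMENT = re.compile(r"<!--(?:[^-]|-[^-])*-->")


def is_xml_comment(text):
    """Return True if text is exactly one comment allowed by production 15."""
    return XML_COMMENT.fullmatch(text) is not None


if __name__ == "__main__":
    samples = ["<!---->", "<!----->", "<!------>", "<!------->",
               "<!-------->", "<!--- -->", "<!-- - - -->"]
    for sample in samples:
        print(sample, "good" if is_xml_comment(sample) else "bad")
```

Because each repetition consumes either a non-hyphen or a hyphen followed by
a non-hyphen, the body can neither contain "--" nor end in "-", which is
exactly the pair of restrictions described above.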
For XML, Tidy could fix consecutive hyphens by examining the characters
between the <!-- and the --> and, within each run of two or more hyphens,
replacing the first, third, etc. hyphen with a space, and then replacing any
trailing hyphen with a space. This should preserve much of the visual effect
intended by people who use consecutive hyphens as dividers.
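
A minimal sketch of that repair, again in Python rather than Tidy's own
code. It reads "the first, third, etc. hyphen" as applying within each run
of two or more consecutive hyphens, which is an assumption about the
intended rule. The function name fix_comment_body is hypothetical, and it
takes only the text between the <!-- and the -->.

```python
import re


def _space_odd_hyphens(match):
    """Replace the 1st, 3rd, 5th ... hyphen of a run with a space."""
    run = match.group(0)
    return "".join(" " if i % 2 == 0 else "-" for i in range(len(run)))


def fix_comment_body(body):
    """Make the text between '<!--' and '-->' legal for an XML comment."""
    # Assumption: only runs of two or more hyphens are rewritten,
    # so isolated hyphens inside the comment are left alone.
    body = re.sub(r"-{2,}", _space_odd_hyphens, body)
    # A hyphen left at the very end would run into the closing '-->'.
    if body.endswith("-"):
        body = body[:-1] + " "
    return body


if __name__ == "__main__":
    for body in ["-", "--", "---", "----", " ---- divider ---- "]:
        print(repr(body), "->", repr(fix_comment_body(body)))
```

The length of the comment is unchanged, so a divider line keeps its width
and only loses every other hyphen.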
If you wanted to avoid a special case for XML, perhaps Tidy could make all
comments conform to XML's stricter syntax. (The extra latitude allowed by
SGML/HTML is small enough and obscure enough that I wonder if anyone would
miss it.)
Randy
Received on Saturday, 20 November 1999 19:12:15 UTC