- From: Ian Hickson <ian@hixie.ch>
- Date: Fri, 7 Jan 2011 01:10:26 +0000 (UTC)
On Tue, 2 Nov 2010, Martin Janecke wrote:
>
> In 10.1.6 Comments the current HTML spec
> http://www.whatwg.org/specs/web-apps/current-work/multipage/syntax.html#comments
> says:
>
> > Following this sequence, the comment may have text, with the additional
> > restriction that the text must not [...] contain two consecutive U+002D
> > HYPHEN-MINUS characters (--) [...]
>
> Section 5 of RFC 3490 http://tools.ietf.org/html/rfc3490#section-5
> defines the ACE-prefix in Internationalized Domain Names to be "xn--",
> i.e. always containing two consecutive hyphen-minus characters.
>
> This leads to the odd situation that correctly ASCII-compatible encoded
> IDNs cannot be used in HTML comments. For example, the wide-spread habit
> of commenting out parts of HTML code in web pages fails when the code
> contains those otherwise valid URLs. This really happens in practice
> when working with IDNs (my personal experience) and I assume this
> incompatibility will cause a growing number of pages to be invalid in
> future, as the number of used IDNs grows, which will happen for sure, as
> ICANN has approved internationalized top level domain names this year.
>
> Can the problems be prevented? E.g. by making "xn--" and "XN--" valid in
> comments?
>
> May it even be justified to make "--" valid in comments again? As I
> understand
> http://lists.whatwg.org/htdig.cgi/whatwg-whatwg.org/2006-May/006337.html
> and following replies, "--" used to be valid earlier in the spec and was
> then changed to make HTML more compatible with SGML, although HTML(5) is
> explicitly not SGML anymore. Making "--" valid won't affect any
> previously valid or invalid HTML page in any negative way, will it?

The main reason, IIRC, that we have disallowed "--" in comments in
text/html is that they are disallowed in XML, and to help authors catch
cases where they are commenting out comments.
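The conflict described in the quoted message can be reproduced in a few
lines. The following is only a rough sketch, not the spec's conformance
algorithm: it checks just the one restriction quoted above (no "--" in the
comment text) and ignores the parts elided with "[...]". The function
names and the host name "bücher.example" are made up for illustration; the
ASCII-compatible form comes from Python's built-in "idna" codec, which
implements RFC 3490.

```python
# Sketch only: models the single quoted restriction, nothing else.

ACE_PREFIX = "xn--"  # ACE prefix defined in RFC 3490, section 5


def violates_double_hyphen_rule(comment_text: str) -> bool:
    """Return True if the comment text contains two consecutive
    U+002D HYPHEN-MINUS characters, as the quoted restriction forbids."""
    return "--" in comment_text


def to_ace(hostname: str) -> str:
    """Convert a host name to its ASCII-compatible encoding using the
    standard library's RFC 3490 ("idna") codec."""
    return hostname.encode("idna").decode("ascii")


# Hypothetical IDN, chosen only for illustration ("b\u00fccher" = "bücher").
ace_host = to_ace("b\u00fccher.example")

# Markup an author might want to comment out.
commented_out = '<a href="http://' + ace_host + '/">link</a>'

if __name__ == "__main__":
    print(ace_host.startswith(ACE_PREFIX))
    print(violates_double_hyphen_rule(commented_out))
```

Any non-ASCII label converted this way starts with "xn--", so markup
containing such a URL trips the check as soon as it is placed inside a
comment, while the same markup outside a comment is unaffected.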
The question, I guess, is which of the following do we think is more
important:

 * Helping authors not write HTML markup that might be hard to convert to
   XML, and helping authors avoid nesting comments accidentally, by
   flagging "--" sequences in comments

 * Getting out of the way of authors who want to put "--" sequences in
   comments, e.g. because they use "--" as a long dash (as I do all the
   time!), or because they want to comment out punycoded URLs.

Currently the spec assumes the former is more important. Personally, I
think the latter is rather more useful, but then I use "--" as long dashes
all the time! When this was last studied, the weight of argument was on
the stricter "disallow --" side of things, presumably.

I'm open to changing this back; does anyone else have an opinion on this?

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Thursday, 6 January 2011 17:10:26 UTC