- From: <bugzilla@jessica.w3.org>
- Date: Wed, 29 Sep 2010 09:37:20 +0000
- To: public-html@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10800

           Summary: Reconsider form feed (U+000C) conformance
           Product: HTML WG
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: bugzilla@polizisten-duzer.de
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org

As currently drafted, HTML5 allows the form feed (U+000C) character

* as syntactic whitespace
* in content (text and attribute values)

This is really an innovation of HTML5. HTML 2.0, 3.2, 4.0 and 4.01 all
had SGML declarations that excluded the form feed (actually, all control
characters except horizontal tab, line feed and carriage return) from
the document character set
<http://www.w3.org/TR/REC-html40/sgml/sgmldecl.html>, which means that
in HTML 4.01, form feeds can only occur as character references, which
means they aren't syntactic whitespace.

HTML 4.01 also mentions the form feed character in a section that is
about "printable" whitespace
<http://www.w3.org/TR/REC-html40/struct/text.html#h-9.1>, but it's
obscure, has not been implemented consistently by any browser, and
defining the rendering is nowadays considered the job of CSS rather than
HTML.

Now HTML5 allows the form feed as syntactic whitespace. This is rather
harmless, but not particularly useful either. What is more harmful is
that HTML5 also allows form feeds in content. So:

* While HTML 4.01 allowed all control characters in content (if written
  as character references), HTML5 rules them out completely (even as
  character references) except for the form feed character (which is now
  allowed even in raw form).
  => Not consistent with anything known.

* XML 1.0 does not allow form feeds in any way.
  => Results in a class of conforming HTML5 documents that can't be
  expressed in XML 1.0 and could be avoided rather easily (more easily
  than the other such cases).
* No browser currently implements the rendering of the form feed
  character in a useful way. Internet Explorer and Opera render it as a
  collapsing space with 'white-space: normal', but as a box with
  'white-space: pre'. Gecko and WebKit always render it as a
  non-collapsing zero-width glyph; the CSS 'white-space' property makes
  no difference (and they don't regard it as "printable" whitespace at
  all; this can be seen when searching for 'word1 word2' in a document
  in which 'word1' and 'word2' are separated only by a form feed).

* CSS 2.1 does not consider the form feed character to be "printable"
  whitespace. It says "Control characters other than U+0009 (tab),
  U+000A (line feed), U+0020 (space), and U+202x (bidi formatting
  characters) are treated as characters to render in the same way as any
  normal character" <http://www.w3.org/TR/CSS21/text.html#ctrlchars>.
  (The grammar of CSS 2.1 does consider the form feed character to be
  syntactic whitespace, but this is not helpful for the rendering part.)

In order to prevent another "single quirk" story where implementors
waste more time than they already did (in the past
<https://bugzilla.mozilla.org/show_bug.cgi?id=373268> and
<https://bugzilla.mozilla.org/show_bug.cgi?id=437915> and in the future
maybe <https://bugs.webkit.org/show_bug.cgi?id=13159>) on a character
that has no agreed semantics in any markup language, and in order to
prevent authors from expecting anything useful from it, I'm kindly
asking for one of the following:

* Do what XML 1.0 does, i.e., disallow the form feed character entirely.
  (If the treatment as syntactic whitespace is required for
  compatibility with legacy content, it can become part of the error
  handling.)

* Revert to what HTML 4.01 did, i.e., allow the form feed character as a
  character reference only, so that nobody thinks it is whitespace. This
  is what XML 1.1 does, too.
  (I would not recommend this because it can't be extended to all
  control characters - certainly not the C1 controls since they need to
  be treated as Windows-1252 codepoints for compatibility - but still
  better than the raw character. And again: If necessary for
  compatibility, it can be treated as syntactic whitespace as part of
  the error handling.)

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are on the CC list for the bug.
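
For readers who want to check the character-class claims in the report, here is a minimal Python sketch of how the cited specifications classify U+000C. All function and constant names are hypothetical and chosen only for illustration. The XML ranges follow the Char and RestrictedChar productions of XML 1.0 and XML 1.1, the HTML5 set is the draft's list of "space characters", and the CSS set covers only the characters named in the CSS 2.1 sentence quoted in the report (bidi formatting characters are left out).

```python
# Sketch: how the specifications cited in the report classify a character.
# All names here are hypothetical; this is not code from any of the specs.

FORM_FEED = "\u000c"

# HTML5 "space characters": space, tab, line feed, form feed, carriage return.
HTML5_SPACE_CHARACTERS = frozenset("\u0020\u0009\u000a\u000c\u000d")

# Characters the quoted CSS 2.1 sentence exempts from "render in the same
# way as any normal character" (bidi formatting characters omitted here).
CSS21_QUOTED_EXCEPTIONS = frozenset("\u0009\u000a\u0020")


def is_xml10_char(ch: str) -> bool:
    """XML 1.0 Char production (applies to raw text and character references)."""
    cp = ord(ch)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_char(ch: str) -> bool:
    """XML 1.1 Char production (what a character reference may denote)."""
    cp = ord(ch)
    return (
        0x1 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def is_xml11_restricted(ch: str) -> bool:
    """XML 1.1 RestrictedChar production (must not appear in raw form)."""
    cp = ord(ch)
    return (
        0x1 <= cp <= 0x8
        or cp in (0xB, 0xC)
        or 0xE <= cp <= 0x1F
        or 0x7F <= cp <= 0x84
        or 0x86 <= cp <= 0x9F
    )


def classify(ch: str) -> dict:
    """Summarize how each rule set treats a single character."""
    return {
        "html5_space": ch in HTML5_SPACE_CHARACTERS,
        "css21_quoted_exception": ch in CSS21_QUOTED_EXCEPTIONS,
        "xml10_allowed": is_xml10_char(ch),
        "xml11_raw_allowed": is_xml11_char(ch) and not is_xml11_restricted(ch),
        "xml11_reference_allowed": is_xml11_char(ch),
    }


if __name__ == "__main__":
    print(classify(FORM_FEED))
```

Under these assumptions, classify(FORM_FEED) reports the form feed as an HTML5 space character that XML 1.0 rejects outright and that XML 1.1 accepts only as a character reference, which is the mismatch the report describes.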
Received on Wednesday, 29 September 2010 09:37:22 UTC