- From: <bugzilla@jessica.w3.org>
- Date: Fri, 01 Oct 2010 03:30:25 +0000
- To: public-html-bugzilla@w3.org
http://www.w3.org/Bugs/Public/show_bug.cgi?id=10800

--- Comment #4 from bugzilla@polizisten-duzer.de 2010-10-01 03:30:23 UTC ---

I think that the form feed character made its way into HTML by mistake rather than by intention.

HTML 2.0 explicitly said

"In SGML applications, the use of control characters is limited in order to maximize the chance of successful interchange over heterogeneous networks and operating systems. In the HTML document character set only three control characters are allowed: Horizontal Tab, Carriage Return, and Line Feed (code positions 9, 13, and 10)."

In July 1997, a draft of HTML 4 <http://www.w3.org/TR/WD-html40-970708/struct/text.html> (the earliest that mentioned form feeds in any way) said:

"In addition, for all elements except PRE, a sequence of contiguous white space characters such as spaces, horizontal tabs, form feeds and line breaks, should be replaced by a single word space. Since the notion of what word space is varies from script (written language) to script, user agents should collapse white space in script-sensitive ways. For example, in Latin scripts, a single word space is just a space (ASCII decimal 32), while in Thai it is a zero-width word separator."

Note how bogus this is. It mentions form feeds in a "such as" phrase (not quite appropriate wording for a normative section) without adjusting the SGML declaration accordingly. It also mentions the zero-width word separator, which has a totally different context. It sounds more like a brainstorming about whitespace than like a specification. But *if* taken normatively, the IE/Opera rendering (where form feeds collapse with 'white-space: normal') is closer.
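To make the two readings concrete, here is a minimal sketch of white space collapsing with and without the form feed in the set. It is only an illustration: the two character sets and the function name are assumptions for this example, not text from any specification or the code of any browser.

```python
# Hypothetical whitespace sets, chosen only for illustration.
# "Draft" reading: spaces, tabs, form feeds and line breaks all collapse.
DRAFT_WHITESPACE = " \t\f\r\n"
# Alternative reading: the form feed is not white space at all.
WITHOUT_FORM_FEED = " \t\r\n"


def collapse(text, whitespace):
    """Replace every run of characters from `whitespace` with one space."""
    out = []
    in_run = False
    for ch in text:
        if ch in whitespace:
            # Emit a single space for the whole run, however long it is.
            if not in_run:
                out.append(" ")
                in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)


sample = "a \f b"
print(repr(collapse(sample, DRAFT_WHITESPACE)))   # form feed collapses away
print(repr(collapse(sample, WITHOUT_FORM_FEED)))  # form feed survives as a character
```

Under the first set the form feed vanishes into a single space, which is roughly what the collapsing behaviour described above amounts to; under the second it stays in the text as a control character that something downstream has to deal with.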
The next draft from November 1997 <http://www.w3.org/TR/PR-html40-971107/struct/text.html#h-9.1> says:

"HTML considers only the following characters to be white space characters:
* ASCII space (&#x0020;)
* ASCII tab (&#x0009;)
* ASCII form feed (&#x000C;)
* Zero-width space (	)"

Note how it has managed to mix the form feed and the zero-width space, which were previously mentioned in totally different contexts, into one category and even get the code point of the zero-width space wrong. The code point was corrected shortly after, but the whole section has remained basically unchanged and obscure.

The issue has been brought up more than once

* http://lists.w3.org/Archives/Public/www-html-editor/1998JulSep/0131.html
* http://lists.w3.org/Archives/Public/www-html/2004May/0022.html
* http://bytes.com/topic/html-css/answers/169504-theory-question-u-000c-html-4-01-a

but was never resolved in 13 years. On the contrary, it was propagated into other specifications. For some time, even XHTML 1 treated the form feed as whitespace <http://www.w3.org/TR/1999/PR-xhtml1-19991210/#uaconf> (fixed three years later).

Therefore, I'd like to be 100% sure that the form feed isn't allowed in HTML5 just because of a 13-year-old mistake.

Besides, HTML5's treatment doesn't look consistent in itself. HTML5 rules out a character reference to the carriage return, presumably because that would give an actual carriage return in the DOM and CSS isn't prepared to handle that (CSS regards carriage returns as random control characters, not whitespace), and that is reasonable. But then, CSS isn't prepared to handle form feeds either. Is the ability to paste RFC text into HTML and still be conforming really a use case that justifies this?

CSS added the form feed around the same time, btw. (the last version without form feeds was <http://www.w3.org/TR/WD-CSS2-971104/grammar.html>, the first version with form feeds is <http://www.w3.org/TR/1998/WD-css2-19980128/grammar.html>), but that's rather harmless because a form feed in CSS doesn't get into the DOM.
Class and [attr~=val] selectors constitute an intersection, however. (For these, it would IMHO make more sense if CSS followed the whitespace definition of the document language instead of its own, but it's not too important as long as the only character where it would make a difference is non-conforming.)

One more bizarre thing: As said above, IE collapses form feeds with 'white-space: normal' (matching the original HTML 4 draft), but renders them as boxes with 'white-space: pre' - unless they are preceded or followed by a vertical tab. A vertical tab followed by a form feed gets rendered as '♂♀', and a form feed followed by a vertical tab gets rendered as '♀♂'. '♂' and '♀' have code positions 11 and 12 in some DOS code pages. IE must be really desperate to make something printable out of them.

-- 
Configure bugmail: http://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Friday, 1 October 2010 03:30:27 UTC