- From: Jukka K. Korpela <jkorpela@cs.tut.fi>
- Date: Thu, 13 May 2004 14:53:58 +0300 (EEST)
- To: www-html@w3.org
I just noticed an inconsistency in the HTML 4.01 specification. It's of little practical value, but I think it should still be fixed, at least by adding a note to the "Errata".

At http://www.w3.org/TR/html4/struct/text.html#whitespace ASCII form feed (&#x000C;) is defined as a white space character. But at http://www.w3.org/TR/html4/sgml/sgmldecl.html the SGML declaration says:

    DESCSET  0   9   UNUSED
             9   2   9
             11  2   UNUSED

which means that 12 (decimal), i.e. form feed, is UNUSED, i.e. prohibited. At least the W3C validator reports a form feed as "non SGML character 12", which looks OK to me: preference should be given to the formalized specification over prose text.

Of course, the form feed would be useless even if it were allowed, since it would be equivalent to a space. But the contradiction should be removed by removing the form feed from the set of white space characters.

This implies that the description of differences between HTML and XHTML could be simplified by completely removing clause C.15 from Appendix C, http://www.w3.org/TR/html/#C_15

As far as I can see, all characters permitted in HTML are permitted in XML and XHTML as well. The converse is not true, and this raises another question: XML permits C1 Controls, and HTML 4 forbids them. Since they are hardly useful, and typically result from conversion errors or an incorrectly specified encoding (e.g., serving windows-1252 encoded data as iso-8859-1), shouldn't XHTML have a separate rule that forbids them? This would not make it possible to detect the problem as an error in validation, but it would let other checkers report it objectively as an error. Or is there some imaginable use for C1 Controls in XHTML? (Note that C0 Controls except tab, CR and LF are forbidden.)

-- 
Jukka "Yucca" Korpela, http://www.cs.tut.fi/~jkorpela/
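
The reading of the DESCSET lines quoted above can be checked mechanically. The following is only a minimal sketch in Python, assuming that each entry means "first code, number of codes, mapping" and covering only the three entries quoted in the message (the full SGML declaration has more). The names DESCSET and is_unused are hypothetical and used for illustration only.

```python
# Only the three DESCSET entries quoted in the message, as
# (first code, number of codes, mapping). "UNUSED" marks prohibited codes.
DESCSET = [
    (0, 9, "UNUSED"),    # codes 0..8
    (9, 2, 9),           # codes 9..10 (tab, line feed)
    (11, 2, "UNUSED"),   # codes 11..12 (vertical tab, form feed)
]


def is_unused(code):
    """Return True if the quoted entries mark `code` as UNUSED,
    False if they map it to a character, and None if the quoted
    entries do not cover it at all."""
    for start, count, mapping in DESCSET:
        if start <= code < start + count:
            return mapping == "UNUSED"
    return None


if __name__ == "__main__":
    for code, name in [(9, "tab"), (10, "line feed"), (12, "form feed")]:
        status = "UNUSED" if is_unused(code) else "allowed"
        print(code, name, status)
```

Under these assumptions, code 12 falls in the range that starts at 11 with length 2, so it comes out as UNUSED, while 9 and 10 are allowed.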
Received on Thursday, 13 May 2004 07:54:42 UTC