- From: Masayasu Ishikawa <mimasa@w3.org>
- Date: Thu, 06 Nov 1997 20:41:51 +0900
- To: www-html-editor@w3.org
- Cc: connolly@w3.org
I know we're about to release HTML4 as a Proposed Recommendation, but this point should be clarified before the PR release. As discussions in progress at w3c-xml-sig have shown, there is an inconsistency between HTML and XML in the definition of white space.

In WD-html40-971024/struct/text.html#h-10.1, white space is defined as follows:

>10.1 White space
>
> The document character set includes a wide variety of white space
> characters. Many of these are typographic elements used in some
> applications to produce particular visual spacing effects. HTML
> considers only the following characters to be white space characters:
>
>    * ASCII space (&#x0020;)
>    * ASCII tab (&#x0009;)
>    * ASCII form feed (&#x000C;)
>    * Zero-width space (&#x200B;)
>    * CJK ideographic space (&#x3000;)

I think this text has been included since WD-html40-971017, in response to John D. Burger's comments at <URL:http://lists.w3.org/Archives/Member/w3c-html-wg/1997OctDec/0117.html>.

But including the ideographic space among the white space characters clearly conflicts with XML's definition of white space. The XML WG decisions of 27 August 1997, found at <URL:http://lists.w3.org/Archives/Member/w3c-xml-sig/1997Aug/0371.html>, clearly say:

>6. Ideographic space is not white space.
>
>Decision (unanimous): ideographic space (#x3000) will be removed from
>the non-terminals S and PubidCharacter.
>
>Rationale: Ideographic space corresponds more closely to the
>no-break space (#xA0, &nbsp;) than to the standard space character
>(#x20). #xA0 is not allowed in S, and neither should ideographic
>space be. It is unlikely, with current standard input methods for
>kanji, that any operator would unintentionally or accidentally insert an
>ideographic (#x3000) rather than a Latin (#x20) space within a tag.

I support XML's decision. If we leave this inconsistency in place, it will cause a lot of confusion: for example, when converting an XML document to HTML, ideographic spaces will suddenly disappear! I strongly believe that HTML and XML should be consistent on this point.
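
To make the conflict concrete, here is a minimal sketch in Python. It assumes that, after the decision quoted above, the XML non-terminal S contains only #x20, #x9, #xD and #xA; the helper name collapse() and the sample string are purely illustrative and are not taken from either draft.

```python
# White space characters listed in the quoted HTML 4.0 draft, section 10.1.
HTML40_DRAFT_WS = frozenset({"\u0020", "\u0009", "\u000C", "\u200B", "\u3000"})

# Assumed contents of the XML non-terminal S once #x3000 has been removed.
XML_S = frozenset({"\u0020", "\u0009", "\u000D", "\u000A"})

# Characters that only the HTML draft treats as white space.
HTML_ONLY = HTML40_DRAFT_WS - XML_S


def collapse(text, white_space):
    """Hypothetical helper: collapse each run of white space to one ASCII
    space and drop leading and trailing runs."""
    out = []
    in_run = False
    for ch in text:
        if ch in white_space:
            in_run = True
            continue
        if in_run and out:
            out.append(" ")
        in_run = False
        out.append(ch)
    return "".join(out)


sample = "A\u3000B"  # two letters separated by an ideographic space

# Under the HTML draft's list, the ideographic space is ordinary white space
# and does not survive as U+3000.
html_view = collapse(sample, HTML40_DRAFT_WS)

# Under the XML definition, it is data and is left untouched.
xml_view = collapse(sample, XML_S)
```

With these sets, the same text keeps its ideographic space when read under the XML rules and loses it when read under the HTML draft's rules, which is the kind of surprise a converter from XML to HTML would run into.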

And, while I disagree with including the ideographic space among the white space characters, if you decide to include it, that should be reflected in the SGML declaration. The current SGML declaration doesn't state that ideographic space is white space.

-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Thursday, 6 November 1997 06:42:16 UTC