Ideographic space should not be white space!

I know we're about to release HTML4 as Proposed Recommendation, but
this point should be clarified before PR release.

As is currently being discussed on w3c-xml-sig, there is an
inconsistency between HTML and XML in the definition of white space.

In WD-html40-971024/struct/text.html#h-10.1, white space is defined as
follows:

>10.1 White space
>
>   The document character set includes a wide variety of white space
>   characters. Many of these are typographic elements used in some
>   applications to produce particular visual spacing effects. HTML
>   considers only the following characters to be white space characters:
>
>     * ASCII space (&#x0020;)
>     * ASCII tab (&#x0009;)
>     * ASCII form feed (&#x000C;)
>     * Zero-width space (&#x200B;)
>     * CJK ideographic space (&#x3000;)

I think this text has been included since WD-html40-971017, in response
to John D. Burger's comments at
<URL:http://lists.w3.org/Archives/Member/w3c-html-wg/1997OctDec/0117.html>.

But including ideographic space among the white space characters clearly
conflicts with XML's definition of white space.  The XML WG decisions of
27 August 1997, found at
<URL:http://lists.w3.org/Archives/Member/w3c-xml-sig/1997Aug/0371.html>,
clearly say:

>6.  Ideographic space is not white space.
>
>Decision (unanimous): ideographic space (#x3000) will be removed from
>the non-terminals S and PubidCharacter.
>
>Rationale:  Ideographic space corresponds more closely to the
>no-break space (#xA0, &nbsp;) than to the standard space character
>(#x20).  #xA0 is not allowed in S, and neither should ideographic
>space be.  It is unlikely, with current standard input methods for
>kanji, that any operator would unintentionally or accidentally insert an
>ideographic (#x3000) rather than a Latin (#x20) space within a tag.

I support XML's decision. If we leave this inconsistency in place, it
will cause a lot of confusion. For example, when an XML document is
converted to HTML, ideographic spaces will suddenly disappear!  I
strongly believe that HTML and XML should be consistent on this point.
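
To make the inconsistency concrete, here is a minimal Python sketch. It
is not taken from either specification: the collapse() helper and the
two constant names are hypothetical, and it assumes a processor that
replaces each run of white space with a single ASCII space. The HTML
set is the list quoted above; the XML set assumes that S, once #x3000
is removed, matches only #x20, #x9, #xD and #xA.

```python
# White space characters listed in the quoted WD-html40-971024, section 10.1.
HTML_DRAFT_WS = "\u0020\u0009\u000C\u200B\u3000"

# Assumption: characters matched by XML's S after #x3000 is removed.
XML_S = "\u0020\u0009\u000D\u000A"


def collapse(text, ws_chars):
    """Replace each run of characters from ws_chars with one ASCII space.

    Hypothetical helper for illustration only; real processors may
    treat white space differently.
    """
    out = []
    in_run = False
    for ch in text:
        if ch in ws_chars:
            if not in_run:
                out.append(" ")
            in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)


# Two kanji, an ideographic space (U+3000), then one more kanji.
sample = "\u6771\u4eac\u3000\u90fd"

html_view = collapse(sample, HTML_DRAFT_WS)  # U+3000 is treated as white space
xml_view = collapse(sample, XML_S)           # U+3000 is kept as ordinary data
```

Under these assumptions, html_view no longer contains U+3000 while
xml_view still does, so the same text comes out differently depending
on which definition of white space is applied.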

And, while I disagree with including ideographic space among the white
space characters, if you do decide to include it, that should be
reflected in the SGML declaration.  The current SGML declaration doesn't
state that ideographic space is white space.

-- 
Masayasu Ishikawa / mimasa@w3.org
W3C - World Wide Web Consortium

Received on Thursday, 6 November 1997 06:42:16 UTC