- From: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- Date: Fri, 08 Nov 96 06:39:31 CST
- To: W3C SGML Working Group <w3c-sgml-wg@w3.org>
On Fri, 8 Nov 1996 00:24:37 -0500 Gavin Nicol said:
>
>I noticed in the XML WD that there is no allowance for zenkaku
>spaces etc. in markup. This should be fixed.

Thanks; it will be. All the characters identified by Unicode as having the property of being white space will be defined as part of XML's non-terminal S, and thus will be treated as separators within markup; we just haven't got around to transcribing the relevant character codes into the non-terminal yet.

I had been thinking that all such separators except NBSP should also be subject to XML's white-space normalization rules (reduction of any span of white-space characters to a single SPACE -- i.e. U+0020 -- unless white-space preservation is turned on), but I am now wondering if that is the right behavior. I have no experience using these variant kinds of spaces. Presumably these are distinct from SPACE because they have some desirable property (such as marking 'word' boundaries not otherwise detectable by software) and I'm not sure reducing them to SPACE is the correct behavior.

I18nists, speak up now. Reduce them all or not?

In favor: it's simple behavior, it is easily understood, and all white space is treated the same.

Against: well, I don't know whether there is any argument against, that's what I am asking: *is* there an argument against? Is the distinction between SPACE, half-width space, en-space, em-space, double-width space, zenkaku space, etc., to be preserved in a way that the distinction between SPACE and TAB is not preserved?

Should some white space characters be leveled and others not? We were not planning to level NBSP, since one common use for it is to try to prevent such white-space normalization in specific cases; should other 10646 white-space characters also be exempt? Which?

-C. M. Sperberg-McQueen
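
The "reduce them all" option described above can be sketched roughly as follows. This is only an illustration in Python under stated assumptions: the set of candidate characters is a small hand-picked sample (the message does not fix the list), and the names `CANDIDATES`, `normalize_space`, and the `preserve` flag are invented for the sketch rather than taken from the XML draft. NBSP is deliberately left out of the set, as proposed in the message.

```python
import re

# Assumption: a small illustrative sample of white-space characters that
# would be "leveled". The real list was still to be transcribed from Unicode.
CANDIDATES = {
    "\t", "\n", "\r", " ",
    "\u2002",  # EN SPACE
    "\u2003",  # EM SPACE
    "\u3000",  # IDEOGRAPHIC SPACE (zenkaku space)
}
# NBSP (U+00A0) is intentionally absent, so it is never leveled.

# Matches any run of one or more candidate characters.
_RUN = re.compile("[" + "".join(re.escape(c) for c in sorted(CANDIDATES)) + "]+")


def normalize_space(text, preserve=False):
    """Reduce each run of candidate white space to a single U+0020.

    `preserve` is a hypothetical stand-in for "white-space preservation
    is turned on"; when it is true the text is returned unchanged.
    """
    if preserve:
        return text
    return _RUN.sub(" ", text)


if __name__ == "__main__":
    # Two zenkaku spaces, then TAB + SPACE, each collapse to one SPACE.
    print(repr(normalize_space("a\u3000\u3000b\t c")))  # 'a b c'
```

Under the alternative raised in the message (leveling only some characters), the same sketch would apply with a smaller `CANDIDATES` set; which characters belong in it is exactly the open question.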
Received on Friday, 8 November 1996 07:53:44 UTC