Re: SPACE...

On Fri, 8 Nov 1996 00:24:37 -0500 Gavin Nicol said:
>
>I noticed in the XML WD that there is no allowance for zenkaku
>spaces etc. in markup. This should be fixed.

Thanks; it will be.  All the characters identified by Unicode as
having the property of being white space will be defined as part of
XML's non-terminal S, and thus will be treated as separators within
markup; we just haven't got around to transcribing the relevant
character codes into the non-terminal yet.
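For illustration only, here is a minimal Python sketch of what "the characters Unicode identifies as white space" might look like as a membership test. It assumes that general category Zs (space separator) plus TAB, LF and CR is a fair stand-in for that property; the name is_separator is hypothetical, and the list eventually transcribed into S may well differ.

```python
import unicodedata

# Assumption: markup separators are the Unicode space separators
# (general category Zs) plus TAB, LF and CR. The list the WD ends up
# defining for the non-terminal S may differ from this.
CONTROL_SEPARATORS = {"\u0009", "\u000A", "\u000D"}


def is_separator(ch):
    """Return True if ch counts as a separator under the assumption above."""
    return ch in CONTROL_SEPARATORS or unicodedata.category(ch) == "Zs"
```

Under this assumption the zenkaku (ideographic) space U+3000 is a separator, and so is NBSP (U+00A0), which is also in category Zs.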

I had been thinking that all such separators except NBSP should also
be subject to XML's white-space normalization rules (reduction of any
span of white-space characters to a single SPACE -- i.e. U+0020 --
unless white-space preservation is turned on), but I am now wondering
if that is the right behavior.  I have no experience using these
variant kinds of spaces.  Presumably these are distinct from SPACE
because they have some desirable property (such as marking 'word'
boundaries not otherwise detectable by software) and I'm not sure
reducing them to SPACE is the correct behavior.
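A rough Python sketch of the normalization rule described above, to make the behavior concrete. The character list is a small illustrative subset chosen for the example, not the WD's list, and the names normalize_space and preserve are hypothetical. NBSP is deliberately left out of the leveled set.

```python
import re

# Assumption: an illustrative subset of white-space characters, namely
# TAB, LF, CR, SPACE, EN SPACE, EM SPACE and IDEOGRAPHIC (zenkaku) SPACE.
# NBSP (U+00A0) is intentionally not included.
LEVELED = "\u0009\u000A\u000D\u0020\u2002\u2003\u3000"

# One or more consecutive characters from the leveled set.
_RUN = re.compile("[" + LEVELED + "]+")


def normalize_space(text, preserve=False):
    """Reduce each run of leveled white space to a single U+0020.

    With preserve=True (white-space preservation turned on) the text
    is returned unchanged.
    """
    if preserve:
        return text
    return _RUN.sub("\u0020", text)
```

With this sketch, a run such as U+3000 U+2003 U+0020 between two words comes out as one SPACE, while a pair of NBSPs passes through untouched.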

I18nists, speak up now.  Reduce them all or not?  In favor:  it's
simple behavior, it is easily understood, and all white space is
treated the same.  Against:  well, I don't know whether there is any
argument against; that's what I am asking:  *is* there an argument
against?  Is the distinction between SPACE, half-width space,
en-space, em-space, double-width space, zenkaku space, etc., to be
preserved in a way that the distinction between SPACE and TAB is not
preserved?  Should some white space characters be leveled and others
not?  We were not planning to level NBSP, since one common use
for it is to try to prevent such white-space normalization in
specific cases; should other ISO 10646 white-space characters also be
exempt?  Which?
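The choice being asked about can be pictured as a single parameter: the set of characters exempt from leveling. The following Python sketch is only an illustration of that framing; the candidate table, the function name level and the default exemption are all assumptions made for the example, not proposals.

```python
# Hypothetical candidate table; which characters belong in it, and which
# should be exempt, is exactly the open question.
CANDIDATES = {
    "TAB": "\u0009",
    "SPACE": "\u0020",
    "NBSP": "\u00A0",
    "EN SPACE": "\u2002",
    "EM SPACE": "\u2003",
    "IDEOGRAPHIC SPACE": "\u3000",  # zenkaku space
}


def level(text, exempt=("NBSP",)):
    """Collapse runs of non-exempt candidates to one U+0020."""
    targets = {ch for name, ch in CANDIDATES.items() if name not in exempt}
    out = []
    in_run = False
    for ch in text:
        if ch in targets:
            # Emit a single SPACE for the first character of a run only.
            if not in_run:
                out.append("\u0020")
            in_run = True
        else:
            out.append(ch)
            in_run = False
    return "".join(out)
```

Calling level(text) exempts only NBSP; calling level(text, exempt=("NBSP", "IDEOGRAPHIC SPACE")) would additionally keep zenkaku spaces distinct, which is one of the possible answers to "Which?".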


-C. M. Sperberg-McQueen

Received on Friday, 8 November 1996 07:53:44 UTC