Michael Sperberg-McQueen wrote:
> I18nists, speak up now. Reduce them all or not? In favor: it's
> simple behavior, it is easily understood, and all white space is
> treated the same. Against: well, I don't know whether there is any
> argument against, that's what I am asking: *is* there an argument
> against? Is the distinction between SPACE, half-width space,
> en-space, em-space, double-width space, zenkaku space, etc., to be
> preserved in a way that the distinction between SPACE and TAB is not
In markup, I think that if something looks like whitespace,
it should act like whitespace and be treated as
a SEPCHAR. However, I don't think using mad typographical spaces
is good markup practice that should
be encouraged, so I would be equally happy if just SP and full-width
space are treated as SEPCHARs, with a recommendation that the other kinds of
spaces be allowed as SEPCHARs in the guise of error-recovery only.
What about NBSP? It is tough, but I don't really think it should be
exempt: it looks like a space, and it should act like one. This means
that languages that use NBSP in words will have to use something else,
but there are plenty of things like '-.\'~' for them to use, just
like we have to now when we want to construct an identifier
from several words: to make it not a SEPCHAR could make markup
In data, I think if someone uses a space character
other than the standard ISO 10646 whitespace, they should be free
to expect it to be passed to the application as is, otherwise why
would they be using it?
So I think in ESIS data:
(TAB SP LF CR)+ => SP
The other kinds of spaces, including the full-width (zenkaku) space,
should be left up to the application to deal with. Actually, I am
not even really keen on reducing TABs; I would be happy with just
(SP LF CR)+ => SP
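As a rough illustration of the two rules above, here is a minimal Python sketch. The function name reduce_ws and the keep_tabs flag are made up for this example; nothing here is proposed syntax. It uses explicit character classes rather than \s, because Python's \s on str patterns also matches NBSP and the full-width space, which is exactly what the rule is meant to avoid.

```python
import re

# Rule as written above: (TAB SP LF CR)+ => SP
WITH_TAB = re.compile(r"[\t \n\r]+")
# Stricter variant above: (SP LF CR)+ => SP, leaving TAB alone
WITHOUT_TAB = re.compile(r"[ \n\r]+")


def reduce_ws(data: str, keep_tabs: bool = False) -> str:
    """Collapse each run of the listed characters to a single SP.

    Hypothetical helper for illustration only. Any other kind of
    space (NBSP, zenkaku space, en-space, ...) is passed through
    untouched for the application to deal with.
    """
    pattern = WITHOUT_TAB if keep_tabs else WITH_TAB
    return pattern.sub(" ", data)


if __name__ == "__main__":
    print(repr(reduce_ws("a\t \r\nb")))               # run of TAB SP CR LF
    print(repr(reduce_ws("a\u3000b")))                # full-width space kept
    print(repr(reduce_ws("a\t\nb", keep_tabs=True)))  # TAB kept, LF reduced
```

With keep_tabs=True a TAB is treated as data like any other character, so it also splits a run of SP/LF/CR into two runs.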
The reduction should be done as a notional post-parse stage, so that
'X' TAB LF '<!--zzz--!>' SP SP LF LF 'X'
becomes
'X' SP 'X'.
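To show why the stage matters, here is a small Python sketch, assuming a toy regex comment-stripper in place of a real parser (strip_comments, post_parse and pre_parse are names invented for this example). Reducing after the comment has been removed gives one SP; reducing the raw text first leaves a SP on each side of the comment, so two survive.

```python
import re

# Toy stand-in for the parser: removes comments written as in the
# example above. The optional "!" only mirrors the way the comment
# is spelled in the example; this is not a real SGML comment parser.
COMMENT = re.compile(r"<!--.*?--!?>", re.DOTALL)
WS = re.compile(r"[\t \n\r]+")


def strip_comments(text: str) -> str:
    return COMMENT.sub("", text)


def reduce_ws(text: str) -> str:
    return WS.sub(" ", text)


def post_parse(text: str) -> str:
    """Reduce white space after markup has been recognised and removed."""
    return reduce_ws(strip_comments(text))


def pre_parse(text: str) -> str:
    """Reduce white space on the raw text, before parsing."""
    return strip_comments(reduce_ws(text))


if __name__ == "__main__":
    sample = "X\t\n<!--zzz--!>  \n\nX"
    print(repr(post_parse(sample)))  # one space between the X's
    print(repr(pre_parse(sample)))   # two spaces between the X's
```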
I think this allows more SGML compatibility than doing it as a
on the entities, or whatever.