Re: SPACE... from Rick Jelliffe on 1996-11-11 (w3c-sgml-wg@w3.org from November 1996)

From: Rick Jelliffe <ricko@allette.com.au>
Date: Mon, 11 Nov 1996 22:39:36 +1100
To: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
CC: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Message-ID: <32871078.1CA3@allette.com.au>

Michael Sperberg-McQueen wrote:

> I18nists, speak up now.  Reduce them all or not?  In favor:  it's
> simple behavior, it is easily understood, and all white space is
> treated the same.  Against:  well, I don't know whether there is any
> argument against, that's what I am asking:  *is* there an argument
> against?  Is the distinction between SPACE, half-width space,
> en-space, em-space, double-width space, zenkaku space, etc., to be
> preserved in a way that the distinction between SPACE and TAB is not
> preserved?  

In markup, I think that if something looks like whitespace,
it should act like whitespace and be treated as
a SEPCHAR. However, I don't think that using mad typographical spaces
is good markup practice that should
be encouraged, so I would be equally happy if just SP and full-width
space were treated as SEPCHARs, with a recommendation that the other
kinds of spaces be allowed as SEPCHARs in the guise of error-recovery only.

What about NBSP? It is tough, but I don't really think it should be
exempt: it looks like a space, and it should act like one. This means
that languages that use NBSP in words will have to use something else,
but there are plenty of characters like '-.\'~' for them to use, just
as we have to now when we want to construct an identifier
from several words. Making NBSP not a SEPCHAR could make markup
visually confusing.
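
As a rough illustration of the markup-side policy described above, here is
a minimal Python sketch. The names STRICT_SEPCHARS, RECOVERY_SEPCHARS and
classify_markup_char are hypothetical, and the recovery set is only a
sample of the "other kinds of spaces"; the code points themselves are the
standard ISO 10646 values. It does not try to model the separators SGML
already defines (TAB, RS, RE).

```python
# Hypothetical sketch: which space-like characters act as separators in markup.
# Narrow policy from the text: only SP and full-width space are real SEPCHARs.
STRICT_SEPCHARS = {
    "\u0020",  # SPACE (SP)
    "\u3000",  # IDEOGRAPHIC SPACE (full-width / zenkaku space)
}

# Sample of other space-like characters, accepted only as error recovery.
RECOVERY_SEPCHARS = {
    "\u00A0",  # NO-BREAK SPACE (NBSP)
    "\u2002",  # EN SPACE
    "\u2003",  # EM SPACE
}


def classify_markup_char(ch):
    """Return how a single character would be treated inside markup."""
    if ch in STRICT_SEPCHARS:
        return "sepchar"
    if ch in RECOVERY_SEPCHARS:
        return "sepchar-recovery"
    return "other"
```

Under this sketch NBSP is never an ordinary name character: it either
separates tokens or is flagged for recovery, which matches the "looks like
a space, acts like a space" position.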

In data, I think if someone uses a space character
other than the standard ISO 10646 whitespace, they should be free
to expect it to be passed to the application as is, otherwise why 
would they be using it?

So I think in ESIS data:
	(TAB SP LF CR)+ => SP

The other kinds of spaces, including the full-width (zenkaku) space,
should be left up to the application to deal with.  Actually, I am
not even really keen on reducing TABs; I would be happy with just
	(SP LF CR)+ => SP
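
The two reduction rules above could be sketched in Python roughly as
follows. The function name reduce_whitespace and its reduce_tabs flag are
assumptions made for illustration; only the character sets come from the
rules themselves.

```python
import re

# (TAB SP LF CR)+ => SP
WITH_TAB = re.compile(r"[\t \n\r]+")
# (SP LF CR)+ => SP, the narrower variant that leaves TAB alone
WITHOUT_TAB = re.compile(r"[ \n\r]+")


def reduce_whitespace(data, reduce_tabs=True):
    """Collapse each run of the listed characters to a single SP.

    Other space characters (NBSP, en-space, em-space, full-width space, ...)
    are not matched, so they reach the application unchanged.
    """
    pattern = WITH_TAB if reduce_tabs else WITHOUT_TAB
    return pattern.sub(" ", data)
```

For example, reduce_whitespace("a\t \n\rb") gives "a b", while
reduce_whitespace("a\u3000b") returns its input untouched because the
full-width space is left to the application.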

The reduction should be done as a notional post-parse stage, so that
	'X' TAB LF '<!--zzz-->' SP SP LF LF 'X'
becomes
	'X' SP 'X'.
I think this allows more SGML compatibility than doing it as a
preprocess on the entities, or whatever.
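
To make the ordering concrete, here is a small Python sketch of the
notional post-parse stage, next to a preprocess-style alternative for
comparison. Everything in it is a simplifying assumption: the parser
output is modelled as a flat list of hypothetical (kind, text) events with
only "data" and "comment" kinds, and the comment-stripping regular
expression stands in for a real parser. One observable difference, though
not necessarily the only one intended above, is that reducing after the
parse lets whitespace on both sides of a comment merge into one SP.

```python
import re

WS_RUN = re.compile(r"[\t \n\r]+")  # (TAB SP LF CR)+


def post_parse_reduce(events):
    """Drop comment events, join the remaining data, then collapse runs."""
    data = "".join(text for kind, text in events if kind == "data")
    return WS_RUN.sub(" ", data)


def preprocess_then_parse(raw):
    """Collapse runs in the raw entity text first, then strip comments."""
    reduced = WS_RUN.sub(" ", raw)
    return re.sub(r"<!--.*?-->", "", reduced, flags=re.S)


# 'X' TAB LF comment SP SP LF LF 'X', as parsed events and as raw text
events = [("data", "X\t\n"), ("comment", "zzz"), ("data", "  \n\nX")]
raw = "X\t\n<!--zzz-->  \n\nX"

post = post_parse_reduce(events)     # "X X"
pre = preprocess_then_parse(raw)     # "X  X" (two spaces survive)
```

The post-parse version yields 'X' SP 'X' as in the example, whereas the
preprocess version leaves one SP on each side of the removed comment.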

Rick Jelliffe
Allette Systems

Received on Monday, 11 November 1996 06:36:29 UTC