- From: Rick Jelliffe <ricko@allette.com.au>
- Date: Mon, 11 Nov 1996 22:39:36 +1100
- To: Michael Sperberg-McQueen <U35395@UICVM.UIC.EDU>
- CC: W3C SGML Working Group <w3c-sgml-wg@w3.org>
Michael Sperberg-McQueen wrote:
> I18nists, speak up now. Reduce them all or not? In favor: it's
> simple behavior, it is easily understood, and all white space is
> treated the same. Against: well, I don't know whether there is any
> argument against, that's what I am asking: *is* there an argument
> against? Is the distinction between SPACE, half-width space,
> en-space, em-space, double-width space, zenkaku space, etc., to be
> preserved in a way that the distinction between SPACE and TAB is not
> preserved?

In markup, I think that if something looks like whitespace, it should act like whitespace and be treated as a SEPCHAR. However, I don't think using mad typographical spaces is a markup practice that should be encouraged, so I would be equally happy if just SP and the full-width space were treated as SEPCHARs, with a recommendation that the other kinds of spaces be accepted as SEPCHARs only as a form of error recovery.

What about NBSP? It is tough, but I don't really think it should be exempt: it looks like a space, and it should act like one. This means that languages that use NBSP inside words will have to use something else, but there are plenty of characters like '-.\'~' for them to use, just as we have to now when we want to construct an identifier from several words. Making NBSP anything other than a SEPCHAR could make markup visually confusing.

In data, I think that if someone uses a space character other than the standard ISO 10646 whitespace, they should be free to expect it to be passed to the application as is; otherwise, why would they be using it? So I think in ESIS data:

    (TAB SP LF CR)+ => SP

The other kinds of spaces, including the full-width (zenkaku) space, should be left to the application to deal with. Actually, I am not even really keen on reducing TABs; I would be happy with just:

    (SP LF CR)+ => SP

The reduction should be done as a notional post-parse stage, so that

    'X' TAB LF '<!--zzz--!>' SP SP LF LF 'X'

becomes

    'X' SP 'X'.
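A minimal Python sketch of the post-parse reduction described above, under stated assumptions: it models the parser's output as a hypothetical list of ("data", text) and ("comment", text) events. The event format, the function name reduce_esis_data and the include_tab switch are illustrative only and are not taken from any SGML tool. Comments are discarded first, and then runs of TAB/SP/LF/CR (or SP/LF/CR only) collapse to a single SP. NBSP, U+3000 and the typographical spaces pass through untouched.

```python
import re
from typing import Iterable, Tuple

# Hypothetical parser output: each event is (kind, text), e.g.
# ("data", "X\t\n") or ("comment", "zzz"). Only "data" events carry
# character data that reaches the application.
Event = Tuple[str, str]

# Runs of the standard whitespace characters collapse to one SP.
WS_WITH_TAB = re.compile(r"[\t \n\r]+")   # (TAB SP LF CR)+ => SP
WS_WITHOUT_TAB = re.compile(r"[ \n\r]+")  # (SP LF CR)+ => SP


def reduce_esis_data(events: Iterable[Event], include_tab: bool = True) -> str:
    """Collapse whitespace in parsed character data (post-parse stage).

    Comments and other non-data events are discarded *before* reduction,
    so whitespace on either side of a comment merges into a single SP.
    NBSP, U+3000 (zenkaku space), en/em spaces, etc. are not matched and
    pass through unchanged for the application to handle.
    """
    data = "".join(text for kind, text in events if kind == "data")
    pattern = WS_WITH_TAB if include_tab else WS_WITHOUT_TAB
    return pattern.sub(" ", data)


if __name__ == "__main__":
    # 'X' TAB LF '<!--zzz--!>' SP SP LF LF 'X'
    events = [("data", "X\t\n"), ("comment", "zzz"), ("data", "  \n\nX")]
    print(repr(reduce_esis_data(events)))                     # 'X X'
    print(repr(reduce_esis_data(events, include_tab=False)))  # 'X\t X'
```

In this sketch the comment is removed before the whitespace runs are examined. As a result, the run before it and the run after it are treated as one run and yield a single SP. With include_tab=False, the leading TAB survives as data.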
I think doing the reduction after parsing allows more SGML compatibility than doing it as a preprocess on the entities, or whatever.

Rick Jelliffe
Allette Systems
Received on Monday, 11 November 1996 06:36:29 UTC