Re: Ideographic Space, word-spacing, and justification

From: Steve Deach <sdeach@adobe.com>
Date: Fri, 31 Oct 2008 12:04:25 -0700
To: Martin Duerst <duerst@it.aoyama.ac.jp>, "KOBAYASHI Tatsuo(FAMILY Given)" <tlk@kobysh.com>
CC: fantasai <fantasai.lists@inkedblade.net>, WWW International <www-international@w3.org>, Paul Nelson <paulnel@winse.microsoft.com>, Michel Suignard <michel@unicode.org>
Message-ID: <FE2V8zDv60gxapzhad200000094@fe2.corp.adobe.com>

Every few years this issue comes back up. Unfortunately, I can't find the
rather long treatise I wrote the last time.

In general, I agree with Martin that one should use styling properties in
place of most of the "layout" uses of space characters (just as one should
use tables in place of most uses of tabs). That said, I would like to
briefly summarize the traditional (pre-DTP) handling of spaces and spacing,
and comment on "what I believe" to be the correct handling.

I also agree that the handling of letterspacing and wordspacing varies by
script, and in some cases by usage within a script, because of
historical/cultural differences in preferences and aesthetics, the specific
readability requirements of the usage, and the aesthetic choices of the
designer.



This is a partial reconstruction of my prior emails on this topic.

My terminology:
  "Spacing" an adjustment to the distance between 2 glyphs/characters.
  "Space" a character which has a width but no visible inked representation.
  "Letterspacing" an adjustment to the intercharacter spacing used for
     line justification. [This definition differs from CSS's.]
  "Wordspacing" an adjustment to the width of an interword space, also
     used for line justification.
  "WhiteSpaceAddition/Reduction (WSA/WSR)" a uniform adjustment to
     intercharacter spacing that is applied for design purposes or
     emphasis. [This corresponds most closely to the CSS-2.0 definition
     of letterspacing. Most DTP applications call this "Tracking".]
  "Tracking" an adjustment to intercharacter spacing that varies by
     font size/point size and is used to increase readability when
     optical sizing is not provided by the font. [This traditional
     definition differs from that used in most DTP applications.]


In setting Roman text:
  Letterspacing is not generally applied to Arabic (or to other
connected-letter scripts/languages), nor to connected-letter ("script")
faces in Roman-derivative scripts.
  Letterspacing is not generally applied to ideographic or similar
monospaced scripts, nor to monospaced text in Roman-derivative environments.
  Traditional applications varied widely in the algorithms used for
weighting how much of a justification adjustment was applied to wordspacing
versus letterspacing. Most modern systems treat the two as linearly
proportional.
  Traditional publishing applications were also at odds over whether the
letterspacing adjustment AND the wordspacing adjustment should both be
applied to the space/NbSp characters, but most modern systems apply both.
  The Unicode NbSp (U+00A0) character should be treated the same as the
Unicode Space (U+0020). [In traditional publishing systems, these are
variable width in justified lines and fixed width in "aligned", tabular, and
math uses. However, some traditional publishing systems treat all space
characters prior to the first non-space in a line as fixed width.]
  The FigureSpace (U+2007) and PunctuationSpace (U+2008) are treated the
same way as the corresponding figure '0' and punctuation period/full stop
would be treated in the current layout context (justified vs.
aligned/tabular/math).
  Some traditional publishing systems had a quad-space and a
justifying-space (sometimes called a 'spaceband' rather than a 'justifying
space'). Using the quad-space within justified text forced the fixed nominal
width of the normal interword space character, disabling justification
adjustments. This encoding concept has no analogue in Unicode.
  All other space characters {EM-space, EN, EM-quad, EN-quad, 3/EM, 4/EM,
6/EM, Thin, & Hair} are treated as fixed width and are adjusted for neither
letterspacing nor wordspacing. (Traditional publishing systems used these
for alignment/layout and did not generally apply tracking or WSA/WSR to them
either.)
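
To make the Roman-text rules above concrete, here is a minimal Python sketch. It assumes a line that has already been broken and stripped of trailing spaces, with widths in ems. It follows the "modern" behavior described above: one scale factor drives both letterspacing and wordspacing (linear-proportional), U+0020 and U+00A0 absorb both, U+2007 and U+2008 behave like '0' and '.', and the fixed-width spaces absorb nothing. The function names and the default weights (1.0 for wordspacing, 0.25 for letterspacing) are hypothetical choices for illustration, not values taken from any particular system; connected-script and monospaced text, which normally get no letterspacing, are not modeled.

```python
# Minimal sketch (assumptions: widths in em, line already broken, no trailing spaces).
# Function names and default weights are hypothetical, chosen only for illustration.

SPACE = "\u0020"
NBSP = "\u00a0"
FIGURE_SPACE = "\u2007"
PUNCTUATION_SPACE = "\u2008"

# Spaces that are never adjusted by letterspacing or wordspacing.
FIXED_SPACES = {
    "\u2000",  # EN QUAD
    "\u2001",  # EM QUAD
    "\u2002",  # EN SPACE
    "\u2003",  # EM SPACE
    "\u2004",  # THREE-PER-EM SPACE
    "\u2005",  # FOUR-PER-EM SPACE
    "\u2006",  # SIX-PER-EM SPACE
    "\u2009",  # THIN SPACE
    "\u200a",  # HAIR SPACE
}


def treated_as(ch: str) -> str:
    """Map a character to the one whose justification behavior it follows."""
    if ch == NBSP:
        return SPACE          # NBSP behaves exactly like SPACE
    if ch == FIGURE_SPACE:
        return "0"            # follows the figure zero
    if ch == PUNCTUATION_SPACE:
        return "."            # follows the period/full stop
    return ch


def slack_weights(line: str, word_weight: float, letter_weight: float) -> list[float]:
    """Each character's share of the justification slack."""
    weights = []
    last = len(line) - 1
    for i, raw in enumerate(line):
        ch = treated_as(raw)
        if ch in FIXED_SPACES:
            weights.append(0.0)   # fixed width: no letterspacing, no wordspacing
            continue
        # Letterspacing is added after every glyph except the last in the line.
        w = letter_weight if i < last else 0.0
        if ch == SPACE:
            w += word_weight      # SPACE/NBSP also take the wordspacing adjustment
        weights.append(w)
    return weights


def justify(line: str, slack: float, word_weight: float = 1.0,
            letter_weight: float = 0.25, justified: bool = True) -> list[float]:
    """Return the extra advance (em) added after each character of `line`."""
    if not justified:
        # Aligned/tabular/math contexts: every space keeps its nominal width.
        return [0.0] * len(line)
    weights = slack_weights(line, word_weight, letter_weight)
    total = sum(weights)
    if total == 0:
        return [0.0] * len(line)
    scale = slack / total     # one factor drives both adjustments (linear-proportional)
    return [scale * w for w in weights]


print(justify("a b", 3.0))        # [0.5, 2.5, 0.0]
print(justify("a\u2003b", 3.0))   # [3.0, 0.0, 0.0]: the EM SPACE itself stays fixed
```

Passing justified=False models the aligned/tabular/math case, where every space keeps its nominal width. Giving each space the letterspacing share as well as the wordspacing share reflects the "apply both" choice most modern systems make.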

Ideographic languages/scripts do not generally use wordspacing or
letterspacing to adjust justification; instead they typically use rules akin
to those described in JIS X 4051 (latest edition). This algorithm involves
trimming some characters to half-width, then reinserting 1/2- and 1/4-em
spacing adjustments at selected points within the line.
  Under these rules, the ideographic space is treated as an ideographic
letter [generally fixed full-width, but subject to some additional specific
rules], not as a Roman variable space.
  Whether Roman text embedded in ideographic text is set using Roman
algorithms or Japanese/Chinese algorithms should be a styling option.
Depending on the publication and the publisher, Roman text may be set
proportional (using Roman or Asian justification rules), half-width, or
full-width. (Similarly, they may choose Asian or Roman word-breaking and
hyphenation rules.)
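
For contrast, here is a much-simplified Python sketch of the ideographic case, again with widths in ems. It assumes punctuation is set at 1/2 em and that any remaining slack is spread evenly over the gaps between characters; it does not implement JIS X 4051's priority rules, line compression, or the 1/2- and 1/4-em reinsertion points. The punctuation set and function names are hypothetical. What it illustrates is that U+3000 is handled as an ideographic letter: it keeps its full-width advance, receives no wordspacing, and only moves when inter-character spacing is applied to every letter in the line.

```python
# Much-simplified sketch of ideographic line expansion (assumptions: widths in em,
# punctuation set at 1/2 em, slack spread evenly between characters).
# Not JIS X 4051: no priority rules, no compression, no 1/2- or 1/4-em reinsertion.

IDEOGRAPHIC_SPACE = "\u3000"

# Hypothetical subset of fullwidth punctuation that is set at half width.
HALF_WIDTH_PUNCTUATION = set("、。，．「」『』（）")


def natural_advance(ch: str) -> float:
    """Advance in em before justification."""
    if ch in HALF_WIDTH_PUNCTUATION:
        return 0.5   # trimmed from full width to half width
    return 1.0       # ideographs and U+3000 alike occupy one full em


def justify_cjk(line: str, line_width: float) -> list[float]:
    """Return per-character advances after spreading slack over the gaps.

    There is no wordspacing step: U+3000 is handled like an ideographic
    letter, so it moves only because inter-character spacing moves every letter.
    """
    advances = [natural_advance(ch) for ch in line]
    gaps = len(line) - 1
    slack = line_width - sum(advances)
    if gaps <= 0 or slack <= 0:
        return advances  # nothing to distribute (compression is not modeled)
    extra = slack / gaps
    return [a + extra if i < gaps else a for i, a in enumerate(advances)]


print(justify_cjk("日本\u3000語。", 6.5))  # [1.5, 1.5, 1.5, 1.5, 0.5]
```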

I have not covered any specifics of the handling of ancient languages that
are generally only of academic interest; nor the handling of Arabic and
Arabic-derivative scripts; nor Indic; nor certain other language-specific
differences (such as adjustments to spaces at sentence boundaries in some
uses, or around certain punctuation characters in French and other
languages).

I have also not addressed the handling of "hanging punctuation" and "hanging
spaces", though there are different philosophies/algorithms for handling
these across the various script families.

-- S.Deach
   sdeach@adobe.com






On 2008.10.31 02:43, "Martin Duerst" <duerst@it.aoyama.ac.jp> wrote:

> Hello everybody,
> 
> Just a bit of a wider background on full-width space.
> 
> It should be remembered that in contrast to the usual space (U+0020),
> which occurs all over the place in texts in most languages, the
> full-width space doesn't occur AT ALL in typical Japanese (or Chinese)
> texts. That's why it also barely occurs in the document written
> by the Japanese Layout TF, as well as in JIS X 4051.
> 
> The full-width space is used more for layout than inside the actual
> text. In this respect, what CSS should mainly do is look at
> Japanese typography and try to come up with properties that allow
> authors to get rid of full-width spaces in the text, rather than
> spending too much time on how to treat full-width space.
> 
> As a typical example, I guess lead typesetting and also definitely
> simple approaches to typesetting on the computer, such as plain
> text or old "word-processors" (which were not very much above
> plain text in their capabilities) use a full-width space to produce
> a start-of-paragraph indent (which is very often one full-width
> character wide). CSS should make sure that there is no need to
> insert such full-width spaces, because an exact one-full-width-
> character start-of-paragraph indent can be produced with an
> appropriate CSS property setting.
> 
> Another typical use of full-width space was to center text,
> and to insert spaces into text for headlines (to a large
> extent a crude substitute for increasing text size, which wasn't
> possible when technology was limited to one or two bit-mapped
> font sizes). In this case, inter-character spacing property(/ies)
> may be important for 'facsimile' layouts, but with modern
> technology, such layout isn't much used anymore anyway.
> 
> Regards,   Martin.
> 
> At 18:31 08/10/30, KOBAYASHI Tatsuo(FAMILY Given) wrote:
>> Hi, Erica,
>> 
>> In Japanese layout, the "spacing issue" is one of the most difficult issues to
>> treat.
>> We intentionally and carefully kept concrete character names like IDEOGRAPHIC
>> SPACE (U+3000) and SPACE (U+0020) out of our requirements. Instead, we introduced
>> three different types of abstract space concepts, as follows:
>> 
>> inter character space: usual 1/2 em fixed space.
>> conditional space: 1/2 em fixed space to be inserted or pulled off between
>> characters and punctuation marks.
>> adjustable space: variable width space, behaves like usual western variable
>> space.
>> 
>> Note that usual Japanese punctuation marks have 1/2 em width in our
>> requirements, even if the character name includes "FULLWIDTH ~~~".
>> 
>> Anyway, the decision on how to deal with these spaces in the CSS recommendation and
>> in actual implementations is up to your side :-)
>> 
>> regards,
>> Tatsuo
>> 
>> 2008/10/30 Steve Deach <sdeach@adobe.com>
>>> 
>>> No, in my personal opinion, it should not.
>>> The 2 differences between normal space/nbsp and ideographic space are:
>>> 1.) The normal width is different, and
>>> 2.) The normal space/nbsp is treated as justifying
>>>     (adjusted by both wordspacing and letterspacing),
>>>     whereas the Ideographic space should only be adjusted by
>>>     letterspacing (only if ideographic letters are also so adjusted).
>>> 
>>> However, I will re-confirm this with our CJK experts, before claiming this
>>> is an Adobe opinion.
>>> 
>>> 
>>> 
>>> On 2008.10.29 15:13, "fantasai"
>>> <fantasai.lists@inkedblade.net> wrote:
>>> 
>>>> 
>>>> Hello,
>>>> 
>>>> The CSSWG would like to know whether the IDEOGRAPHIC SPACE U+3000
>>>> should be affected by 'word-spacing', and whether it should be
>>>> treated as a space during spaces-only justification or treated as
>>>> a typical ideographic punctuation character.
>>>> 
>>>> ~fantasai
>>>> 
>>> 
>>> 
>> 
>> 
>> 
>> -- 
>> KOBAYASHI Tatsuo
>> Scholex Co., Ltd. Yokohama
>> JUSTSYSTEM Digital Culture Research Center
> 
> 
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
> 
Received on Friday, 31 October 2008 19:05:32 GMT
