
Re: East Asian Emphasis Marks (Japanese bouten, etc)

From: Felix Sasaki <fsasaki@w3.org>
Date: Wed, 15 Mar 2006 22:35:27 +0900
Message-ID: <4418181F.800@w3.org>
To: fantasai <fantasai.lists@inkedblade.net>
Cc: 'WWW International' <www-international@w3.org>

Hi,

Some mails in this thread were sent by people who are not
subscribed to www-international. I collected them below, from the
archive of http://www.unicode.org/mail-arch/unicode-ml/ , FYI.

- Felix


------------------------------------
From: Doug Ewell (dewell@adelphia.net)
Date: Sun Mar 12 2006 - 17:33:18 CST
------------------------------------
fantasai <fantasai dot lists at inkedblade dot net> wrote:

> ... Kobayashi Tatsuo and I looked
> through the Unicode repertoire last week, and we found
> U+FE45 SESAME DOT
> U+FE46 WHITE SESAME DOT
> which covers only two of the shapes. Also, they are in the
> compatibility forms block, so their use is discouraged.

I might have missed something, but I thought I remembered Asmus and
others stating clearly that there was no correlation between the block
where a character resides and whether that character is discouraged.

--
Doug Ewell
Fullerton, California, USA
http://users.adelphia.net/~dewell/


------------------------------------
From: Ken Lunde (lunde@adobe.com)
Date: Mon Mar 13 2006 - 08:09:36 CST
------------------------------------
All,

In the Adobe-Japan1-x character collection, we map U+FE45 and U+FE46
to CIDs 12639 and 12640, respectively. In other words, we provide
glyphs distinct from the punctuation marks they resemble. These
glyphs are intended for use as annotative marks, such as ruby. In
fact, the whole range of glyphs intended for annotative use is meant to
be scaled to a smaller size, typically 50% for text runs.

Regards...

-- Ken

On 2006/03/12, at 17:42, Martin Duerst wrote:

> At 07:50 06/03/13, fantasai wrote:
> >
> >I'm currently going through emphasis marks used in East Asian texts
> >to see what options we need to define in CSS. One of the questions I
> >have is, where do the glyphs come from? Kobayashi Tatsuo and I looked
> >through the Unicode repertoire last week, and we found
> > U+FE45 SESAME DOT
> > U+FE46 WHITE SESAME DOT
> >which covers only two of the shapes. Also, they are in the
> >compatibility forms block, so their use is discouraged.
> >
> >Paul Nelson says Microsoft uses fixed shapes for these emphasis marks.
> >
> >In the case of the sesame at least, the shape in printed materials
> >closely parallels U+3001 IDEOGRAPHIC COMMA, which is provided by the font.
>
> Yes, it does indeed look that way. Your examples show dots
> (above) in horizontal text, and commas (or comma-like shapes) (on the
> right) in vertical text, and some text in the scans actually says
> that this is current practice.
>
> >I would like to know, is there a way, should there be a way, for
> >the font in use to have some say over the glyph shape for emphasis marks?
>
> From a purely aesthetic point of view, I'd guess yes. For a very
> light font, smaller dots/commas may be more appropriate. For a very
> heavy font, bigger dots/commas may be more appropriate. There may also
> be issues with how far away from the main line the marks go; for
> different fonts, the optically best distance may be different.
>
> But from a practical viewpoint, it is very well possible that such
> adjustments are not done.
>
> I suggest you look at some fonts, or contact some font providers
> (e.g. Adobe and others).
>
> Being able to specify a specific character as an emphasis mark
> sounds attractive, but it would bring up the need to specify
> several other parameters, such as scaling and offsets.
>
> Regards, Martin.
>
> >As for other shapes, I have scanned in a few examples:
> > http://fantasai.inkedblade.net/style/discuss/emphasis-marks/
> >I also remember a Tibetan book using x-shaped marks.
> >
> >Any comments on shapes, usage patterns, usefulness of various settings,
> >etc. would be much appreciated.
> >
> >~fantasai
> >
>


------------------------------------
From: Ken Lunde (lunde@adobe.com)
Date: Mon Mar 13 2006 - 18:11:08 CST
------------------------------------
Fantasai,

You wrote:

> Could you explain that in a little more detail, please?
>
> I take what you said to mean that
> - U+FE45 and U+FE46 have their own glyphs

Correct.

> - these glyphs are intended to be scaled down to 50% of the text
> size

Scaling to 50% size is one possible use. They can also be used at
their original size, of course.

> - therefore in the font they are approximately twice the size of the
> comma rather than approximately the same size

The comment above applies here.

> - therefore an application wishing to use U+FE45 as an emphasis mark
> should scale its glyph down by half before rendering it
> Is that correct?

It depends on whether it is to be used as an annotation, which would
lend itself to scaling, or inline as part of the text. At least in
Japanese, typical usage would be like ruby, meaning as annotations
above other characters.

We chose to implement our ruby glyphs, including these annotative
marks, in a way that requires applications to scale them to the
intended size. Why? Prior to having fonts with dedicated glyphs for
ruby (meaning the complete set of kana, plus a few more symbols),
applications were already scaling standard glyphs. We found it
practical to preserve that aspect of usage, and to design the glyphs
so that they would look appropriate when scaled.
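
The scaling step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `annotation_size` and its parameters are hypothetical, and the default factor of 0.5 simply reflects the "typically 50%" figure mentioned earlier in this thread, not a value mandated by any font or specification.

```python
def annotation_size(base_size, scale=0.5, as_annotation=True):
    """Return the size at which to render an annotative glyph.

    base_size     -- size of the main text run (any unit, e.g. points)
    scale         -- hypothetical scale factor; 0.5 mirrors the "typically
                     50%" figure quoted in the thread
    as_annotation -- True when the glyph is placed like ruby beside the
                     text, False when it is set inline at its original size
    """
    if base_size <= 0:
        raise ValueError("base_size must be positive")
    if not as_annotation:
        # Inline use: the glyph is rendered at its original size.
        return base_size
    return base_size * scale


if __name__ == "__main__":
    print(annotation_size(12.0))                       # annotation use
    print(annotation_size(12.0, as_annotation=False))  # inline use
```

The point of the sketch is that the application, not the font, applies the reduction, which matches the design choice of keeping ruby glyphs at full size in the font.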

Regards...

-- Ken


------------------------------------
From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 13 2006 - 17:02:39 CST
------------------------------------
> I'm currently going through emphasis marks used in East Asian texts
> to see what options we need to define in CSS. One of the questions I
> have is, where do the glyphs come from? Kobayashi Tatsuo and I looked
> through the Unicode repertoire last week, and we found
> U+FE45 SESAME DOT
> U+FE46 WHITE SESAME DOT
> which covers only two of the shapes. Also, they are in the compatibility
> forms block, so their use is discouraged.

As Doug Ewell surmised, this does not follow. U+FE45 (and U+FE46)
are compatibility characters, insofar as they were encoded for
compatibility with JIS X 0213. And they were encoded in the CJK
Compatibility Forms block because much of that block consists of
forms used in vertical CJK text, as are the sesame marks. But note
that they have no compatibility decomposition mapping, and there
is no indication whatsoever that their use is discouraged.

If you have need of referring to a sesame dot in CJK text, by
all means, *do* use U+FE45 SESAME DOT. That is what it is encoded
for.
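
The absence of a compatibility decomposition can be checked directly against the Unicode Character Database. The following is a minimal sketch, assuming Python's standard `unicodedata` module; the helper name `describe` is hypothetical, and U+FE30 is included only as a contrasting character from the same block that does carry a decomposition.

```python
import unicodedata

SESAME_MARKS = ["\uFE45", "\uFE46"]


def describe(ch):
    """Collect the properties relevant to the 'discouraged or not' question."""
    return {
        "code": f"U+{ord(ch):04X}",
        "name": unicodedata.name(ch),
        # An empty string means the character has no decomposition mapping.
        "decomposition": unicodedata.decomposition(ch),
        # True when compatibility normalization leaves the character alone.
        "nfkc_stable": unicodedata.normalize("NFKC", ch) == ch,
    }


if __name__ == "__main__":
    for ch in SESAME_MARKS:
        print(describe(ch))
    # Contrast: a vertical presentation form from the same block.
    print(describe("\uFE30"))
```

Because the sesame dots have no decomposition mapping, NFKC normalization leaves them untouched, whereas the vertical presentation forms in the same block are replaced by their horizontal counterparts.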

> In the case of the sesame at least, the shape in printed materials closely
> parallels U+3001 IDEOGRAPHIC COMMA, which is provided by the font.

I would *not* suggest using that.

> I would like to know, is there a way, should there be a way, for the font
> in use to have some say over the glyph shape for emphasis marks?
>
> As for other shapes, I have scanned in a few examples:
> http://fantasai.inkedblade.net/style/discuss/emphasis-marks/
> I also remember a Tibetan book using x-shaped marks.

Essentially, one should use whatever character is required to get
things right. Note that the Chinese examples you have posted,
cidian-kanzhonghao2.png, etc, contain, in horizontal text, dots,
underscores, 2 different wavy lines, but also circles below and
fisheyes below. The circle is U+25CB WHITE CIRCLE, and the
fisheye is U+25C9 FISHEYE. These are just more instances of
CJK bullets of various sorts being pressed creatively into service
to create emphasis lines.
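
One way to picture "use whatever character is required" is a small lookup table from a style keyword to the character that supplies the glyph. This is a sketch only: the keywords and the names `EMPHASIS_MARKS` and `mark_for` are hypothetical and do not come from CSS or from Unicode; only the code points and character names are taken from this thread and the standard.

```python
import unicodedata

# Hypothetical keyword -> character table. The keywords are illustrative;
# the characters are the ones named in this thread.
EMPHASIS_MARKS = {
    "sesame": "\uFE45",        # SESAME DOT
    "white-sesame": "\uFE46",  # WHITE SESAME DOT
    "circle": "\u25CB",        # WHITE CIRCLE
    "fisheye": "\u25C9",       # FISHEYE
}


def mark_for(keyword):
    """Return (character, 'U+XXXX NAME') for a hypothetical style keyword."""
    ch = EMPHASIS_MARKS[keyword]
    return ch, f"U+{ord(ch):04X} {unicodedata.name(ch)}"


if __name__ == "__main__":
    for key in EMPHASIS_MARKS:
        print(key, mark_for(key)[1])
```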

How you combine those identities into definitions of styling
for underscores and overscores and sideline scores for emphasis
styling is outside the scope of the Unicode Standard -- but
presumably inside the scope of CSS.

--Ken

>
> Any comments on shapes, usage patterns, usefulness of various settings,
> etc. would be much appreciated.
>
> ~fantasai
>
>


------------------------------------
From: Asmus Freytag (asmusf@ix.netcom.com)
Date: Mon Mar 13 2006 - 22:32:35 CST
------------------------------------
On 3/13/2006 3:02 PM, Kenneth Whistler wrote:
>> U+FE45 SESAME DOT
>> U+FE46 WHITE SESAME DOT
>>
> they were encoded for
> compatibility with JIS X 0213.
Ken,

this is a great example where harping on the putative compatibility
character status is really confusing and not helpful. Yes, X0213 had
them before we did, but *compatibility* characters they are only if we
*would not* have added them as characters for reasons of our own, or if
they violate the character glyph model in some other way.

In my estimate, we might have and they do not, at least not to the
degree that makes them special in any way. (I'll deal with their
similarity to the punctuation characters below).
> And they were encoded in the CJK
> Compatibility Forms block because much of that block consists of
> forms used in vertical CJK text, as are the sesame marks.
My recollection is, we picked up two empty slots that were handy, and
the BMP was getting full, and there were no better locations in existing
(non-compatibility) blocks. The 'related to vertical text' was a nice
bonus, but - in fact- distracting, because the other characters violate
Unicode's writing direction model, whereas these don't.

(The other ones are among the "blackest" strain of black-sheep
compatibility characters there are ;-).
> But note that they have no compatibility decomposition mapping, and there
> is no indication whatsoever that their use is discouraged.
>
Therefore, it makes no sense to emphasize them as "compatibility
characters" which are implicitly second class citizens. Let's reserve
that label for the truly unwanted.
> If you have need of referring to a sesame dot in CJK text, by
> all means, *do* use U+FE45 SESAME DOT. That is what it is encoded
> for.
>
>
'Nuff said.
>> In the case of the sesame at least, the shape in printed materials closely
>> parallels U+3001 IDEOGRAPHIC COMMA, which is provided by the font.
>>
>
> I would *not* suggest using that.
>
>
The committee consensus was to discourage precisely that *hack-o-rama*
by providing dedicated codes.

(The location of the comma and period in the character box is
potentially different for each font, but for use as an emphasis mark,
you need the 'ink' at a known location, usually centered, otherwise they
won't look right).
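
To make the positioning problem concrete, here is a rough sketch of what a renderer would have to do if it borrowed a comma glyph: measure where the ink sits and shift it to the centre of the character box. Everything in it is an assumption for illustration: the function name `centering_offset`, the bounding-box convention `(x_min, y_min, x_max, y_max)`, a square character box running from 0 to `em` on both axes, and the sample numbers, which are not taken from any real font.

```python
def centering_offset(bbox, em=1000):
    """Return (dx, dy) that moves a glyph's ink to the centre of its box.

    bbox -- (x_min, y_min, x_max, y_max) of the ink, in font units
    em   -- side of the (assumed square) character box, in font units
    """
    x_min, y_min, x_max, y_max = bbox
    if x_max < x_min or y_max < y_min:
        raise ValueError("malformed bounding box")
    ink_cx = (x_min + x_max) / 2
    ink_cy = (y_min + y_max) / 2
    box_c = em / 2
    return box_c - ink_cx, box_c - ink_cy


if __name__ == "__main__":
    # Made-up comma-like glyph whose ink sits in the lower left of the box.
    print(centering_offset((100, 50, 300, 250)))
    # Made-up glyph that is already centred: no shift needed.
    print(centering_offset((400, 400, 600, 600)))
```

Since the offset differs from font to font for a comma, the correction has to be computed per font, whereas a dedicated glyph whose ink is already at a known location needs none.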

Note that we might want to document the fact that, by convention,
software scales the glyphs for these characters down (just as if they
had been regular characters).

A./

PS: From the last parenthetical remark, it should be clear that
other symbols, for which existing fonts have glyphs that are always
centered, would not require specific codes for emphasis marks.
>
>
>


------------------------------------
From: Kenneth Whistler (kenw@sybase.com)
Date: Mon Mar 13 2006 - 17:57:10 CST
------------------------------------
> BTW -- what happens if the emphasized word also has ruby?

The typical solution for that is to use *both* sides of the
text.

Vertical text: emphasize on the left (with sidelining) and ruby
on the right.

Horizontal text: emphasize on the bottom (with underscoring)
and ruby on the top.

I suppose you could find examples of emphasis styling *of* ruby,
in which case, you'd have (in Japanese vertical text), sesame
dots applied *to* the ruby as a sideline to them.
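
The side-selection rule above is simple enough to write down as a function. This is a sketch under assumptions: the name `annotation_sides` and the string labels are hypothetical, and the default placement when there is no ruby (above in horizontal text, on the right in vertical text) is taken from the examples discussed earlier in this thread rather than from any specification.

```python
def annotation_sides(vertical, has_ruby):
    """Return a dict saying on which side emphasis marks and ruby go.

    vertical -- True for vertical text, False for horizontal text
    has_ruby -- True when the emphasized run also carries ruby
    """
    if vertical:
        ruby_side, default_emphasis, other_side = "right", "right", "left"
    else:
        ruby_side, default_emphasis, other_side = "top", "top", "bottom"

    if has_ruby:
        # Ruby keeps its usual side; emphasis moves to the opposite one.
        return {"ruby": ruby_side, "emphasis": other_side}
    return {"ruby": None, "emphasis": default_emphasis}


if __name__ == "__main__":
    for vertical in (False, True):
        for has_ruby in (False, True):
            print(vertical, has_ruby, annotation_sides(vertical, has_ruby))
```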

--Ken


Received on Wednesday, 15 March 2006 13:35:48 GMT
