RE: [css3-fonts] humane 'unicode-range' from Koji Ishii on 2011-05-02 (www-style@w3.org from May 2011)

From: Koji Ishii <kojiishi@gluesoft.co.jp>
Date: Mon, 2 May 2011 01:53:07 -0400
To: John Daggett <jdaggett@mozilla.com>
CC: CSS WWW Style <www-style@w3.org>, "CJK discussion (public-i18n-cjk@w3.org)" <public-i18n-cjk@w3.org>, Christoph Päper <christoph.paeper@crissov.de>, "markdavis@google.com" <markdavis@google.com>
Message-ID: <A592E245B36A8949BDB0A302B375FB4E0AC2875335@MAILR001.mail.lan>

> In your example I think you mean @font-face, no?

Yes, my mistake, thank you.

First of all,

> For CSS3 Fonts, we need to keep it simple and ship it! ;)

Yes, I agree. I investigated further, and the demand for composite fonts isn't very high, so I give it lower priority than shipping CSS3 Fonts sooner.

On your other points:

> As Jonathan Kew points out, what you're asking for is syntax that defines
> ranges based on specific properties in the Unicode database.

Yes, that's what I'm asking for. When I think about useful composite font logic for CJK, the East Asian Width (EAW) and Script properties seem the most useful, much more so than block names.

> For the use case of using one font for Latin, another for Japanese, this
> doesn't really yield the ideal result.  There are lots of EAW=A characters
> in the extended Latin ranges and EAW=N probably covers all sorts of ranges
> that an author wasn't really considering (e.g. Syriac, Mandaic, Thai, Lao,
> Tibetan).

They're A because some CJK legacy encodings had code points there, which means CJK fonts are likely to have glyphs for them. EAW doesn't look like a logical distinction (it isn't obvious why one code point is A and another is N), but I think it makes sense as a way to distinguish which characters can belong to CJK fonts in users' minds.
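
A minimal sketch of that distinction, assuming Python's standard unicodedata module, which exposes the East Asian Width (EAW) property. The helper name eaw_class and the sample characters are illustrative choices, not something taken from this thread:

```python
import unicodedata


def eaw_class(ch: str) -> str:
    """Return the East Asian Width property value for a single character."""
    return unicodedata.east_asian_width(ch)


# Sample characters picked for illustration only (hypothetical selection).
samples = {
    "A": "Basic Latin letter",
    "α": "Greek letter, also encoded in CJK legacy character sets",
    "§": "Latin-1 symbol, also encoded in CJK legacy character sets",
    "ก": "Thai letter",
    "あ": "Hiragana letter",
}

for ch, note in samples.items():
    # Print the code point, its EAW class, and the descriptive note.
    print(f"U+{ord(ch):04X} {eaw_class(ch):>2}  {note}")
```

Characters that legacy CJK encodings carried (here the Greek letter and the section sign) report A, while a script such as Thai, which those encodings did not cover, reports N.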

> Plus this syntax requires the author to understand the complexities
> of the Unicode character database which I don't think is a great idea.

Right, I have to agree with this.

> The actual original discussion was about simple named ranges [1], for
> example using block definitions in the Unicode database (Blocks.txt)
> to define simple block name ==> range substitutions:
> 
> @font-face {
>   font-family: simplefont;
>   src: local(JapaneseFont);
>   /* implicit definition of unicode range as u+0:10ffff */
> }
> 
> @font-face {
>   font-family: simplefont;
>   src: local(LatinFont);
>   unicode-range: "Basic Latin", "Latin-1 Supplement";  /* equivalent to u+0:ff */
> }
> 
> I think the key weakness in these schemes is that it's hard to find
> the ideal set of named mappings.  Using Unicode blocks or script
> definitions doesn't give you a simple "Arabic" or "Latin" mapping and
> there are common blocks to consider.  And it doesn't break up the CJK
> ideographs block in interesting ways which is a very important use
> of unicode-range.

Right, that's why I was thinking EAW could be a good candidate. While editing the CSS3 Writing Modes and Text specs, I found that EAW solves a lot of problems that blocks or scripts cannot.

Punctuation characters are really tough: they don't have scripts, and the legacy code pages unified punctuation marks from different locales when they looked similar.
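
To make the punctuation case concrete, here is a small sketch, again assuming Python's standard unicodedata module. That module does not expose the Script property, so the general category is shown next to the EAW value instead. The helper name describe and the sample characters are illustrative assumptions:

```python
import unicodedata


def describe(ch: str) -> tuple[str, str]:
    """Return (general category, East Asian Width) for a single character."""
    return (unicodedata.category(ch), unicodedata.east_asian_width(ch))


# Illustrative punctuation samples (hypothetical selection).
for ch in ["\u201C", "\u2026", "\u3001", ","]:
    category, eaw = describe(ch)
    # Print the code point, its general category, and its EAW class.
    print(f"U+{ord(ch):04X} category={category} eaw={eaw}")
```

The left double quotation mark and the horizontal ellipsis both come out as EAW=A, so the property flags them as shared between Western and CJK usage but cannot say which font a given document should use for them.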

> There might be a simple way of defining named blocks that can be referenced
> but I think more advanced ways of defining unicode ranges should be left
> for a later version of the fonts spec in CSS.  Once unicode-range is actually
> implemented and in use, I think the use cases for extensions will be much
> clearer.

I agree. Although I still believe combining EAW and script is the best tool for CJK authors today, I agree that it's not important enough to put in CSS3 Fonts, and it's still too complicated for regular authors to understand.

I'm discussing with some folks in Japan how we could solve the unified punctuation problem, and nobody has a reasonable answer yet. We'll continue the discussion in the future.


Regards,
Koji

Received on Monday, 2 May 2011 05:53:24 UTC