- From: Koji Ishii <kojiishi@gluesoft.co.jp>
- Date: Sun, 8 May 2011 09:05:40 -0400
- To: John Daggett <jdaggett@mozilla.com>, Addison Phillips <addison@lab126.com>
- CC: fantasai <fantasai.lists@inkedblade.net>, "www-style@w3.org" <www-style@w3.org>, WWW International <www-international@w3.org>
I'm splitting the thread for text-transform.
> http://dev.w3.org/cvsweb/~checkout~/csswg/css3-text/Overview.html?rev=1.128;content-type=text%2Fhtml#text-transform
>
> The 'fullwidth' value is defined as:
>
> Puts all characters in fullwidth form. If the character does not
> have a corresponding fullwidth form, it is left as is. This value
> is typically used to typeset Latin characters and digits like
> ideographic characters.
>
> Additional description:
>
> The definition of fullwidth and halfwidth forms can be found on
> the Unicode consortium web site at [UAX11]. The mapping to
> fullwidth form is defined by <wide> tag of Character Decomposition
> Mapping in [UAX44].
>
> But this doesn't really define the precise mapping function, it implies
> it obliquely. The data in the UnicodeData.txt file looks like this:
>
> FF41;FULLWIDTH LATIN SMALL LETTER A;Ll;0;L;<wide> 0061;;;;N;;;FF21;;FF21
> FF42;FULLWIDTH LATIN SMALL LETTER B;Ll;0;L;<wide> 0062;;;;N;;;FF22;;FF22
> FF43;FULLWIDTH LATIN SMALL LETTER C;Ll;0;L;<wide> 0063;;;;N;;;FF23;;FF23
> FF44;FULLWIDTH LATIN SMALL LETTER D;Ll;0;L;<wide> 0064;;;;N;;;FF24;;FF24
>
> The mapping is *from* the codepoint contained in the
> Decomposition_Mapping property when '<wide>' is present. So 'a'
> (U+0061) would map to its fullwidth version (U+FF41).
You're right that the mapping is "from" the codepoint, but that's the definition of the Decomposition_Mapping.
00B9;SUPERSCRIPT ONE;...;<super> 0031;
means "U+00B9 is <super> of U+0031". I can add a non-normative note on how to interpret the values of the Decomposition_Mapping field, with an example. Do you think that would help?
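As a rough illustration of that reading, the Python sketch below inverts the <wide> entries to obtain a base-to-fullwidth table. It relies on the standard unicodedata module; the names build_fullwidth_map and to_fullwidth are made up for this example and do not come from the spec, and entries with more than one target code point are simply skipped as an assumption.

```python
import sys
import unicodedata

def build_fullwidth_map():
    """Invert <wide> decompositions: base character -> fullwidth character."""
    mapping = {}
    for cp in range(sys.maxunicode + 1):
        ch = chr(cp)
        # e.g. '<wide> 0061' for U+FF41, '<super> 0031' for U+00B9
        decomp = unicodedata.decomposition(ch)
        if not decomp.startswith('<wide>'):
            continue
        targets = decomp.split()[1:]
        if len(targets) == 1:  # assumption: ignore multi-code-point targets
            base = chr(int(targets[0], 16))
            mapping.setdefault(base, ch)
    return mapping

FULLWIDTH = build_fullwidth_map()

def to_fullwidth(text):
    """Replace each character that has a fullwidth form; leave the rest as is."""
    return ''.join(FULLWIDTH.get(ch, ch) for ch in text)

print(unicodedata.decomposition('\u00B9'))
print(ascii(to_fullwidth('abc 123')))
```

The table is keyed by the code point found inside Decomposition_Mapping, so the transform runs in the opposite direction from the way the data file is written.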
> When you look at the data you also discover this:
>
> 3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;
>
> So the mapping would also map spaces to ideographic spaces. Since
> this has implications for white space collapsing, the point in the
> text handling pipeline where text-transform occurs needs to be defined
> precisely. This has been noted as an issue and discussed on www-style
> [1].
You're right. The situation is:
* From use cases, authors want to transform U+0020 to U+3000 after white space processing.
* If that is too hard for implementations, authors can live without it, since the feature is still useful.
* But at this point, we don't know if it's hard for implementations or not.
So at the end of the section, fantasai and I added this paragraph:
> Text transformation happens after white space processing.
> (This only matters when ‘fullwidth’ transforms U+0020 space
> characters to U+3000.) Issue: This requirement may need to
> be relaxed during CR, so mark at-risk.
Does this solve your concern?
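To make the ordering concrete, here is a small Python sketch. It assumes a much simplified model of white space processing in which only runs of U+0020 collapse and U+3000 is never collapsed; collapse_spaces and fullwidth_spaces are hypothetical helpers, and only the space part of 'fullwidth' is modelled.

```python
import re

IDEOGRAPHIC_SPACE = '\u3000'

def collapse_spaces(text):
    # Simplified stand-in for CSS white space processing:
    # only runs of U+0020 are collapsed to a single space.
    return re.sub(' +', ' ', text)

def fullwidth_spaces(text):
    # Only the U+0020 -> U+3000 part of the 'fullwidth' transform.
    return text.replace(' ', IDEOGRAPHIC_SPACE)

source = 'A  B'  # two U+0020 between the letters

# Order in the spec paragraph: white space processing, then text-transform.
transform_after = fullwidth_spaces(collapse_spaces(source))
# Opposite order: transform first, then white space processing.
transform_before = collapse_spaces(fullwidth_spaces(source))

print(ascii(transform_after))
print(ascii(transform_before))
```

Under these assumptions the first order yields one ideographic space and the second yields two, which is the difference the paragraph is meant to pin down.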
> The precise behavior of 'uppercase' and 'lowercase' should also
> probably be defined explicitly. Should only the
> Simple_Uppercase_Mapping and Simple_Lowercase_Mapping properties be
> used? Or should the properties contained in SpecialCasing.txt also
> apply? (My answer: yes please!).
I agree that we should name the properties for these too, and I agree that we should use SpecialCasing.txt as well. Thank you for stating your expected answer up front; that really helps. I'll draft wording and consult with fantasai.
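For reference, a quick Python sketch of why the simple mappings alone fall short. Python's str.upper() applies the full case mappings, so it can serve as a stand-in here; the helper name full_upper_expansions is invented for this example.

```python
def full_upper_expansions(text):
    """Return (char, uppercase) pairs where uppercasing yields more than one code point."""
    result = []
    for ch in text:
        upper = ch.upper()  # full case mapping, not one-to-one
        if len(upper) > 1:
            result.append((ch, upper))
    return result

# U+00DF sharp s, U+FB00 ff ligature, U+0149 n preceded by apostrophe
for ch, upper in full_upper_expansions('\u00DF\uFB00\u0149abc'):
    codes = ' '.join(f'U+{ord(c):04X}' for c in upper)
    print(f'U+{ord(ch):04X} -> {codes}')
```

A simple mapping sends one code point to one code point, so it cannot express any of these expansions.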
> Instead the current draft just writes:
>
> Although limited, the case mapping process has some language
> dependencies. Some well known examples are Turkish and Greek. If
> the content language is known then any such language-specific
> rules must be used. The case mapping rules for the character
> repertoire specified by the Unicode Standard can be found on the
> Unicode Consortium Web site. [UNICODE]
>
> This is simply not sufficient to define what 'uppercase' and
> 'lowercase' means in implementation terms.
This paragraph is about the language dependency of the casing algorithm, so I think it should be kept. It is an additional requirement on top of Simple_Uppercase_Mapping, Simple_Lowercase_Mapping, and SpecialCasing.txt.
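A minimal Python sketch of the language-dependent part, using the Turkish and Azeri dotted i as the example. The function upper_for_language and its lang parameter are hypothetical, and only this one rule is modelled; the default path falls back to str.upper(), which is not language-sensitive.

```python
def upper_for_language(text, lang):
    """Uppercase text, applying the Turkish/Azeri dotted-i rule when lang asks for it."""
    primary = lang.lower().split('-')[0]  # 'tr-TR' -> 'tr'
    if primary in ('tr', 'az'):
        # In these languages, i (U+0069) uppercases to
        # capital I with dot above (U+0130).
        text = text.replace('i', '\u0130')
    return text.upper()

print(ascii(upper_for_language('istanbul', 'en')))
print(ascii(upper_for_language('istanbul', 'tr')))
```

The same input produces different output depending on the content language, which is why the paragraph cannot be replaced by a property reference alone.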
> Depending on how you define the case-mapping properties, there's also
> a possible ordering issue, since text-transform can be multi-valued:
>
> p { text-transform: fullwidth lowercase; }
>
> <p>ﬀ</p> /* codepoint for ff presentational ligature */
>
> Does a viewer see the ff-ligature or fullwidth FF? This *might* be
> determined by the order in which these mappings are applied.
Great point. I think this works:
[ capitalize | uppercase | lowercase ] > fullwidth > fullsize-kana
I'll note this in the spec unless anyone has a different idea.
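Here is a Python sketch of why the order can matter, reading ">" above as "is applied before" (that reading is my assumption for the example). It uses 'uppercase' rather than 'lowercase' because the ff ligature (U+FB00) only expands when uppercased. The helper names are invented, and fullsize-kana is left out.

```python
import sys
import unicodedata

def build_fullwidth_map():
    """Invert <wide> decompositions: base character -> fullwidth character."""
    mapping = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == '<wide>':
            mapping.setdefault(chr(int(parts[1], 16)), chr(cp))
    return mapping

FULLWIDTH = build_fullwidth_map()

STEPS = {
    'uppercase': str.upper,
    'lowercase': str.lower,
    'fullwidth': lambda s: ''.join(FULLWIDTH.get(c, c) for c in s),
}

def apply_steps(text, order):
    """Apply the named transforms one after another, in the given order."""
    for name in order:
        text = STEPS[name](text)
    return text

ligature = '\uFB00'  # ff presentational ligature
print(ascii(apply_steps(ligature, ['uppercase', 'fullwidth'])))
print(ascii(apply_steps(ligature, ['fullwidth', 'uppercase'])))
```

With casing first, the ligature expands to two letters that then become fullwidth; with fullwidth first, the ligature has no fullwidth form, so the later expansion leaves plain ASCII letters.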
> My point here is simply that implementors need more detail than simple
> references to parts of Unicode.
I don't think these are part of a generic issue with how we refer to Unicode; rather, they are really great review feedback. I appreciate the effort and knowledge you put into giving us such great feedback.
> [1] Effect of text-transform on spaces
> http://lists.w3.org/Archives/Public/www-style/2011Feb/thread.html#msg470
[2] http://unicode.org/faq/casemap_charprop.html
Regards,
Koji
Received on Sunday, 8 May 2011 13:08:21 UTC