- From: Koji Ishii <kojiishi@gluesoft.co.jp>
- Date: Sun, 8 May 2011 09:05:40 -0400
- To: John Daggett <jdaggett@mozilla.com>, Addison Phillips <addison@lab126.com>
- CC: fantasai <fantasai.lists@inkedblade.net>, "www-style@w3.org" <www-style@w3.org>, WWW International <www-international@w3.org>
I'm splitting the thread for text-transform.

> http://dev.w3.org/cvsweb/~checkout~/csswg/css3-text/Overview.html?rev=1.128;content-type=text%2Fhtml#text-transform
>
> The 'fullwidth' value is defined as:
>
>   Puts all characters in fullwidth form. If the character does not
>   have a corresponding fullwidth form, it is left as is. This value
>   is typically used to typeset Latin characters and digits like
>   ideographic characters.
>
> Additional description:
>
>   The definition of fullwidth and halfwidth forms can be found on
>   the Unicode consortium web site at [UAX11]. The mapping to
>   fullwidth form is defined by <wide> tag of Character Decomposition
>   Mapping in [UAX44].
>
> But this doesn't really define the precise mapping function, it implies
> it obliquely. The data in the UnicodeData.txt file looks like this:
>
>   FF41;FULLWIDTH LATIN SMALL LETTER A;Ll;0;L;<wide> 0061;;;;N;;;FF21;;FF21
>   FF42;FULLWIDTH LATIN SMALL LETTER B;Ll;0;L;<wide> 0062;;;;N;;;FF22;;FF22
>   FF43;FULLWIDTH LATIN SMALL LETTER C;Ll;0;L;<wide> 0063;;;;N;;;FF23;;FF23
>   FF44;FULLWIDTH LATIN SMALL LETTER D;Ll;0;L;<wide> 0064;;;;N;;;FF24;;FF24
>
> The mapping is *from* the codepoint contained in the
> Decomposition_Mapping property when '<wide>' is present. So 'a'
> (U+0061) would map to its fullwidth version (U+FF41).

You're right that the mapping is "from" the code point, but that is simply how Decomposition_Mapping is defined. For example,

  00B9;SUPERSCRIPT ONE;...;<super> 0031;

means "U+00B9 is the <super> form of U+0031". I can add a non-normative note, with an example, on how to interpret values of the Decomposition_Mapping field. Do you think that would help?

> When you look at the data you also discover this:
>
>   3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;
>
> So the mapping would also map spaces to ideographic spaces. Since
> this has implications for white space collapsing, the point in the
> text handling pipeline where text-transform occurs needs to be defined
> precisely.
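A minimal Python sketch of how to read the field, assuming Python's standard `unicodedata` module, whose `decomposition()` returns the same Decomposition_Mapping string as field 5 of UnicodeData.txt (for example `'<wide> 0061'` for U+FF41). The names `parse_decomposition`, `build_fullwidth_table` and `FULLWIDTH` are hypothetical, for illustration only. The record's own code point is the `<wide>` form of the listed code point, so a 'fullwidth' transform is the inverse of that field: U+0061 maps to U+FF41, and U+0020 maps to U+3000.

```python
import sys
import unicodedata


def parse_decomposition(record):
    """Split one UnicodeData.txt record into (code point, tag, mapped code points).

    Field 0 is the code point; field 5 is Decomposition_Mapping, e.g. "<wide> 0061".
    The record's own code point is the tagged (e.g. fullwidth) form of the
    listed code points, not the other way around.
    """
    fields = record.split(";")
    code_point = int(fields[0], 16)
    parts = fields[5].split()
    tag = None
    if parts and parts[0].startswith("<"):
        tag, parts = parts[0], parts[1:]
    return code_point, tag, [int(p, 16) for p in parts]


def build_fullwidth_table():
    """Invert every <wide> decomposition: source code point -> fullwidth code point."""
    table = {}
    for cp in range(sys.maxunicode + 1):
        parts = unicodedata.decomposition(chr(cp)).split()
        if len(parts) == 2 and parts[0] == "<wide>":
            # Keep the first match if two characters ever claimed the same source.
            table.setdefault(int(parts[1], 16), cp)
    return table


FULLWIDTH = build_fullwidth_table()

print(parse_decomposition(
    "FF41;FULLWIDTH LATIN SMALL LETTER A;Ll;0;L;<wide> 0061;;;;N;;;FF21;;FF21"))
# (65345, '<wide>', [97])

# Characters without a <wide> counterpart are left as is, as the spec text says.
print("ab 1".translate(FULLWIDTH))  # U+FF41 U+FF42 U+3000 U+FF11
```

Scanning every code point is only done once here; an implementation would more likely precompute the table at build time.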
> This has been noted as an issue and discussed on www-style
> [1].

You're right. The situation is:

* From the use cases, authors want U+0020 to be transformed to U+3000 after white space processing.
* If that is too hard for implementations, authors can live without it, since the feature is still useful.
* But at this point, we don't know whether it is hard for implementations or not.

So at the end of the spec, fantasai and I added this paragraph:

> Text transformation happens after white space processing.
> (This only matters when ‘fullwidth’ transforms U+0020 space
> characters to U+3000.) Issue: This requirement may need to
> be relaxed during CR, so mark at-risk.

Does this solve your concern?

> The precise behavior of 'uppercase' and 'lowercase' should also
> probably be defined explicitly. Should only the
> Simple_Uppercase_Mapping and Simple_Lowercase_Mapping properties be
> used? Or should the properties contained in SpecialCasing.txt also
> apply? (My answer: yes please!).

I agree that we should name the properties for these too, and that we should use SpecialCasing.txt as well. Thank you for giving your expected answer up front; that really helps. I'll draft some wording and consult with fantasai.

> Instead the current draft just writes:
>
>   Although limited, the case mapping process has some language
>   dependencies. Some well known examples are Turkish and Greek. If
>   the content language is known then any such language-specific
>   rules must be used. The case mapping rules for the character
>   repertoire specified by the Unicode Standard can be found on the
>   Unicode Consortium Web site. [UNICODE]
>
> This is simply not sufficient to define what 'uppercase' and
> 'lowercase' means in implementation terms.

This paragraph is about the language dependency of the casing algorithm, so I think it should be kept. It states requirements in addition to Simple_Uppercase_Mapping, Simple_Lowercase_Mapping, and SpecialCasing.txt.
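A sketch of that layering in Python, assuming that `str.upper()` and `str.lower()` apply the default full case mappings (UnicodeData.txt plus the unconditional SpecialCasing.txt entries, e.g. U+00DF to "SS") but no language-specific rules. The functions `to_upper` and `to_lower`, the `lang` parameter, and the `TURKIC` set are hypothetical; the Turkish/Azerbaijani tailoring is deliberately simplified and ignores SpecialCasing's context conditions.

```python
# Hypothetical set of content languages that get the Turkic dotted/dotless i rules.
TURKIC = {"tr", "az"}


def to_upper(text, lang=None):
    """Default full uppercase mapping plus a simplified Turkic tailoring."""
    if lang in TURKIC:
        # Turkish/Azerbaijani: dotted small i uppercases to U+0130, not U+0049.
        text = text.replace("i", "\u0130")
    return text.upper()


def to_lower(text, lang=None):
    """Default full lowercase mapping plus a simplified Turkic tailoring.

    Simplification: SpecialCasing's context conditions (such as I followed
    by U+0307) are ignored; a real implementation must handle them.
    """
    if lang in TURKIC:
        # U+0130 -> i, and dotless capital I -> U+0131 dotless small i.
        text = text.replace("\u0130", "i").replace("I", "\u0131")
    return text.lower()


print(to_upper("stra\u00dfe"))                      # STRASSE (SpecialCasing)
print(to_upper("istanbul", lang="tr"))              # U+0130 then STANBUL
print(to_lower("D\u0130YARBAKIR", lang="tr"))       # diyarbak + U+0131 + r
```

Without the tailoring, U+0130 lowercases to "i" followed by U+0307 (the default SpecialCasing mapping), which is why the language of the content has to be taken into account on top of the Unicode property data.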
> Depending on how you define the case-mapping properties, there's also
> a possible ordering issue, since text-transform can be multi-valued:
>
>   p { text-transform: fullwidth lowercase; }
>
>   <p>ﬀ</p> /* codepoint for ff presentational ligature */
>
> Does a viewer see the ff-ligature or fullwidth FF? This *might* be
> determined by the order in which these mappings are applied.

Great point. I think this order works:

  [ capitalize | uppercase | lowercase ] > fullwidth > fullsize-kana

I'll add this to the spec unless anyone has a different idea.

> My point here is simply that implementors need more detail than simple
> references to parts of Unicode.

I don't think these are part of the generic issue of how to refer to Unicode; rather, this is really great review feedback. I appreciate the effort and knowledge you put into giving us such great feedback.

> [1] Effect of text-transform on spaces
> http://lists.w3.org/Archives/Public/www-style/2011Feb/thread.html#msg470

[2] http://unicode.org/faq/casemap_charprop.html

Regards,
Koji
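To illustrate the proposed order ([ capitalize | uppercase | lowercase ] > fullwidth > fullsize-kana) together with the rule that text transformation happens after white space processing, here is a Python sketch. Everything in it is an assumption for illustration: `apply_text_transform` and `collapse_spaces` are hypothetical names, white space collapsing is reduced to merging runs of U+0020/U+0009, 'capitalize' is omitted, and `FULLSIZE_KANA` is a partial table, not the full list of small kana.

```python
import re
import sys
import unicodedata

# <wide> decompositions inverted: source code point -> fullwidth code point.
FULLWIDTH = {}
for cp in range(sys.maxunicode + 1):
    parts = unicodedata.decomposition(chr(cp)).split()
    if len(parts) == 2 and parts[0] == "<wide>":
        FULLWIDTH.setdefault(int(parts[1], 16), cp)

# Hypothetical, partial small-kana table for illustration only.
FULLSIZE_KANA = str.maketrans("ぁぃぅぇぉっゃゅょァィゥェォッャュョ",
                              "あいうえおつやゆよアイウエオツヤユヨ")


def collapse_spaces(text):
    """Simplified white-space: normal collapsing of U+0020/U+0009 runs.

    U+3000 is not collapsible white space in CSS, so it survives this step.
    """
    return re.sub(r"[ \t]+", " ", text)


def apply_text_transform(text, case=None, fullwidth=False, fullsize_kana=False):
    """Apply transforms in the proposed order: case > fullwidth > fullsize-kana."""
    text = collapse_spaces(text)  # white space processing comes first
    if case == "uppercase":
        text = text.upper()
    elif case == "lowercase":
        text = text.lower()
    # 'capitalize' is omitted from this sketch.
    if fullwidth:
        text = text.translate(FULLWIDTH)
    if fullsize_kana:
        text = text.translate(FULLSIZE_KANA)
    return text


print(apply_text_transform("\ufb00", case="uppercase", fullwidth=True))  # U+FF26 U+FF26
print(apply_text_transform("a  b", fullwidth=True))  # U+FF41 U+3000 U+FF42
```

With this order, 'fullwidth lowercase' leaves U+FB00 unchanged (it is already lowercase and has no <wide> counterpart), while 'fullwidth uppercase' gives fullwidth "ＦＦ"; applying fullwidth before uppercase would instead give plain "FF". Likewise, collapsing before the fullwidth step yields a single U+3000 for a run of spaces, whereas transforming first would leave two U+3000 characters that white space processing does not collapse.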
Received on Sunday, 8 May 2011 13:08:21 UTC