- From: John Daggett <jdaggett@mozilla.com>
- Date: Tue, 3 May 2011 23:34:38 -0700 (PDT)
- To: Addison Phillips <addison@lab126.com>
- Cc: fantasai <fantasai.lists@inkedblade.net>, www-style@w3.org, WWW International <www-international@w3.org>
Addison Phillips wrote:

> > > Since CSS specs are both explaining behavior and defining
> > > implementation, referring to a Unicode technical note is fine
> > > for referring to a deeper explanation of a concept but is *not*
> > > sufficient for defining implementation behavior. Implementation
> > > behavior should be defined in terms of the Unicode database [1]
> > > instead, by referencing specific data fields in specific files,
> > > e.g. the EastAsianWidth.txt file in your example here. The
> > > technical notes often don't always cover all the subtleties
> > > implicit in using this data and that's something any definition
> > > of implementation behavior needs to cover explicitly, otherwise
> > > you end up with untestable muddle.
> >
> > The EastAsianWidth.txt file is referenced from UAX11. UAX11 gives
> > the explanation of what it means, how to use it, etc. So I think
> > that referring to UAX11 is the correct thing to do here. I'll let
> > Addison correct me if I'm wrong.
>
> In my opinion, you are correct to use UAX11 as a reference. UAX
> means "Unicode Standard Annex", i.e. it is an integral part of the
> Unicode Standard. John Daggett's comments do apply to some other
> classes of Unicode Technical Report and sometimes an Annex (or
> Technical Standard) may not be complete as a reference unto itself.
> But, in this case, UAX11 deals with East Asian Widths and focuses on
> defining the Unicode informative property in question. It is thus
> probably the best reference to EastAsianWidth.txt, although a
> separate reference to the latter file might also be useful for
> implementers.

I wasn't arguing that we shouldn't refer to Unicode annexes or technical reports; I'm saying that such a reference alone is not sufficient to define the implementation of a given CSS property.
For that I think we should include more detail: the definition of a given CSS property should reference the specific property in the Unicode database rather than rely on the property and its handling being "obvious" from a reference to a given portion of the Unicode specification, annex or technical report.

In the case of the 'text-orientation' property, the reasons for this are evidenced by the issue noted at the end of the property description:

  "Issue: Need to define handling of EAW Ambiguous (A) symbols and
  punctuation."

In other words, the decision as to whether to rotate the glyphs for a given character in vertical text needs to be more clearly specified, since this is *not* explicitly covered as part of the text of UAX11.

Perhaps a better example of the same issue exists in the definition of the 'text-transform' property in the current Editor's Draft of CSS3 Text:

http://dev.w3.org/cvsweb/~checkout~/csswg/css3-text/Overview.html?rev=1.128;content-type=text%2Fhtml#text-transform

The 'fullwidth' value is defined as:

  Puts all characters in fullwidth form. If the character does not
  have a corresponding fullwidth form, it is left as is. This value
  is typically used to typeset Latin characters and digits like
  ideographic characters.

Additional description:

  The definition of fullwidth and halfwidth forms can be found on the
  Unicode consortium web site at [UAX11]. The mapping to fullwidth
  form is defined by <wide> tag of Character Decomposition Mapping in
  [UAX44].

But this doesn't really define the precise mapping function; it only implies it obliquely.
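To make concrete what an implementor has to infer from that description, here is a minimal Python sketch of one possible reading: invert every Decomposition_Mapping entry tagged <wide> to get a base-to-fullwidth table. The names build_fullwidth_map, FULLWIDTH and to_fullwidth are hypothetical, and the sketch assumes that Python's unicodedata module (which tracks whatever Unicode version ships with the interpreter) is an acceptable stand-in for reading UnicodeData.txt directly. It is not text from the draft.

```python
import unicodedata


def build_fullwidth_map():
    """Return {base character: fullwidth character} from <wide> decompositions."""
    mapping = {}
    for cp in range(0x110000):
        fields = unicodedata.decomposition(chr(cp)).split()
        # A <wide> entry looks like "<wide> 0061": the tag, then one code point.
        if len(fields) == 2 and fields[0] == "<wide>":
            # The decomposition points from the fullwidth form to the base
            # character, so the table has to be built in the reverse direction.
            mapping[chr(int(fields[1], 16))] = chr(cp)
    return mapping


FULLWIDTH = build_fullwidth_map()


def to_fullwidth(text):
    """Replace each character that has a fullwidth form; leave others as is."""
    return "".join(FULLWIDTH.get(ch, ch) for ch in text)
```

Note that the direction of the table (from the decomposition target back to the fullwidth code point) is something the sketch had to decide for itself; the draft wording does not state it.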
The data in the UnicodeData.txt file looks like this:

  FF41;FULLWIDTH LATIN SMALL LETTER A;Ll;0;L;<wide> 0061;;;;N;;;FF21;;FF21
  FF42;FULLWIDTH LATIN SMALL LETTER B;Ll;0;L;<wide> 0062;;;;N;;;FF22;;FF22
  FF43;FULLWIDTH LATIN SMALL LETTER C;Ll;0;L;<wide> 0063;;;;N;;;FF23;;FF23
  FF44;FULLWIDTH LATIN SMALL LETTER D;Ll;0;L;<wide> 0064;;;;N;;;FF24;;FF24

The mapping is *from* the codepoint contained in the Decomposition_Mapping property when '<wide>' is present. So 'a' (U+0061) would map to its fullwidth version (U+FF41).

When you look at the data you also discover this:

  3000;IDEOGRAPHIC SPACE;Zs;0;WS;<wide> 0020;;;;N;;;;;

So the mapping would also map spaces to ideographic spaces. Since this has implications for white space collapsing, the point in the text handling pipeline where text-transform occurs needs to be defined precisely. This has been noted as an issue and discussed on www-style [1].

The precise behavior of 'uppercase' and 'lowercase' should also probably be defined explicitly. Should only the Simple_Uppercase_Mapping and Simple_Lowercase_Mapping properties be used? Or should the properties contained in SpecialCasing.txt also apply? (My answer: yes please!) Instead the current draft just writes:

  Although limited, the case mapping process has some language
  dependencies. Some well known examples are Turkish and Greek. If
  the content language is known then any such language-specific rules
  must be used. The case mapping rules for the character repertoire
  specified by the Unicode Standard can be found on the Unicode
  Consortium Web site. [UNICODE]

This is simply not sufficient to define what 'uppercase' and 'lowercase' mean in implementation terms.

Depending on how you define the case-mapping properties, there's also a possible ordering issue, since text-transform can be multi-valued:

  p { text-transform: fullwidth lowercase; }

  <p>ff</p> /* codepoint for ff presentational ligature */

Does a viewer see the ff-ligature or fullwidth FF?
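A rough Python sketch of how the order of application can change the result. Everything here is an assumption made for illustration: the helper names are hypothetical, the fullwidth table is simply the inverted <wide> decompositions, and the sketch uses uppercasing (rather than the 'lowercase' value in the example above) because Python's str.upper() applies the full case mapping, under which U+FB00 expands to two letters.

```python
import unicodedata

# Hypothetical fullwidth table, inverted from <wide> decompositions.
FULLWIDTH = {}
for cp in range(0x110000):
    fields = unicodedata.decomposition(chr(cp)).split()
    if len(fields) == 2 and fields[0] == "<wide>":
        FULLWIDTH[chr(int(fields[1], 16))] = chr(cp)


def fullwidth(text):
    """Map each character to its fullwidth form when one exists."""
    return "".join(FULLWIDTH.get(ch, ch) for ch in text)


def full_upper(text):
    """Uppercase with full case mapping; one character may become several."""
    return text.upper()


LIGATURE = "\ufb00"  # LATIN SMALL LIGATURE FF

# Case mapping first: the ligature expands to "FF", and each F then
# has a fullwidth form.
case_then_width = fullwidth(full_upper(LIGATURE))

# Fullwidth first: the ligature has no fullwidth form, so it passes
# through unchanged and only the case mapping takes effect.
width_then_case = full_upper(fullwidth(LIGATURE))
```

Under these assumptions case_then_width is two fullwidth capital F characters (U+FF26 U+FF26), while width_then_case is the plain ASCII string "FF", so the two orders are observably different.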
The answer *might* depend on the order in which these mappings are applied.

My point here is simply that implementors need more detail than simple references to parts of Unicode. Rather than rely on folks like Boris Zbarsky, David Baron and Sergey Malkin to point these details out when they actually dig through the references for a given property and ponder them for a bit, I think it would be better (and simpler!) for the specs to detail these algorithms precisely, so that issues like these are clear not just to implementors steeped in Unicode lore but also to authors, QA folks and other mere mortals.

Regards,

John Daggett

[1] Effect of text-transform on spaces
http://lists.w3.org/Archives/Public/www-style/2011Feb/thread.html#msg470
Received on Wednesday, 4 May 2011 06:37:16 UTC