- From: Chris Lilley <chris@w3.org>
- Date: Wed, 22 Oct 2003 16:30:10 +0200
- To: "Richard Ishida" <ishida@w3.org>
- Cc: www-international@w3.org, www-style@w3.org
On Wednesday, October 22, 2003, 2:58:51 PM, Richard wrote:

RI> See below a transcript of a mail exchange between myself and François
RI> Richard (top to bottom order).

RI> Francois wrote:
RI> I have been looking around for more info on the CSS 'text-transform',
RI> its purpose and usage. I have the feeling that it might make the
RI> processing of text more complex since it actually transforms characters.

It doesn't transform characters, and is thus designed to make text
processing in general (including use of TM) *more* efficient.

Consider a page style where the major title is capitalised, first
level subheadings have initial caps, and body text is lower case
except for required capitalisation.

The straightforward, but wrong, way to do this is to change the
characters:

<major-title>THE EFFECT OF CHARACTER MANIPULATION ON TRANSLATION MEMORY</major-title>

<subhead>The Effect of Character Manipulation on Translation Memory</subhead>

<para>Manipulation of characters can have a negative impact on the
efficiency of Translation Memory, in the same way that multiple URIs
for the same resource have a negative effect on Web proxy cache
efficiency ...</para>

Additional variations are possible if some sections (e.g., the first two
lines of the first paragraph after a subhead) are in small caps,
depending on whether your smallcaps font puts those glyphs on upper
case, lower case, or - as is usual - both cases (in which case the
FolLoWing tEXt wOUld disPLaY just fine).

The correct way to do this is to separate the stylability (and
restylability) of the text from the content of the text.
<major-title>The effect of character manipulation on Translation Memory</major-title>

<subhead>The effect of character manipulation on Translation Memory</subhead>

<para>Manipulation of characters can have a negative impact on the
efficiency of Translation Memory, in the same way that multiple URIs
for the same resource have a negative effect on Web proxy cache
efficiency ...</para>

This will, with two lines of CSS, display identically to the first
example. However, by using a consistent capitalisation throughout the
text, the efficiency of Translation Memory is improved.
Restylability (once the designers decide in two years' time that
capitalized headings are *so* 2003) is also enhanced, as the new style
requires a one-line change in site.css rather than multiple line
changes in all of the content.

As with all styling (e.g., relative and absolute positioning) it's also
possible to make egregious hacks with it, but the intended usage
helps, rather than hinders, translation.

So yes, it's possible to have rAnsOm nOTe cAPiTaliZatIon and then rely
on CSS to regularize the capitalisation, thus totally messing up the
Translation Memory; this does not seem to be at all common, and would
be bad practice. So on balance, text-transform helps much more than it
hinders.

RI> Richard's postscript:
RI> François and Yves are expressing concerns that I'm sure will be shared
RI> by a large number of localization folks out there. I think it is
RI> important to state things clearly in the CSS spec -
RI> http://www.w3.org/TR/CSS21/text.html#propdef-text-transform should
RI> contain a paragraph that clearly spells out that this is only 'smoke and
RI> mirrors'. That it should not be relied upon to 'make the text look
RI> right', only to apply an alternative styling effect that may not be
RI> desirable or applicable for all languages (eg. German or Turkish).

I agree with this good practice note and support its inclusion, plus a
good practice and bad practice example.
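
As a rough sketch of the "two lines of CSS" mentioned above, and assuming the element names from the example markup (major-title, subhead) are styled directly, the rules might look something like this. Note one caveat: 'capitalize' uppercases the first letter of every word, so "of" and "on" would render as "Of" and "On"; the result approximates, rather than exactly reproduces, the hand-capitalised subhead.

```css
/* Sketch only: selectors assume the element names used in the example markup. */
major-title { text-transform: uppercase; }   /* all caps for the major title */
subhead     { text-transform: capitalize; }  /* initial cap on every word, including "of" and "on" */
```

Restyling later then means editing these rules in the stylesheet, while the content (and hence the Translation Memory) stays untouched.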
I will also add the examples and discussion from this thread to the
TAG finding on the separation of content and presentation.
Translatability forms a part of this separation that is not often
addressed. Translatability is affected both by contamination of
content with styling, as above, and by the contamination of
styling with content (especially in XSLT templates, for example).

RI> I also suspect that TM tools might work better if they used case
RI> independent (and even Unicode normalised) matching - possibly comparing
RI> case as a second level differentiator where appropriate (like a sorting
RI> algorithm). (If you want to respond to this para, maybe just reply to
RI> www-international).

Certainly, Translation Memory is aided by early and consistent Unicode
normalisation. Within an organisation, if a pass is made over legacy
content to remove muddled styling and make the content well formed,
then also making it normalized before committing the revised files
back into the repository would yield benefits in TM efficiency by
reducing false negatives.

-- 
Chris mailto:chris@w3.org
Received on Wednesday, 22 October 2003 11:07:45 UTC