- From: Charles Pritchard <chuck@jumis.com>
- Date: Sat, 11 Dec 2010 18:37:55 -0800
- To: www-style list <www-style@w3.org>
- CC: John Hudson <tiro@tiro.com>, Pierre Bertet <bonjour@pierrebertet.net>
On 12/11/2010 1:28 PM, John Hudson wrote:
> Pierre Bertet wrote:
>
>> But ::first-letter already does this, defining a "letter", which is
>> not very clear either. To clarify this, the CSS3 Selectors spec refers to
>> the Unicode Standard Annex #29 [1].
>> This document seems very complex to me, but it also contains a “Word
>> Boundaries” section, which seems to define exactly that.
>
>> So my question is:
>> Could this section not be used to clarify what a “word” is?
>
> The extensive caveats in the notes to that section of TUS Annex #29
> would need to be taken into account. Word boundary identification
> needs to be tailored for many languages, and the basic Unicode
> mechanism only aims to provide 'as workable a default as possible'.
>
> Words -- and syllables, which present similar issues for selecting
> appropriate text elements for styling -- are units of spoken language
> that may or may not be easily isolated as units in written language,
> depending on particular writing systems as applied to particular
> languages. In some systems, e.g. Thai, word selection is only possible
> with dictionary support.

Recent discussion on the WHATWG mailing list has led to some cautionary tales about exposing user dictionaries; this should be taken into account for systems that need dictionary support.

Example data leak: iOS mobile phones use the contact list as part of their spell checking and suggestion mechanisms. If the DOM were able to detect words in relation to the user dictionary, an untrusted site might be able to detect common names from the user's contact list.

It's a minor leak, but something to be aware of, since an improper implementation would expose it.

-Charles
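P.S. To make the dictionary concern concrete, here is a minimal sketch in Python. It assumes a hypothetical greedy longest-match segmenter, which is one simple way to do dictionary-based word selection; it is not how any particular browser or platform works. The names segment, probe, base and user are invented for illustration, and the "pritchard" entry stands in for a word learned from a contact list.

```python
def segment(text, dictionary):
    """Greedy longest-match segmentation (illustrative only).

    At each position, take the longest dictionary word that starts there;
    if none matches, emit a single character and move on.
    """
    tokens = []
    i = 0
    max_len = max((len(word) for word in dictionary), default=1)
    while i < len(text):
        match = None
        # Try the longest candidate first, then progressively shorter ones.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        if match is None:
            match = text[i]
        tokens.append(match)
        i += len(match)
    return tokens


def probe(candidate, dictionary):
    """What a page could infer if it could observe word boundaries:
    the candidate comes back as exactly one word only when the
    dictionary contains it."""
    return segment(candidate, dictionary) == [candidate]


# Hypothetical dictionaries: a stock one, and one extended with a
# name taken from the user's contact list.
base = {"hello", "world"}
user = base | {"pritchard"}

if __name__ == "__main__":
    for name in ("pritchard", "hudson"):
        print(name, "stock:", probe(name, base), "user:", probe(name, user))
```

If a word-level selector were backed by something like this, a script that styles a list of candidate names and then measures which ones were treated as a single word would learn which names the dictionary contains. Segmenting against a fixed, non-personal dictionary would avoid that.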
Received on Sunday, 12 December 2010 02:37:49 UTC