Re: ::first-word pseudo-element from Charles Pritchard on 2010-12-12 (www-style@w3.org from December 2010)

From: Charles Pritchard <chuck@jumis.com>
Date: Sat, 11 Dec 2010 18:37:55 -0800
To: www-style list <www-style@w3.org>
CC: John Hudson <tiro@tiro.com>, Pierre Bertet <bonjour@pierrebertet.net>
Message-ID: <4D043583.90809@jumis.com>

On 12/11/2010 1:28 PM, John Hudson wrote:
> Pierre Bertet wrote:
>
>> But ::first-letter already does this, defining a "letter", which is
>> not very clear either. To clarify this, the CSS3 Selectors spec refers to
>> the Unicode Standard Annex #29 [1].
>> This document seems very complex to me, but it also contains a “Word
>> Boundaries” section, which seems to define exactly that.
>
>> So my question is:
>> Could this section not be used to clarify what a “word” is?
>
> The extensive caveats in the notes to that section of TUS Annex #29 
> would need to be taken into account. Word boundary identification 
> needs to be tailored for many languages, and the basic Unicode 
> mechanism only aims to provide 'as workable a default as possible'.
>
> Words -- and syllables, which present similar issues for selecting 
> appropriate text elements for styling -- are units of spoken language 
> that may or may not be easily isolated as units in written language, 
> depending on particular writing systems as applied to particular 
> languages. In some systems, e.g. Thai, word selection is only possible 
> with dictionary support.
Recent discussion on the whatwg mailing list has produced some cautionary
tales about exposing user dictionaries; any system that relies on
dictionary support for word selection should take them into account.

Example data leak:

iOS mobile phones use the contact list as part of their spell checking
and suggestion mechanisms. Were the DOM able to detect words in relation
to the user dictionary, an untrusted site might be able to detect common
names from the user's contact list.

It's a minor leak, but one to be aware of, since an improper
implementation could expose it.
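
To make the concern concrete, here is a minimal sketch in Python. It
assumes a hypothetical longest-match segmenter of the kind a
dictionary-backed ::first-word might rely on. The names first_word and
probe, and the sample dictionary, are illustrative only; they are not
part of any spec or browser API.

```python
def first_word(text, dictionary):
    """Hypothetical dictionary-backed segmenter (longest match).

    Returns the longest prefix of `text` found in `dictionary`,
    falling back to the first character when nothing matches.
    """
    for end in range(len(text), 0, -1):
        if text[:end] in dictionary:
            return text[:end]
    return text[:1]


def probe(candidate, segment, filler="zz"):
    """What an untrusted page could infer from the segmenter's output.

    The page renders `candidate + filler` with no spaces and observes
    which prefix gets treated as the first word. If that prefix is
    exactly `candidate`, the dictionary must contain it.
    """
    return segment(candidate + filler) == candidate


# Assumption: the user dictionary has absorbed a contact name ("alice").
user_dictionary = {"the", "cat", "alice"}


def segment(text):
    return first_word(text, user_dictionary)


leaked = {name: probe(name, segment) for name in ("alice", "bob")}
print(leaked)  # {'alice': True, 'bob': False}
```

The page never reads the dictionary directly; it only observes where the
first word ends, and that is enough to test names one at a time.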


-Charles

Received on Sunday, 12 December 2010 02:37:49 UTC