RE: [css21] 5.12.2 The :first-letter pseudo-element (the Dutch "ij")

Commenting on: 

>> Robbert Broersma just told me that there are two Unicode
>> characters defined for the "Dutch ij", a uppercase and
>> lowercase variant.
>>
>> They are: \u0132 and \u0133. See also Bugzilla Bug 92176[2].
>
> They are compatibility characters, with IJ and ij as the
> compatibility decompositions. In effect, they were included
> into Unicode because they belonged to some existing
> character code standards, and Unicode was meant to be
> universal code, so that you can map data from any encoding
> into Unicode, and vice versa, without losing a distinction
> made in the other encoding. Note the difference between
> e.g. IJ and the letter AE (Æ), which is historically a
> ligature of A and E but classified as an independent
> letter, not as a compatibility character.
>
> This means that U+0132 and U+0133 are effectively just IJ
> and ij as ligatures, as typographic variants of certain
> character pairs.
> Whether you use them or IJ and ij (with or without some
> mechanism, such as a style sheet, that renders them as
> ligatures) is a practical choice, and in Web authoring,
> there are good reasons to favor the letter pairs IJ
> and ij, which are universally supported.

Per the Unicode Standard, it is true that they are compatibility characters, but they are not canonically equivalent to the letter pairs. Saying that they are 'effectively' the same thing is therefore not quite correct. For example, NFKC normalization will unify them with IJ and ij, but NFC will not. This doesn't change the argument, but I wanted to correct a slight mistake.
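
To make the NFC/NFKC difference concrete, here is a minimal sketch using Python's standard unicodedata module. The helper name unifies_with_pair is only illustrative and not part of any spec; the sketch assumes nothing beyond the Unicode data shipped with Python.

```python
import unicodedata

# The two ligature code points and the letter pairs they decompose to.
LIGATURES = {"\u0132": "IJ", "\u0133": "ij"}


def unifies_with_pair(form, ligature, pair):
    """Return True if normalizing `ligature` under `form` yields `pair`."""
    return unicodedata.normalize(form, ligature) == pair


for ligature, pair in LIGATURES.items():
    # The "<compat>" tag marks a compatibility (not canonical) decomposition.
    print(f"U+{ord(ligature):04X}", unicodedata.decomposition(ligature))
    for form in ("NFC", "NFD", "NFKC", "NFKD"):
        print(" ", form, unifies_with_pair(form, ligature, pair))
```

With this, the canonical forms (NFC, NFD) leave U+0132 and U+0133 untouched, while the compatibility forms (NFKC, NFKD) replace them with the plain letter pairs. Text compared after NFC alone will still treat the ligature and the pair as different strings.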

BTW, the CSS3 line module has a much more developed first-letter section, and one of the ideas was to let the page author specify exactly which string of characters constitutes the 'first letter'. The typographic effect of a first letter is quite complex, and flexibility here is necessary in many cases.

Michel

Received on Monday, 13 September 2004 18:53:00 UTC