- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Mon, 16 Jan 2012 20:34:27 -0500
- To: "Kang-Hao (Kenny) Lu" <kennyluck@csail.mit.edu>
- CC: fantasai <fantasai.lists@inkedblade.net>, WWW Style <www-style@w3.org>
On 1/16/12 7:49 PM, Kang-Hao (Kenny) Lu wrote:
> Practically speaking, there are two interoperability-related issues that
> apply to browsers here:
> 1. UAs using UTF-16 as internal storage treat a non-BMP character as two
> grapheme clusters. I am aware that this is unlikely to happen so I'll
> stop talking about this possibility.
> 2. UAs render content with isolated surrogate differently.

Are there only two?  I guess when talking about UTF-16 related issues in
particular.

Again, the behavior of Gecko for surrogates just falls out from the
general approach to text rendering: A consecutive run of text that all
has the "same" style (whatever that means; that's another fun
discussion) but might span different elements is treated as a single
unit for purposes of text rendering.  That means that things like
handling of composing characters, ligatures, shaping, etc. happen on it
all as a unit.

Compare the behavior of this testcase in different browsers (and pardon
the probably-nonsense text):

  <!DOCTYPE html>
  <body style="font-size: 40px">
  ب ت<br>
  بت<br>
  <span>ب</span><span>ت</span><br>
  <span style="color: green">ب</span><span style="color: purple">ت</span><br>
  <span style="font-size: 41px">ب</span>ت

In Gecko and Trident I see shaping happen for all but the first and last
lines of text.  In the first line it should obviously not happen; in the
last line Gecko doesn't do it because it's not really clear how to shape
two glyphs from different font sizes.  I can't speak for Trident there,
though I bet the causes for its behavior are similar.

In WebKit and Presto, only the second line of text is shaped over here.
I would argue that's wrong, especially for the third line of text.

> That is, I don't want WebKit's behavior to fall into the "UA
> may further tailor the definition (grapheme cluster) as allowed by
> Unicode."

I'm not sure that would cover shaping anyway, or would it?

> Yeah, I kind of agree we could make the CSS specs as encoding irrelevant
> as possible. I guess we can start a CSS for UTF-16 UA module 10 years
> later, if we finally want to standardize Gecko's behavior on non-BMP
> characters crossing element boundary :p .

Handling the non-BMP case explicitly would be nice, but we have other
non-interop across element boundaries too.  It might turn out, as in
Gecko's case, that simply trying to solve those other use cases ends up
Just Working for the common non-BMP cases....

-Boris
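
As a rough illustration of the run-based approach described above, the
following Python sketch merges adjacent pieces of text that share a style
key and only then decodes the UTF-16 code units. Everything in it is an
assumption made for illustration: the names coalesce_runs, to_utf16_units
and from_utf16_units are hypothetical, the style key is reduced to a
font-size string, and it is not Gecko's actual code.

```python
import struct
from typing import List, Tuple

# A piece of text as it might come from one element: a style key plus
# UTF-16 code units. Both the shape and the style key are hypothetical.
Piece = Tuple[str, List[int]]


def to_utf16_units(text: str) -> List[int]:
    """Encode a string to a list of UTF-16 code units (lone surrogates allowed)."""
    data = text.encode("utf-16-le", errors="surrogatepass")
    return list(struct.unpack(f"<{len(data) // 2}H", data))


def from_utf16_units(units: List[int]) -> str:
    """Decode UTF-16 code units; an unpaired surrogate stays an unpaired surrogate."""
    data = struct.pack(f"<{len(units)}H", *units)
    return data.decode("utf-16-le", errors="surrogatepass")


def coalesce_runs(pieces: List[Piece]) -> List[Piece]:
    """Merge consecutive pieces whose style keys compare equal into one run."""
    runs: List[Piece] = []
    for style, units in pieces:
        if runs and runs[-1][0] == style:
            # Same style as the previous run: extend it across the boundary.
            runs[-1][1].extend(units)
        else:
            # Style changed (or first piece): start a new run with a copy.
            runs.append((style, list(units)))
    return runs


# U+1D11E is a non-BMP character, so it occupies two UTF-16 code units.
high, low = to_utf16_units("\U0001D11E")

# Surrogate pair split across two elements with the same style key.
same_style = coalesce_runs([("40px", [high]), ("40px", [low])])

# Same split, but the style key differs between the two elements.
diff_style = coalesce_runs([("41px", [high]), ("40px", [low])])
```

With equal style keys the two halves of U+1D11E end up in one run and
decode to a single character; with differing keys, as in the 41px line of
the testcase, each run holds an isolated surrogate.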
Received on Tuesday, 17 January 2012 01:40:21 UTC