- From: Boris Zbarsky <bzbarsky@MIT.EDU>
- Date: Mon, 02 Feb 2009 13:23:15 -0500
- To: "Phillips, Addison" <addison@amazon.com>
- CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>, "www-style@w3.org" <www-style@w3.org>
Phillips, Addison wrote:
> The question of "statistical relevance" is, I think, a red herring.

Not at all. If there are, in practice, no reasons for someone to be doing something, then there is more leeway in terms of what handling can be allowed. I'm not asking "how often" in terms of web pages, but "how often" in terms of web pages that use the characters in question at all.

It sounds like there are plenty of good reasons for someone to be using escapes to insert combining marks.

> Yes, Western European languages are permitted to use combining marks. [etc]

I don't see what this has to do with the question I actually asked, for what it's worth.

> However, this is a problem of universal access. Many languages that rely on combining marks are minority languages that face other pressures (declining native literacy; majority language education; lack of vendor support). The speakers of these languages are expected to surmount many hurdles---with keyboards, fonts, etc. etc. The idea that the pressure should be on these users to deal with these issues is exclusionary.

You seem to be addressing an argument that someone else, not I, made...

> On the question of performance, Anne's point about the comparison is incomplete. Yes, you only do a strcmp() in your code today.

You apparently didn't understand my mail on performance. We do NOT do an strcmp() today. That would have an unacceptable performance cost. We (Gecko, in this case) intern all the relevant strings at parse time and perform comparisons by comparing the interned string identifiers. This is a single equality comparison of a pair of native machine words (the pointers to the interned strings, to be precise).

> First, any two strings that are equal are, well, equal. Normalizing them both won't change that. So an obvious performance boost is to call strcmp() first.

That doesn't help, because in the common case selectors in fact do not match. So detecting a match more quickly is actually not much use. What's needed is detecting that the selector doesn't match as quickly as possible.

> But the real performance test isn't merely the strcmp(). Selectors contain wildcards and other operations.

Very rarely. The vast majority of selectors contain at least one direct string comparison operation (id match, tag name match, class name match), and these are performed first. If they don't match (the common case), then nothing needs to be done for the more expensive parts of the selector.

> And the comparisons are done on the document tree (there isn't just a single comparison).

Indeed. If there were just a single comparison no one would be worried about its performance!

> The overhead of normalization-checking the various comparable items is pretty small compared to the total execution time of the selection algorithm.

Do you actually have any data to back this up? The fact is, the selection algorithm is highly optimized in most modern browsers (because it gets run so much), and normalization-checking might not be as cheap as you seem to think it is (for example, it requires either walking the entire string, or flagging at internment time whether the string might require normalization, or something else). There is nontrivial cost in either memory or performance or both compared to the comparisons that are done now.

> Since (let's call it) 97% of the time you won't have to normalize anything

I fully expect that I don't have to normalize anything far more often than that. But it's the check to see whether I have to normalize that I'm worried about.
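To make the comparison concrete, here is a minimal sketch of what interned-identifier matching plus an intern-time "might need normalization" flag could look like. This is not Gecko's actual atom code; the names are placeholders, and the ASCII-only quick-check stands in for a real Unicode NFC check:

#include <memory>
#include <string>
#include <unordered_map>

// One interned identifier (class name, id, tag name, ...).
struct Atom {
  std::string text;
  // Computed once, at intern time: false means the string is already known
  // to be normalized (here: it is pure ASCII); true means a normalization
  // step would be needed before a "normalized" comparison.
  bool mightNeedNormalization;
};

class AtomTable {
 public:
  // Intern a string: identical strings always map to the same Atom*.
  const Atom* Intern(const std::string& s) {
    auto it = table_.find(s);
    if (it != table_.end()) return it->second.get();
    auto atom = std::make_unique<Atom>(Atom{s, MightNeedNormalization(s)});
    const Atom* raw = atom.get();
    table_.emplace(s, std::move(atom));
    return raw;
  }

 private:
  // Crude stand-in for an NFC quick-check: ASCII-only strings are already
  // in NFC; anything else is conservatively flagged. This walk, and the
  // extra bit of state it produces, is the per-string cost discussed above;
  // it is paid even when nothing ever actually needs normalizing.
  static bool MightNeedNormalization(const std::string& s) {
    for (unsigned char c : s)
      if (c >= 0x80) return true;
    return false;
  }

  std::unordered_map<std::string, std::unique_ptr<Atom>> table_;
};

// Selector matching compares interned identifiers, not string contents:
// a single machine-word (pointer) equality test.
inline bool SelectorComponentMatches(const Atom* selectorAtom,
                                     const Atom* elementAtom) {
  return selectorAtom == elementAtom;
}

A normalization-aware compare would have to consult the flag on both atoms and fall back to a slower path whenever either is set, and that extra branch and state is exactly what is being weighed against the plain pointer comparison done today.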
It basically sounds to me like there is a broken design on a lower level here and we're asking all sorts of other software and specifications to work around that breakage, to be honest... That might well be needed, and it wouldn't be the first time it's been needed, but would the energy be more productively channeled into fixing the design?

Put another way, if we're looking at a multi-year deployment timeframe for Selectors implementations that perform normalization, then is Selectors the right place to be doing normalization? Or would it be better to spend the time putting in normalization at a lower level? You say that this is not compatible with the current state of software; are there any estimates of what it would take to shift that state the way you're trying to shift the state of browsers?

-Boris
Received on Monday, 2 February 2009 18:24:00 UTC