- From: Maciej Stachowiak <mjs@apple.com>
- Date: Wed, 01 Apr 2009 15:59:20 -0700
- To: Jonas Sicking <jonas@sicking.cc>
- Cc: Henri Sivonen <hsivonen@iki.fi>, HTML WG <public-html@w3.org>, www-svg <www-svg@w3.org>
On Apr 1, 2009, at 12:42 PM, Jonas Sicking wrote:

> On Wed, Apr 1, 2009 at 11:42 AM, Maciej Stachowiak <mjs@apple.com> wrote:
>>
>> On Apr 1, 2009, at 6:57 AM, Henri Sivonen wrote:
>>
>>> On Apr 1, 2009, at 16:39, Henri Sivonen wrote:
>>>
>>>> Making the comparisons actually case-sensitive seems bad at least in the
>>>> context of Gecko.
>>>
>>>
>>> Oops. That should have been:
>>>>
>>>> Making the comparisons actually case-*in*sensitive seems bad at least in
>>>> the context of Gecko.
>>
>> Oops, should have read this email before replying to the last one. Why would
>> this be bad?
>
> Performance wise a case insensitive compare is *much* more expensive
> than a case sensitive one. Especially since there are neat tricks you
> can do for case sensitive compares which turns a string compare into
> basically a pointer compare.
>
> However I think we'd get the effect of a case insensitive compare if
> we made getElementsByTagName use the same case-correcting algorithm as
> the parser does.

If it's a performance concern, then I do believe this can be implemented efficiently. WebKit also uses atomic strings for tag names, and indeed we also have uniqued pointers to structs holding namespace/tagName pairs (the QualifiedName struct). If I were to implement ASCII-case-insensitive comparison for getElementsByTagName in the face of possibly mixed-case tags in HTML documents, I would do as follows:

1) ASCII-lowercase the argument to getElementsByTagName and atomize it. (In WebKit we have a combined operation to ASCII-lowercase and atomize at the same time, which is smart enough to make no changes if the string is already ASCII-lowercase and atomized.)
2) ASCII-lowercase the tag name to be compared while atomizing it (the optimization makes this essentially free in the common case that the tag name is already lowercase).
3) Pointer compare.

This is equivalent to an ASCII-case-insensitive comparison.
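The three steps above can be sketched in C++ roughly as follows. This is a minimal illustration under stated assumptions, not WebKit's code: `AtomTable`, `atomize`, `lowerAtom` and `tagNameMatches` are hypothetical names, and a `std::unordered_set` stands in for a real atomic-string table. Because each distinct string is stored exactly once, comparing two atoms reduces to comparing pointers.

```cpp
#include <cassert>
#include <string>
#include <string_view>
#include <unordered_set>

// Hypothetical atom table (a stand-in for an atomic-string table): each
// distinct string is stored once, so two atoms are equal exactly when
// their pointers are equal. Pointers to unordered_set elements stay valid
// across rehashing, so handing them out is safe.
class AtomTable {
public:
    // Returns the unique stored copy of `s`, inserting it if needed.
    const std::string* atomize(std::string_view s) {
        auto it = table_.emplace(s).first;
        return &*it;
    }

    // ASCII-lowercases and atomizes an existing atom. Fast path: if the atom
    // has no ASCII uppercase letters it is returned unchanged, so the common
    // all-lowercase case costs one scan and no allocation.
    const std::string* lowerAtom(const std::string* atom) {
        assert(atom != nullptr);
        if (!hasASCIIUpper(*atom)) return atom;
        std::string lowered(*atom);
        for (char& c : lowered) c = toASCIILower(c);
        return atomize(lowered);
    }

private:
    static bool isASCIIUpper(char c) { return c >= 'A' && c <= 'Z'; }
    // Only A-Z are folded; every other byte (including non-ASCII) is kept.
    static char toASCIILower(char c) {
        return isASCIIUpper(c) ? static_cast<char>(c - 'A' + 'a') : c;
    }
    static bool hasASCIIUpper(const std::string& s) {
        for (char c : s)
            if (isASCIIUpper(c)) return true;
        return false;
    }

    std::unordered_set<std::string> table_;
};

// Step 1 is done once by the caller: loweredQuery = lowerAtom(atomize(arg)).
// Steps 2 and 3: lowercase-atomize the element's tag name, then compare
// pointers.
bool tagNameMatches(AtomTable& atoms, const std::string* loweredQuery,
                    const std::string* tagName) {
    return atoms.lowerAtom(tagName) == loweredQuery;
}
```

The folding is deliberately ASCII-only rather than using locale-dependent `std::tolower`, matching the ASCII-case-insensitive semantics discussed here. The fast path in `lowerAtom` still scans the name each time; caching a pointer to the lowercased atom alongside the name, as suggested later in this message for the QualifiedName struct, would remove even that cost.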
Note that comparing Unicode-lowercased strings would not, however, be equivalent to a Unicode-case-insensitive comparison. I think ASCII-case-insensitive is sufficient for Web compatibility and usefulness.

If the atomizing toLower conversion turns out to be too expensive in practice, we could add an extra pointer to the QualifiedName struct pointing to a lowercased and atomized version of the name.

I note that the current spec says "These methods (but not their namespaced counterparts) must compare the given argument in an ASCII case-insensitive manner when looking at HTML elements, and in a case-sensitive manner otherwise." That could be achieved even more easily by comparing to the lowercased string for HTML elements and to the original string passed in for other elements. But it might be more convenient to make it always ASCII case-insensitive for all elements in an HTML document.

Regards,
Maciej
Received on Wednesday, 1 April 2009 23:00:08 UTC