Re: Selectors, getElementsByTagName() and camelCase SVG from Maciej Stachowiak on 2009-04-01 (public-html@w3.org from April 2009)

From: Maciej Stachowiak <mjs@apple.com>
Date: Wed, 01 Apr 2009 15:59:20 -0700
To: Jonas Sicking <jonas@sicking.cc>
Cc: Henri Sivonen <hsivonen@iki.fi>, HTML WG <public-html@w3.org>, www-svg <www-svg@w3.org>
Message-id: <FEC5A67E-E892-4EA5-AADE-09CB33185348@apple.com>

On Apr 1, 2009, at 12:42 PM, Jonas Sicking wrote:

> On Wed, Apr 1, 2009 at 11:42 AM, Maciej Stachowiak <mjs@apple.com>  
> wrote:
>>
>> On Apr 1, 2009, at 6:57 AM, Henri Sivonen wrote:
>>
>>> On Apr 1, 2009, at 16:39, Henri Sivonen wrote:
>>>
>>>> Making the comparisons actually case-sensitive seems bad at least  
>>>> in the
>>>> context of Gecko.
>>>
>>>
>>> Oops. That should have been:
>>>>
>>>> Making the comparisons actually case-*in*sensitive seems bad at  
>>>> least in
>>>> the context of Gecko.
>>
>> Oops, should have read this email before replying to the last one.  
>> Why would
>> this be bad?
>
> Performance-wise, a case-insensitive compare is *much* more expensive
> than a case-sensitive one, especially since there are neat tricks you
> can do for case-sensitive compares which turn a string compare into
> basically a pointer compare.
>
> However I think we'd get the effect of a case insensitive compare if
> we made getElementsByTagName use the same case-correcting algorithm as
> the parser does.
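Jonas's suggestion of reusing the parser's case correction can be sketched in Python roughly as follows. This is an illustrative sketch, not Gecko's code: the helper names are hypothetical, and the table holds only a few entries from the HTML5 parser's SVG tag-name adjustment table, not the full list. It assumes the parser has already stored HTML names in lowercase and known SVG names in camelCase, so a case-sensitive compare against the corrected argument acts like a case-insensitive one.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)

# Partial, illustrative copy of the parser's SVG tag-name adjustment table
# (lowercased name -> camelCase name). The real table has more entries.
SVG_TAG_CASE_FIXUPS = {
    "clippath": "clipPath",
    "foreignobject": "foreignObject",
    "lineargradient": "linearGradient",
    "radialgradient": "radialGradient",
    "textpath": "textPath",
}


def case_correct(name: str) -> str:
    """Normalize a name the way the parser would: ASCII-lowercase it,
    then restore the camelCase spelling of known SVG element names."""
    lowered = name.translate(_ASCII_LOWER)
    return SVG_TAG_CASE_FIXUPS.get(lowered, lowered)


def matches(element_tag: str, argument: str) -> bool:
    # Tags in the tree were already case-corrected by the parser, so a plain
    # case-sensitive comparison against the corrected argument is enough.
    return element_tag == case_correct(argument)
```

For example, `matches("foreignObject", "FOREIGNOBJECT")` and `matches("div", "Div")` are both true under this sketch.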

If it's a performance concern, then I do believe this can be  
implemented efficiently. WebKit also uses atomic strings for tag  
names, and indeed we also have uniqued pointers to structs holding  
namespace/tagName pairs (the QualifiedName struct). If I were to  
implement ASCII-case-insensitive comparison for getElementsByTagName  
in the face of possibly mixed-case tags in HTML documents I would do  
as follows:

1) ASCII-lowercase the argument to getElementsByTagName and atomize it.  
(In WebKit we have a combined operation to ASCII-lowercase and atomize  
at the same time, which is smart enough to make no changes if the  
string is already ASCII-lowercase and atomized.)

2) ASCII-lowercase the tag name to be compared while atomizing (the  
optimization makes this essentially free in the common case that the  
tag name is already lowercase).

3) Pointer compare.
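The three steps can be sketched in Python, using object identity (`is`) on interned strings as a stand-in for WebKit's pointer compare of atomic strings. This is a hedged illustration, not WebKit's API: `AtomTable`, `lower_and_atomize` and `tag_matches` are hypothetical names.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


class AtomTable:
    """Hypothetical intern table: equal strings map to one shared object,
    so two atoms are equal exactly when they are the same object."""

    def __init__(self) -> None:
        self._atoms: dict[str, str] = {}

    def atomize(self, s: str) -> str:
        # Returns the stored object if an equal string was seen before.
        return self._atoms.setdefault(s, s)

    def lower_and_atomize(self, s: str) -> str:
        # Combined ASCII-lowercase + atomize. Fast path: a string with no
        # A-Z characters needs no lowercased copy and is atomized directly.
        if not any("A" <= c <= "Z" for c in s):
            return self.atomize(s)
        return self.atomize(s.translate(_ASCII_LOWER))


def tag_matches(table: AtomTable, element_tag: str, argument: str) -> bool:
    wanted = table.lower_and_atomize(argument)         # step 1
    candidate = table.lower_and_atomize(element_tag)   # step 2
    return candidate is wanted                         # step 3: identity ("pointer") compare
```

In a real implementation step 1 would run once per getElementsByTagName call, and only steps 2 and 3 would run per element.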

This is equivalent to an ASCII-case-insensitive comparison. (Note that  
comparing Unicode-lowercased strings would not be equivalent to a  
Unicode-case-insensitive comparison, though.) I think ASCII-case-insensitive  
is sufficient for Web compatibility and usefulness.
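Both points in the paragraph above can be checked with a small Python sketch. The function names are hypothetical; `str.lower()` and `str.casefold()` are Python's Unicode lowercasing and full Unicode case folding, used here only to illustrate the difference.

```python
import string

_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_case_insensitive_equal(a: str, b: str) -> bool:
    # Lowercasing only A-Z on both sides is exactly ASCII case-insensitivity.
    return a.translate(_ASCII_LOWER) == b.translate(_ASCII_LOWER)


def unicode_lower_equal(a: str, b: str) -> bool:
    # Compares Unicode-lowercased strings.
    return a.lower() == b.lower()


def unicode_casefold_equal(a: str, b: str) -> bool:
    # Full Unicode case folding, closer to a Unicode case-insensitive compare.
    return a.casefold() == b.casefold()


# U+212A KELVIN SIGN lowercases to ASCII "k" under Unicode rules, so Unicode
# lowering would let it match an ASCII tag name; ASCII lowering does not.
KELVIN_LINK = "lin\u212a"
```

Here `unicode_lower_equal("STRASSE", "straße")` is false because "ß" lowercases to itself, while `unicode_casefold_equal` maps both spellings to "strasse".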

If the atomizing toLower conversion turns out to be too expensive in  
practice, we could add an extra pointer to the QualifiedName struct  
pointing to a lowercased and atomized version of the name.

I note that the current spec says "These methods (but not their  
namespaced counterparts) must compare the given argument in an ASCII  
case-insensitive manner when looking at HTML elements, and in a  
case-sensitive manner otherwise." That could be achieved even more easily  
by comparing to the lowercased string for HTML elements and to the  
original string passed in for other elements. But it might be more  
convenient to make it always ASCII case-insensitive for all elements  
in an HTML document.
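The two options in the paragraph above differ only in which strings get lowercased. A Python sketch, assuming the parser has already stored HTML element names in lowercase (so lowercasing just the argument suffices for HTML elements); the function names are hypothetical.

```python
import string

HTML_NS = "http://www.w3.org/1999/xhtml"
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def matches_current_spec(element_ns: str, element_name: str, argument: str) -> bool:
    # Current spec text: ASCII case-insensitive for HTML elements,
    # case-sensitive for everything else.
    if element_ns == HTML_NS:
        # HTML element names are already lowercase, so lowercasing the
        # argument alone gives an ASCII case-insensitive match.
        return element_name == argument.translate(_ASCII_LOWER)
    return element_name == argument


def matches_always_insensitive(element_name: str, argument: str) -> bool:
    # Alternative: ASCII case-insensitive for every element in an HTML document.
    return element_name.translate(_ASCII_LOWER) == argument.translate(_ASCII_LOWER)
```

With these sketches, an argument of "clipPath" matches an SVG clipPath element under both rules, while "clippath" matches it only under the second.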

Regards,
Maciej


Received on Wednesday, 1 April 2009 23:00:08 UTC