[whatwg] Re: getElementsByClassName

On Sun, 4 Sep 2005, Lachlan Hunt wrote:
>
> I have a partial implementation of getElementsByClassName() [1] that is
> designed to support HTML, XHTML, MathML and SVG documents (including mixed
> namespace documents).

Cool.


> It also includes Element.hasClassName(), Element.addClassName() and 
> Element.removeClassName(), which I think should also be added to WA1.

I envisage somehow making className implement the DOMTokenString 
interface:

   http://whatwg.org/specs/web-apps/current-work/#domtokenstring

...so that you would have Element.className.add(), 
Element.className.has(), etc.


> Given these HTML elements:
> 
> A: <p class="foo">
> B: <p class=" foo">
> C: <p class="foo ">
> D: <p class=" foo ">
> E: <p class="foo bar">
> F: <p class="foo 	bar">
> G: <p class="bar foo">
> H: <p class="foobar">
> I: <p class="foo-bar">
> J: <p class="FOO">

A-D are equivalent, E-G are equivalent, H, I, and J are different.
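
To illustrate what I mean by "equivalent", here is a rough sketch. It 
assumes that a class attribute is a whitespace-separated set of names 
compared case-sensitively; classSet() and sameClasses() are names made up 
for this example and are not in the draft.

```javascript
// Hypothetical helper: split a class attribute value into its set of class
// names, treating runs of whitespace (space, tab, LF, FF, CR) as separators.
function classSet(attr) {
  return new Set(attr.split(/[ \t\n\f\r]+/).filter((name) => name !== ""));
}

// Hypothetical helper: two attribute values are treated as equivalent here
// when they name exactly the same set of classes (order and spacing ignored).
function sameClasses(a, b) {
  const x = classSet(a);
  const y = classSet(b);
  return x.size === y.size && [...x].every((name) => y.has(name));
}

console.log(sameClasses("foo", " foo "));         // A vs D: true
console.log(sameClasses("foo \tbar", "bar foo")); // F vs G: true
console.log(sameClasses("foo", "FOO"));           // A vs J: false
```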


> What should each of these function calls return?  I've listed the ones that my
> script currently selects.  Are any of them incorrect?
> 
> 01. getElementsByClassName("");           | (none)
> 02. getElementsByClassName(" ");          | (none)
> 03. getElementsByClassName("foo");        | A, B, C, D, E, F, G
> 08. getElementsByClassName("foo", "bar"); | E, F, G
> 09. getElementsByClassName("bar", "foo"); | E, F, G
> 10. getElementsByClassName("foo", "foo"); | A, B, C, D, E, F, G

Correct.


> 04. getElementsByClassName(" foo");       | A, B, C, D, E, F, G
> 05. getElementsByClassName("foo ");       | A, B, C, D, E, F, G
> 06. getElementsByClassName(" foo ");      | A, B, C, D, E, F, G
> 07. getElementsByClassName("foo bar");    | E, F

Incorrect; none of the above elements are in classes that have a space 
character in the class name.
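
As a rough sketch of the behaviour I have in mind (not a real 
implementation): each argument is compared literally, with no trimming or 
splitting, against the names found in the element's class attribute. The 
selectByClassNames() function and the { id, cls } records are stand-ins 
invented for this example, and returning nothing when no arguments are 
given is my assumption here.

```javascript
// Hypothetical helper: the set of class names in a class attribute value.
function classSet(attr) {
  return new Set(attr.split(/[ \t\n\f\r]+/).filter((name) => name !== ""));
}

// Hypothetical stand-in for getElementsByClassName(), working on plain
// { id, cls } records instead of DOM elements. Every argument is used
// literally: it is never trimmed or split.
function selectByClassNames(elements, ...names) {
  if (names.length === 0) return []; // assumption: no arguments, no matches
  return elements
    .filter((el) => {
      const present = classSet(el.cls);
      return names.every((name) => present.has(name));
    })
    .map((el) => el.id);
}

// The ten test elements from the quoted message.
const elements = [
  { id: "A", cls: "foo" },      { id: "B", cls: " foo" },
  { id: "C", cls: "foo " },     { id: "D", cls: " foo " },
  { id: "E", cls: "foo bar" },  { id: "F", cls: "foo \tbar" },
  { id: "G", cls: "bar foo" },  { id: "H", cls: "foobar" },
  { id: "I", cls: "foo-bar" },  { id: "J", cls: "FOO" },
];

console.log(selectByClassNames(elements, "foo").join(", "));        // A, B, C, D, E, F, G
console.log(selectByClassNames(elements, "foo", "bar").join(", ")); // E, F, G
console.log(selectByClassNames(elements, "foo bar").length);        // 0
```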


On Sun, 4 Sep 2005, Kornel Lesinski wrote:
> 
> I think a variable number of arguments for that function may be the 
> cause of many problems.
> As far as I'm aware W3C DOM never uses functions with a variable number 
> of arguments, so the design of getElementsByClassName() stands out.

There are a number of other methods (e.g. in <canvas> and XMLHttpRequest) 
that use variable numbers of arguments. Why would it cause a problem?


> Functions with variable number of arguments are problematic in some 
> programming languages, and because of that W3C may not want to include 
> such method in future DOM specifications.

While it is true that some languages do not support methods with variable 
numbers of arguments, that is not a reason to limit ourselves here. It is 
quite possible for those languages to have language bindings that define 
different names for these methods, or use arrays as the arguments. For 
example, Java's DOM bindings convert attributes to methods to get around 
Java's lack of attributes.
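
To give an idea of what an array-based binding could look like, here is a 
sketch written in JS purely for illustration. Both function names are 
invented for this example and operate on a class attribute string rather 
than on the DOM; a real binding for another language would be defined in 
that language.

```javascript
// Variadic form (hypothetical stand-in): true if classAttr names every one
// of the given classes. Arguments are compared literally.
function hasAllClasses(classAttr, ...names) {
  const present = classAttr.split(/[ \t\n\f\r]+/).filter((n) => n !== "");
  return names.length > 0 && names.every((name) => present.includes(name));
}

// Array form (hypothetical): what a binding for a language without variable
// argument lists might expose. It simply forwards to the variadic form.
function hasAllClassesFromArray(classAttr, names) {
  return hasAllClasses(classAttr, ...names);
}

console.log(hasAllClasses("bar baz foo", "foo", "bar"));            // true
console.log(hasAllClassesFromArray("bar baz foo", ["foo", "bar"])); // true
```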


> Such design forbids any extensibility as well - it won't be possible to 
> add new optional parameters to that function.

This line of argument seems to conflict with the suggestion that we can't 
have variable numbers of arguments.

In any case, should we need to extend this function, we can always 
introduce a new function.


> I suggest that getElementsByClassName() should take only one parameter: 
> string of space-separated class names. This may simplify implementation, 
> because same algorithm can be used to get list of class names from input 
> and from elements.

I fear that if we use a string that must be parsed, we will encourage 
buggy implementations. It also introduces many edge cases -- trimming 
spaces from the start or the end, and so forth.

And it means that should a language ever introduce classes with spaces in 
them, we will be unable to search for them.


> It will also solve IMHO unclear case of getElementsByClassName("foo 
> bar") matching "bar foo". It would, as opposed to behavior where space 
> is both separator and part of class name.

What if an element is in the class "foo bar"?


> In currently proposed implementation getElementsByClassName("foo bar") 
> doesn't match class="bar foo", but matches class="foo bar". This implies 
> that class attribute isn't just space-separated list of classes.

In the current specification, getElementsByClassName("foo bar") matches 
neither class="bar foo" nor class="foo bar".


> That's why I propose to make this function use exactly the syntax that 
> class attribute uses. getElementsByClassName("bar foo") should match 
> class="foo bar", class="bar baz foo", etc.

I fear that this would be rife with implementation bugs. Requiring the 
author to pre-split the search input, by contrast, guarantees that the UA 
does not have to process the search input in any way; it only has to deal 
with the actual class attribute.


On Sun, 4 Sep 2005, Erik Arvidsson wrote:
>
> For multiple classes it makes more sense to use:
> 
> el.getElementsBySelector(".foo.bar")

Yeah, the draft notes that we should probably have that as well (or 
instead).


On Mon, 5 Sep 2005, Lachlan Hunt wrote:
>
> The problem is that white space handling in parameter values isn't 
> currently defined at all, and I implemented it assuming that each 
> parameter value would contain only one class name.  Handling the 
> (currently) erroneous parameter ("foo bar") is basically a form of error 
> recovery, and the fact that it returns anything at all is merely a 
> result of how the regex is constructed using it.

The spec now defines this better. Basically, "foo bar" would never match 
anything in HTML, XHTML, MathML or SVG.


> Before I can fix the implementation in any way, I need to know how white 
> space should be handled before (" foo"), after ("foo ") and inside ("foo 
> bar") the parameter value.  At the moment I trim any leading and 
> trailing spaces in most cases (there's currently a bug that stops it 
> working sometimes), but I don't really handle white space inside very 
> well.

The spec doesn't mention trimming, so, no trimming. :-)


> ("foo bar") could basically be handled in the following ways and I need
> to know which:
> 
> 1. Equivalent to ("foo", "bar") (or [class~=foo][class~=bar], or
>    .foo.bar in CSS)
> 2. The way it currently works. ie. matches "foo bar", not "bar foo"
> 3. Error, return nothing.

It's not an error per se, but it would return nothing with the languages 
your script currently supports.


On Mon, 5 Sep 2005, Aankhen wrote:
>
> I suggest #2, which implies consistently treating the first argument 
> passed to the function as a single class name to match (this means "foo 
> bar" would always return no elements, since a class name obviously 
> cannot contain whitespace).  Special-casing "foo bar" and other values 
> seems to be adding complexity without much return.

I agree.


> If multiple class names really need to be handled, my suggestion would 
> be to take a single array as a parameter, e.g. 
> `getElementsByClassName(["foo"])` and `getElementsByClassName(["foo", 
> "bar"])`.

What is the problem with allowing multiple arguments in JS?


On Mon, 5 Sep 2005, Lachlan Hunt wrote:
>
> I may not be understanding what you mean, but if optional parameters 
> aren't language independent, shouldn't it be defined in a more language 
> independent way, so that any non-ECMAScript languages can still 
> implement this?

If a UA implementor wishes to implement this in something other than JS, 
please let me know, so that I can include language bindings for that 
language in the spec. As it stands, there will always be languages that 
have problems implementing these APIs literally. Even Java, for example, 
has problems implementing the DOM literally (there are no native object 
attributes in Java, as I understand it).


On Mon, 5 Sep 2005, Lachlan Hunt wrote:
> 
> In that case, should it be redefined as:
> 
>   NodeList getElementsByClassName(in DOMString classNames);
> 
> where classNames is a string of space separated class names?  That would 
> be just as easy to implement and would work with languages that don't 
> support optional parameters.

IMHO that would lead to more implementation bugs as implementations would 
find it hard to implement exactly the same processing for the argument. A 
list of strings to use literally seems more useful.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Monday, 5 September 2005 08:03:15 UTC