[whatwg] getElementsByClassName() from Ian Hickson on 2006-10-21 (public-whatwg-archive@w3.org from October 2006)

From: Ian Hickson <ian@hixie.ch>
Date: Sat, 21 Oct 2006 07:36:37 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0610210040350.1629@dhalsim.dreamhost.com>

This omnibus edition of your WHATWG mail includes replies to 50 or so 
separate e-mails about getElementsByClassName(). Thanks to everyone for 
their comments on this issue.


On Mon, 5 Sep 2005, Brad Neuberg wrote:
> > 
> > That's right. We are defining HTML5 and the DOM extensions to support 
> > it. If other languages want to add different class name delimiters, 
> > let them. My hunch is that they will follow suit. This is a good 
> > opportunity to make it clear. HTML has always led the way. It also 
> > ensures backward-compatibility.
> 
> Exactly; what exactly would some theoretical language gain by allowing 
> spaces in class values? Might as well keep it simple and not allow them, 
> following current practice.

Well, as you later point out, we don't want to lock ourselves in. But I 
agree that it is important to consider the common case first.


On Tue, 6 Sep 2005, Kornel Lesinski wrote:
> 
> Even if it isn't reused, such a function is not rocket science. Can't 
> you trust implementors to trim and split a string properly?

Yeah, you're probably right.


> > > For example I may want first to find set of classes I'd like to 
> > > match against. With solution I propose it's easy and intuitive to 
> > > anyone who used .className:
> > > 
> > > if (x) find += " class1";
> > > if (y) find += " class2";
> > > getElementsByClassName(find);
> > > 
> > > but with varargs function it's really cumbersome:
> > > if (x) find.push("class1");
> > > if (y) find.push("class2");
> > > switch(find.length)
> > > {
> > >  case 1: getElementsByClassName(find[0]); break;
> > >  case 2: getElementsByClassName(find[0],find[1]); break;
> > >  ...
> > > }
> > 
> > You can just do:
> > 
> >   if (x) find.push("class1");
> >   if (y) find.push("class2");
> >   document.getElementsByClassName.apply(document, find);
> > 
> > ...which seems much better to me than using a string.
> 
> It's the first time I see apply method used. I couldn't find it in 
> ECMA262-3 nor in WA1.0. Can you give me a hint where it's defined?

ECMA262 Edition 3 section 15.3.4.3.
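
For readers who have not met apply() before, here is a minimal sketch of 
what it does. The function hasAllClasses() is a hypothetical stand-in for 
a varargs matcher and is not part of any DOM; only 
Function.prototype.apply itself comes from ECMA-262.

```javascript
// Hypothetical varargs matcher: true if the class attribute value
// contains every class name passed after the first argument.
function hasAllClasses(classAttribute, ...names) {
  const present = classAttribute.split(/\s+/).filter(Boolean);
  return names.every((name) => present.includes(name));
}

// Build the list of wanted classes at run time, as in the quoted example.
const x = true;
const y = true;
const find = [];
if (x) find.push("class1");
if (y) find.push("class2");

// apply(thisArg, argArray) passes each array entry as a separate argument,
// so no switch on find.length (and no eval) is needed.
const viaApply = hasAllClasses.apply(null, ["class2 other class1"].concat(find));

// The equivalent call written out by hand.
const direct = hasAllClasses("class2 other class1", "class1", "class2");

console.log(viaApply, direct);
```
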


> Why is that better than using string?

It probably isn't. It seems, though, that one option (suggested by Brad 
below) is to accept a string or an array, thus:

   if (x) find.push("class1");
   if (y) find.push("class2");
   document.getElementsByClassName(find);

This would probably be easier to read.


On Mon, 7 Nov 2005, ROBO Design wrote:
> 
> I opt for using just one argument with multiple class names separated 
> with a space.
> 
> I wouldn't like having multiple arguments, because when authors need to 
> get a list of elements based on multiple class names, and the number of 
> classes is not known by the author before writing the script, s/he's 
> required to use eval(). This sucks IMHO :).

Fair point.


> b) Idea: getElementsByClassName() could accept multiple arguments, each 
> argument being a class name. The only difference from the current 
> definition of the spec: if any argument contains space characters, split 
> the string into an array and consider them as multiple class names.
> 
> getElementsByClassName("we all like dogs") =
> getElementsByClassName("we", "all", "like", "dogs") =
> getElementsByClassName("we all", "like dogs")
> 
> In this way you satisfy everybody.

True. It seems slightly better (since then you can easily pass an array) 
to make the function accept a string or an array.


> > We could also have a getElementBySelector() method, but it seems that 
> > it would be best to let the CSSWG define that.
> 
> The specification already defined getElementsByClassName(). So, why not 
> getElementsBySelector() too? Or this can be better handled via XPath 
> expressions?

Now covered by the following W3C spec:

   http://dev.w3.org/cvsweb/~checkout~/2006/webapi/selectors-api/Overview.html?content-type=text/html;%20charset=utf-8


On Thu, 2 Feb 2006, Brad Fults wrote:
> 
> I suggest either going with the space-delimited approach (as it's 
> language-agnostic and well-defined at least) or with Aankhen's 
> suggestion of a single array argument.
>
> I think the latter is better and more intuitive in design, however. The 
> function should take a single argument at all times.
> 
> If the argument is a string, that string is used as a single class
> name and matched against the elements in the document.
> Else if the argument is an array, each element of the array is taken
> as a string and will be treated as a class name.
> The elements which have all of the class names existing in the array
> will match and be returned in the NodeList.

That seems reasonable, although I could see someone wanting to just pass 
an element's "class" attribute value to the method directly, and have it 
work even if the attribute has multiple class names in it.


On Fri, 3 Feb 2006, Gervase Markham wrote:
> 
> If you have:
> 
> <p class="foo bar">Fred</p>
> <p class="bar foo">Barney</p>
> <p class="foo baz bar">Wilma</p>
> 
> which should be picked up by getElementsByClassName("foo bar")?
> 
> In the "string split" mode, it would pick up all three. However, I 
> suggest that designers might be misled by this interface into thinking 
> that it only picks up the first one - which matches exactly.
> 
> So I think a multiple-argument interface would remove this possible 
> confusion, and make things more obvious than a whitespace-splitting 
> interface.

I don't think authors would be that confused, to be honest. But it's hard 
to check really. They already have to deal with this in CSS...
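
To make the "string split" behaviour concrete, here is a rough sketch 
that runs without a browser. Plain objects stand in for the three <p> 
elements, and the helpers tokens() and byClassString() are names made up 
for this illustration, not proposed API.

```javascript
// Plain objects standing in for the three paragraphs in the example.
const paragraphs = [
  { text: "Fred", className: "foo bar" },
  { text: "Barney", className: "bar foo" },
  { text: "Wilma", className: "foo baz bar" },
];

// Split a space-separated value into its class names, ignoring extra spaces.
function tokens(value) {
  return value.split(/\s+/).filter(Boolean);
}

// Keep the elements whose class attribute contains every name in the query,
// regardless of order or of any additional classes.
function byClassString(elements, query) {
  const wanted = tokens(query);
  return elements.filter((el) => {
    const present = new Set(tokens(el.className));
    return wanted.every((name) => present.has(name));
  });
}

const matched = byClassString(paragraphs, "foo bar").map((el) => el.text);
console.log(matched);
```

Under these assumptions all three paragraphs are selected, which is the 
same set the CSS selector .foo.bar would match.
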


On Fri, 3 Feb 2006, ROBO Design wrote:
> 
> I would go for:
> getElementsByClassName("foo bar")
> 
> But, as you said, that can add some confusion. Therefore, this is ideal:
> getElementsByClassName(["foo", "bar"])

I agree that one argument is better than varargs.


On Fri, 3 Feb 2006, Shadow2531 wrote:
>
> In <http://my.opera.com/community/forums/topic.dml?id=121081> I wrote
> 2 prototypes for Opera and Firefox: getElementsByClassName()
> <http://my.opera.com/burnout426/homes/files/getElementsByClassName.zip>
> and getElementsByClassNameNS()
> <http://my.opera.com/burnout426/homes/files/getElementsByClassNameNS.zip>.
>
> I just threw them together as a proof of concept, but I have no doubt 
> that the class attribute value should be a space separated list of 
> classnames and the getElementByClassName function should split up the 
> class attribute value into an array and then search for the class name 
> to see if there's a match (Or use regex to be simpler). I also added a 
> case insensitive search option.

Interesting.


> As the case above, getElementByClassName("foo bar") doesn't make sense
> to me because there are 2 class names in there and it's
> getElementByClassName, not getElementsByClassName*s*().
>
> In getElementByClassName("foo bar") , "foo bar" should be normalized
> "foo" and therefore match all 3 because they all have "foo" in the
> class attribute value.
>
> getElementsByClassName*s*() should be a separate function and I agree,
> getElementsByClassName*s*("foo", "bar") would be great if possible.

I understand the distinction you're making, but the method name is already 
ridiculously long and making it longer (especially by just pluralising it) 
is probably just going to confuse authors.


On Fri, 3 Feb 2006, Gervase Markham wrote:
>
> IMO there should be no option; if class names are defined as 
> case-insensitive, we should search case-insensitively, and if they 
> aren't, we shouldn't.

That makes sense.


On Fri, 3 Feb 2006, Ric Hardacre wrote:
> > 
> > <p class="foo bar">Fred</p>
> > <p class="bar foo">Barney</p>
> > <p class="foo baz bar">Wilma</p>
> > 
> > which should be picked up by getElementsByClassName("foo bar")?
> 
> this also raises the possibility of some confusion as the order of inheritance
> is important:
> 
> foo
> {
>    color: red;
> }
> 
> bar
> {
>    color: blue;
> }
> 
> in the quoted example Fred and Wilma would be blue and barney red. so the
> distinction between class="foo bar" and class="bar foo" is real, not merely
> syntactic.

This is incorrect (even assuming you meant .foo and .bar). All three would 
be blue in the example above. (You later pointed that out, I am just 
replying here for completeness' sake!)


On Fri, 3 Feb 2006, Shadow2531 wrote:
> > >
> > > getElementsByClassName*s*() should be a separate function and I 
> > > agree, getElementsByClassName*s*("foo", "bar") would be great if 
> > > possible.
> >
> > I also think it's not ideal to have two functions with such similar 
> > names.
>
> O.K.  Then, it should be getElementByClassName*s*() where you have 
> 1 or more classname arguments.  If you pass more than 1 class name, both 
> class names have to be present in the classname attribute for the 
> element to match.
>
> I'm just being picky here though, but getElementByClassName() implies 
> only one classname whereas getElementByClassNames() implies one or 
> more. Although as long as you know what the function does, that hardly 
> matters, but I bring it up anyway. :)

Again, while I agree with your point in principle, I don't think it's 
really practical in this case.


On Fri, 3 Feb 2006, Gervase Markham wrote:
> 
> Are there similar functions in the DOM at the moment which can take
> multiple arguments?

There's at least one example of everything in the DOM.


> Do you pass an array or multiple individual arguments, or can you do 
> both?

window.open() takes a comma-separated list, if I recall correctly.


On Fri, 3 Feb 2006, ROBO Design wrote:
>
> I believe there's some disagreement on what is this function supposed to 
> do.
> 
> Example:
> 
> <p class="foo bar sample">paragraph 1
> <p class="foo sample">paragraph 2
> <p class="bar sample">paragraph 3
> <p class="sample">paragraph 4
> 
> 1. Should it return *all* elements which have *all* the class names wanted?
>
> getElementsByClassNames(["foo", "sample"])
> returning: p1, p2

Yes.


> 2. Should it return *all* elements which have *only* the class names wanted?
> 
> getElementsByClassNames(["sample", "foo"])
> returning: p2

No.


> 3. Should it return *all* elements which have *any* of the class names wanted?
> 
> getElementsByClassNames(["sample", "foo"])
> returning: p1, p2, p3, p4

No.


> 4. Should the order matter?

No.
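
The answers to the four questions above can be checked against the 
example paragraphs with a small sketch. The predicates hasAll, hasOnly 
and hasAny and the select() helper are hypothetical names used only here; 
the element data is modelled as plain objects rather than real DOM nodes.

```javascript
// The four example paragraphs, with their classes already split out.
const paras = [
  { id: "p1", classes: ["foo", "bar", "sample"] },
  { id: "p2", classes: ["foo", "sample"] },
  { id: "p3", classes: ["bar", "sample"] },
  { id: "p4", classes: ["sample"] },
];

// Question 1: the element has every wanted class (extra classes allowed).
const hasAll = (el, wanted) => wanted.every((n) => el.classes.includes(n));

// Question 2: the element has exactly the wanted classes and no others.
const hasOnly = (el, wanted) =>
  hasAll(el, wanted) && el.classes.every((n) => wanted.includes(n));

// Question 3: the element has at least one of the wanted classes.
const hasAny = (el, wanted) => wanted.some((n) => el.classes.includes(n));

// Return the ids of the paragraphs accepted by the given predicate.
const select = (predicate, wanted) =>
  paras.filter((el) => predicate(el, wanted)).map((el) => el.id);

console.log(select(hasAll, ["foo", "sample"]));  // the behaviour answered "Yes"
console.log(select(hasAll, ["sample", "foo"]));  // same set: order is ignored
console.log(select(hasOnly, ["sample", "foo"])); // rejected alternative 2
console.log(select(hasAny, ["sample", "foo"]));  // rejected alternative 3
```
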


> I also believe this function will always be confusing, no matter what. 
> To drop all confusion just rename it to getElementsByCSSselector() and 
> you get the exact idea as to what you are supposed to provide as an 
> argument (if you know CSS). Yet, this is something Ian Hickson said is 
> beyond the purpose of WHATWG.

It's now being discussed at the W3C, at the URI given above.


> A comma separated list, or any list separated by any other character 
> can't be used. The reason has been over-stated (search the mailing 
> list): basically, what if some other styling language defines that 
> class names can contain the space char, or the comma char?

Well, we can do it if we allow the array version to not be split, because 
then we have a way of doing it in the theoretical case you raise.



On Fri, 3 Feb 2006, ROBO Design wrote:
>
> I'd personally like this:
> 
> getElementsByClassNames(["array", "of","class","names"], bool any)
> 
> if any = true then get all elements that have any of the class names 
> provided, otherwise get all elements that have *all* class names 
> provided.

What's the use case?


> > > 4. Should the order matter?
> > 
> > No, because class name ordering does not matter in the source or in 
> > CSS.
> 
> True, but Ric has made an interesting point about the order of class 
> names, which does actually matter.

Actually, Ric's point was incorrect. Order is defined not to matter, 
whether for this API or for semantics or for the CSS cascade or selectors.


A number of people gave use cases. I'd like to thank everyone for your 
input; I agree with the need for this feature.


On Fri, 3 Feb 2006, Brad Fults wrote:
> 
> Now that we can get past "why" we're specifying such a function, I feel 
> the need to reiterate the constraints on its specification, as some have 
> apparently forgotten them or neglected to read the discussion in its 
> entirety:
> 
> 1. getElementsByClassName() must be host language agnostic. That is, it 
> must work with HTML, XHTML, SVG, MathML, and any other markup languages 
> which are approved for its use. Assumptions like "class names cannot 
> contain a space" may be ones that we can't make in light of this 
> requirement (IMHO).

The modified version of the model you proposed that I describe above (with 
a single string argument splitting on spaces, and an array version that 
doesn't split) supports both space-separated (the common case) and 
non-space-separated (the hypothetical case), so I think it's a good 
solution.
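
As a rough sketch of that model (an illustration under assumptions, not 
spec text): a string argument is split on whitespace, while an array 
argument is used as given. The names toClassNames() and matchesAll() are 
invented for this example, and the element's classes are assumed to have 
been parsed by the host language already.

```javascript
// Normalise the argument: strings are split, arrays are taken verbatim.
function toClassNames(arg) {
  if (Array.isArray(arg)) return arg.map(String); // array entries are not split
  return String(arg).split(/\s+/).filter(Boolean); // string is space-separated
}

// True if the element's class list contains every requested class name.
function matchesAll(elementClasses, arg) {
  return toClassNames(arg).every((name) => elementClasses.includes(name));
}

// Common case: a language such as HTML, where classes cannot contain spaces.
const htmlLike = ["dark", "blue"];

// Hypothetical language in which one class name may itself contain a space.
const hypothetical = ["dark blue"];

console.log(matchesAll(htmlLike, "dark blue"));         // string form, split
console.log(matchesAll(htmlLike, ["dark", "blue"]));    // array form
console.log(matchesAll(hypothetical, "dark blue"));     // split, so no match
console.log(matchesAll(hypothetical, ["dark blue"]));   // unsplit, so it matches
```

The string form serves the space-separated case directly, and the array 
form is the escape hatch for a language that allows spaces in a class 
name.
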

Note that space-separated is the common case here, and thus our primary 
concern should be for space-separated classes.


> 2. getElementsByClassName() must be *binding language* agnostic. That 
> is, we cannot assume that it will only be used in JS. It should be 
> designed, as with all of the other DOM functions to my knowledge, 
> without special features which are specific to any one binding language. 
> That said, I believe a variable number of arguments is completely out.

Yeah, varargs is now out. We're still going to use overloading, I think, 
but in non-overloading-supporting languages, the string version can just 
be dropped.

Note that JavaScript is the main language here, and thus our primary 
concern should be for JavaScript.


> 3. getElementsByClassName() should succeed as expected in the simplest 
> use case. From our use case above, this means calling 
> document.getElementsByClassName("x-widget") returns all elements 
> containing the class "x-widget" -- never mind which other classes those 
> elements have.

Agreed.

But I think people would also expect:

   document.getElementsByClassName(element.className)

...to work, even if |element| has multiple classes.


On Sat, 4 Feb 2006, Lachlan Hunt wrote:
> 
> There's also IE's behaviours using HTCs and the proprietary 'behavior' 
> property in CSS, but it seems rather complex and HTCs look quite messy 
> (they seem to be a weird mixture of proprietary HTML with some XML 
> syntax), although perhaps they were on the right track with the concept 
> of making them declarative.
> 
> For example, using this CSS-like syntax (but it's not CSS).
> 
> selector {
>   event-name: function();
> }

See XBL2 for more on this line of thought:

   http://dev.w3.org/cvsweb/~checkout~/2006/xbl2/Overview.html?content-type=text/html;%20charset=utf-8

It's quite mature these days.


On Sun, 5 Feb 2006, Lachlan Hunt wrote:
> 
> One issue with a Selector method though, how do we handle namespace 
> prefixes?

The solution Anne used was to pass a function that does namespace lookup.


> > > Ian has already indicated that the specification of a method to 
> > > collect DOM elements based on a CSS selector is best left to the CSS 
> > > WG.
> 
> I'd like to know why this is the case.  Defining a DOM method seems like 
> it would be out of scope for the CSS working group and seems to be in 
> the scope of the WHATWG.  Other than that, it could be left up to the 
> DOM WG or possibly the Web API WG although it doesn't quite fit 
> anything mentioned in their charter.

WebAPI are the ones currently doing this:

   http://dev.w3.org/cvsweb/~checkout~/2006/webapi/selectors-api/Overview.html?content-type=text/html;%20charset=utf-8


On Sun, 5 Feb 2006, James Bennett wrote:
> On 2/4/06, Lachlan Hunt <lachlan.hunt at lachy.id.au> wrote:
> >
> > While there are many JavaScript implementations (I even wrote one 
> > myself a few months ago), all the custom JS implementations count for 
> > exactly zero native implementations in UAs, which is what really 
> > counts.
> 
> I'm not sure what relevance that has; if you look at the major UAs out 
> there, one (IE) is pretty much stagnant, and the rest don't implement a 
> whole lot of new features unless those features are found in a W3C 
> WHATWG spec. So UA implementations aren't currently a fertile ground for 
> spotting new and useful things to put on a standards track.
> 
> And really, after all these years of harping at browser developers to 
> stick to the standards, are we now going to cut off innovation by saying 
> we'll only look at new ideas if they've been implemented by browser 
> developers without previous standardization? That seems like one heck of 
> a big catch-22 to me, and looks like it'd leave a lot of good ideas out 
> in the cold because they've "only" been implemented in JavaScript.

Actually browsers have all been busy implementing their own extensions as 
experiments for the past few years, and the WHATWG spec draws heavily from 
those experiences. For example, the following features were invented 
outside the WHATWG and later came to be added to the WHATWG spec: 
<canvas>, contenteditable, tabindex, drag and drop, and the text selection 
APIs. And the following features are based heavily on features that were 
prototyped in browser implementations: <datagrid>, <command>, server-sent 
events, and cross-frame messages. I'm probably missing others.


On Tue, 14 Feb 2006, Shadow2531 wrote:
>
> I was *messing* around with 2 different *examples*.
>
> 1.) http://shadow2531.com/opera/js/getElementsByClassName/000.html
> 
> That one supports:
> getElementsByClassName(string);
> getElementsByClassName(array);
> 
> If the string has spaces in it, it's considered that nothing will match 
> and returns null. If it's an array, all must be present for an element 
> to match.
> 
> 2.) http://shadow2531.com/opera/js/getElementsByClassName/001.html
> 
> Now this one supports the same 2 types, but the string handling is 
> different. The string is space-separated.
> 
> So, with this second example, you can do:
>
> document.getElementsByClassName("aaa");
> document.getElementsByClassName(["bbb", "ccc"]);
> document.getElementsByClassName("bbb ccc");
> 
> (The second 2 produce the same result. The 3rd one might just be cleaner 
> in certain situations)
>
> I'm liking what options the second example provides. (not necessarily 
> the code as I just threw it together and didn't think about exceptions, 
> optimization and code size. Plus I just used a global function for the 
> example.)

Very interesting. Thanks for doing these experiments; implementation 
experience is always welcome.


> Do you agree with the string being space-separated? It seems to make 
> sense at least for html where a classname can't have spaces.

Yeah, I think it makes a lot of sense.


On Tue, 6 Sep 2005, Dean Edwards wrote:
> Ian Hickson wrote:
> > On Tue, 6 Sep 2005, Dean Edwards wrote:
> > 
> > > That's right. We are defining HTML5 and the DOM extensions to 
> > > support it. If other languages want to add different class name 
> > > delimiters, let them. My hunch is that they will follow suit. This 
> > > is a good opportunity to make it clear. HTML has always led the way. 
> > > It also ensures backward-compatibility.
> > 
> > Well, it's clear that HTML classes can't contain spaces... I'm not 
> > really sure what you're asking for here. :-)
> 
> Sorry. It seemed that previous emails were suggesting there could be 
> other delimiters. My mistake. ;-)

Only in other languages, not in HTML, MathML, SVG, or any of the other 
languages that exist today. It's a theoretical problem; it can't happen 
(that I know of) today. In fact, because it's theoretical, I've stopped 
worrying about it. But the current proposal does support it.


On Tue, 6 Sep 2005, Lachlan Hunt wrote:
> > > 
> > > In which case, would it be worth adding a note to the spec stating 
> > > that implementations should not assume that all languages will use 
> > > white space delimiters between class names?
> > 
> > Well, it's highly theoretical. It seems such a note might be more 
> > confusing than helpful. What do you think?
> 
> I think fixing the grammar of this paragraph and adding one more 
> sentence won't be too confusing
> 
> Current text:
> 
> | The space character (U+0020) is not special in the method's arguments.
> | In HTML, XHTML, SVG and MathML it is impossible for an element to
> | belong to a class whose name contains a space character, however, and
> | so typically the method would return no nodes if one of its arguments
> | contained a space.
> 
> Suggested text:
> 
>   The space character (U+0020) is not special in the method's arguments.
>   In HTML, XHTML, SVG and MathML it is impossible for an element to
>   belong to a class whose name contains a space character and thus, for
>   these languages, the method would return no nodes if one of its
>   arguments contained a space.  This does not, however, prevent other
>   languages from allowing spaces in class names.

This text is now gone.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Saturday, 21 October 2006 00:36:37 UTC