Proposed Specification for find/findAll/matches from Lachlan Hunt on 2011-12-12 (public-webapps@w3.org from October to December 2011)

From: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Date: Mon, 12 Dec 2011 12:07:58 +0100
To: public-webapps <public-webapps@w3.org>
Message-ID: <4EE5E08E.2000907@lachy.id.au>
Hi,
   I have reviewed all of the recent discussion and spent some time 
analysing JQuery, and I have compiled this rough specification detailing 
how I think find, findAll and matches can work.  The following details 
the rationale for each of the design decisions made.

The new methods should be available on documents, document fragments and 
elements, just like querySelector.  The easiest approach is to put these 
on the same NodeSelector interface as the existing methods.

This email is long and detailed. For those of you who just want the 
conclusion, skip to the proposed IDL and summary.

---

*Table of Contents*:

1. Methods and Return Types
2. Document Methods
3. Document Fragment Methods
4. Element Methods
5. Match Testing
6. Proposed IDL
7. Summary of Proposed Spec Changes
8. Proposed Rules for Prepending :scope

---

1. *Methods and Return Types*

Throughout the discussion, there seems to be the assumption that we 
should have both find() and findAll() methods, which return a single 
matching element and a collection of all matches, respectively.  One 
issue to decide is, based on experience with and usage of 
querySelector() and querySelectorAll(), whether it is worth introducing the 
same distinction for new methods, or would it be better to just go with 
a single method that returns a collection?  That is, is it really useful 
or better in practice to have the method that only returns the first match?

For the purposes of the rest of this email, however, I'll stick with the 
assumption that we'll introduce both find() and findAll().

There is also the open issue of what type of collection should be 
returned by findAll(), whether it be an Array or special kind of 
NodeList with an Array-like interface.  I have not addressed this issue 
in this proposal.

---

2. *Document Methods*

The document.findAll() method is supposed to be designed to more closely 
align with the behaviour of JQuery's global $() method, which is defined 
as an alias for the jQuery() method:

   jQuery( selector [, context] )

(Note: The other overloaded JQuery methods are not relevant.)
http://api.jquery.com/jQuery/

All script examples relate to the following sample document:

<!DOCTYPE html>
<body>
   <p id="1"></p>
   <div>
     <p id="2"></p>
   </div>
</body>

JQuery results:
(Note: All results are returned as instances of jQuery objects, which 
are indexable like an array)

   $("html") // returns [html]
   $(">body") // returns []
   $("+div") // returns []
   $(">body", document.documentElement) // returns [body]
   $(">p", $("body")) // returns [p#1]
   $("p", $("div")) // returns [p#2]
   $("+div", $("#1")) // returns [div]
   $(">p, >div", $("body")) // returns [p#1, div]

   $("body", []) // returns []
   $("body", null) // returns [body]
   $("body", undefined) // returns [body]
   $("") // returns []

document.findAll() should support the same parameters as $() and return 
an equivalent result collection in the majority of cases.  Likewise, 
document.find() should work the same way, but return only the first 
match.  Where findAll() returns an empty collection, find() would return 
null.

From the above, it's clear that in the new API, there are cases where 
:scope should be implied and cases where it should not.  In cases where 
:scope is implied and there's no explicit combinator, a descendant 
combinator needs to be implied too.

   $("html") // returns [html]
   document.findAll("html")

:scope cannot be implied here because html is the root element, so it 
wouldn't match if the selector was interpreted as ":scope html".

   $(">body") // returns []
   document.findAll(">body")

   $("+div") // returns []
   document.findAll("+div")

:scope needs to be implied to make these syntactically valid selectors, 
making them equivalent to ":scope>body" and ":scope+div", respectively. 
But :scope cannot match the root element here because otherwise the 
first would return the body element as a match.

   $(">body", document.documentElement) // returns [body]
   document.findAll(">body", document.documentElement)

Like the previous case, :scope needs to be implied. This time, however, 
it needs to match the specified element node.  It also means that the 
second parameter must be able to accept an Element node.


   $(">p", $("body")) // returns [p#1]
   document.findAll(">p", $("body"))
   document.findAll(">p", document.findAll("body"))

This is the same as the previous case, except a collection of elements 
is passed instead of a single Element node.  This is equivalent to 
":scope>p", where :scope matches the elements in the collection.

It should work regardless of the type of collection.  In the first case, 
$() returns a numerically indexed JQuery object; in the second, findAll() 
returns a yet-to-be-defined Array-like structure.

   $("p", $("div")) // returns [p#2]
   document.findAll("p", document.findAll("div"))

In this case, despite not beginning with a combinator, the presence of a 
reference node indicates that :scope with a descendant combinator should 
be implied, equivalent to ":scope p".

   $("+div", $("#1")) // returns [div]
   document.findAll("+div", document.find("#1"))

:scope is implied, equivalent to ":scope+div", where scope matches the 
specified element.

   $(">p, >div", $("body")) // returns [p#1, div]
   document.findAll(">p, >div", $("body"))

:scope needs to be implied before each individual selector in the list, 
equivalent to ":scope>p, :scope>div".

   $("body", []) // returns []
   document.findAll("body", [])

:scope should be implied, but since the collection is empty, it matches 
nothing.

   $("body", null) // returns [body]
   document.findAll("body", null)

   $("body", undefined) // returns [body]
   document.findAll("body", undefined)

:scope should not be implied.  But note that, according to the current 
algorithm to determine contextual reference nodes in the spec, if :scope 
were explicitly included, then it would not match anything.

   $("") // returns []
   document.findAll("")

Unlike JQuery, querySelectorAll() throws a SYNTAX_ERR exception in this 
case.  I'm not sure if it's better for findAll() to throw an exception, 
or simply return an empty collection.  But note that if we decide not to 
throw for this, then the find() method would return null.

Based on these results, :scope should not be implied in all cases. 
Rather, it should be implied under the following conditions:

1. When the given selector begins with a combinator other than the
    descendant combinator (space), or
2. Whenever a reference element, or a collection of zero or more nodes,
    is passed and there is no explicit :scope.
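
As a rough illustration only (not part of the proposal), the two 
conditions above can be sketched as a predicate.  The name 
scopeIsImplied and the plain substring test for an explicit :scope are 
assumptions made for this sketch; a real implementation would work on 
the parsed selector.

```javascript
// Matches a selector that starts with >, + or ~, or with a reference
// combinator such as /for/.
const STARTS_WITH_COMBINATOR = /^\s*(?:[>+~]|\/[^\/\s]+\/)/;

// Hypothetical helper: true when :scope would be implied for the call.
function scopeIsImplied(selector, ref) {
  // Condition 1: leading combinator other than the descendant combinator.
  if (STARTS_WITH_COMBINATOR.test(selector)) return true;
  // Condition 2: a reference element or collection was passed (an empty
  // collection still counts) and there is no explicit :scope.
  const refPassed = ref !== null && ref !== undefined;
  return refPassed && !selector.includes(":scope");
}

console.log(scopeIsImplied("html"));          // false
console.log(scopeIsImplied(">body"));         // true
console.log(scopeIsImplied("body", []));      // true
console.log(scopeIsImplied("body", null));    // false
```

Note that null and undefined are treated as "no reference nodes", while 
an empty collection still implies :scope, mirroring the JQuery results 
listed earlier.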

This is most easily achieved by supporting the refElement and refNodes 
parameters in the same way as querySelector() and specifying the 
conditions under which :scope is implied.  However, the algorithm to 
determine contextual reference nodes needs to be modified so that the 
documentElement does not match :scope, when no other nodes are supplied.

We should also consider whether the definition of :scope in Selectors 4 
should be changed, which currently states that it matches the same as 
:root where no other reference elements are specified.

---

3. *Document Fragment Methods*

It seems that JQuery does not fully support querying document fragments 
and so I can't analyse it fully to make this API behave similarly.

   var f = document.createDocumentFragment();
   f.appendChild(document.body);

   $(f).find("div"); // returns []

The closest I could get was by first running the .contents() method, 
which just returns a collection of children, and searching that. 
However, this is more directly comparable with the proposed NodeArray 
interface, rather than with searching document fragments directly.

   $(f).contents().find("p") // returns [p#1, p#2]

This is the same result as the querySelectorAll() method returns.

   f.querySelectorAll("p") // returns [p#1, p#2]

I believe the sensible approach here would be to make the .find methods 
on fragments behave the same way they do on document.

---

4. *Element Methods*

When invoked on an element, the contextual reference element (that is, 
the one that matches :scope) is set to the element itself and :scope 
should be explicitly included or implicitly prepended to each selector.

JQuery results:

   $("div").find("p")  // returns [p#2]
   $("body").find(">p") // returns [p#1]
   $("body").find("p", $("div")) // returns [p#1, p#2]
   $("body").find(">p", $("div")) // returns [p#1]
   $("p").find("+div")  // returns [div]
   $("body").find(">p, >div") // returns [p#1, div]
   $("body").find("") // returns []

Element.findAll() should behave similarly to the jQuery.find() method 
and return equivalent results in the majority of cases.

   var body = document.body;
   var p = document.find("#1");
   var div = document.find("div");

   //$("div").find("p")  // returns [p#2]
   div.findAll("p")

   //$("body").find(">p") // returns [p#1]
   body.findAll(">p")

These imply :scope and behave as expected, where scope matches the 
context node.

   $("body").find("p", $("div")) // returns [p#1, p#2]
   body.findAll("p", div)

   $("body").find(">p", $("div")) // returns [p#1]
   body.findAll(">p", div)

:scope is still implied, but unlike the document.findAll() method, any 
additional parameters, including specified reference elements need to be 
ignored because :scope should still match the context node.

   $("p").find("+div")  // returns [div]
   p.find("+div")

Again, :scope is implied, but the selector may also match siblings of the 
context node, not just descendants.

   $("body").find(">p, >div") // returns [p#1, div]
   body.findAll(">p, >div")

:scope is implied for each selector in the list.

   $("body").find("") // returns []
   body.findAll(""); // SYNTAX_ERR or return []?
   body.find(""); // SYNTAX_ERR or return null?

The issue of whether to throw a SYNTAX_ERR or return [] (or null) also 
applies to this case.

Finally, although not supported by JQuery, the reference combinator 
needs to be considered:

   label.find("/for/ input")

I think it makes the most sense for this case to match anywhere in the 
whole document, and imply :scope which matches the given context node.

From the above, it's clear that in the new API, :scope should always be 
implied when there is no explicit :scope and :scope should always match 
the element on which the methods are invoked, regardless of any 
additional parameters.

---

5. *Match Testing*

There's been debate concerning whether we should just rename 
matchesSelector() to matches(), or introduce a new matches() method that 
is distinct from matchesSelector().

In the general case,

   elm.matches(":scope *", ref);

answers the question: is this element related to (e.g. descendant, 
sibling, child or referenced-by) one or more given elements?

e.g.
   input.matches(":scope /for/ *", label)

Returns true if the label references the input element.  It's basically 
the inverse of either:

    document.find("/for/ input", label)
    label.find("/for/ input")

The reason given for introducing a new distinct method is to imply 
:scope, which matches a specified reference element, and which was 
claimed to be useful for JQuery's proxybind() method.

I was, however, unable to confirm the existence of such a method as the 
only google results for it seemed to be in recent threads on 
public-webapps; searching the JQuery forums and bug tracker returned no 
results for "proxybind", and there were no occurrences found anywhere in 
the JQuery source code on github.  The closest alternative I found was 
the .on() method, added in JQuery 1.7, which accepts an event type, 
selector and handler.

<!DOCTYPE html>
<script src="http://code.jquery.com/jquery-1.7.1.js"></script>
<div>
    <button>A</button>
    <button>B</button>
    <span><button>C</button></span>
</div>
<script>
function handler(evt) {
   alert("Clicked " + evt.target.textContent);
}

$("div").on("click", "div>button", handler)
</script>

This example attaches the click event and listens only for clicks on the 
children of the div element itself.  So the handler is called for 
buttons A and B, but not for C.

This one explicitly uses "div>button" in the selector because :scope is 
not supported and it did not work with an implied-:scope-like 
alternative ">button".

JQuery's .is() method is equivalent to Element.matches(), with the 
interface defined as:

   .is( selector )

JQuery examples:

   var body = $("body");
   var p = $("#1");
   var div = $("div");

   body.is("body"); // returns true
   div.is("body div"); // returns true

But unlike the jQuery() method, there is no supported context parameter.

   p.is(">p", body); // returns false (context parameter not supported)
   div.is("+div", p); // returns false

Comparing that with matchesSelector() as currently defined:

   var body = document.body;
   var p = document.find("#1");
   var div = document.find("div");

   body.matchesSelector("body"); // returns true
   div.matchesSelector("body div"); // returns true

   p.matchesSelector(">p", body); // throws SYNTAX_ERR
   p.matchesSelector(":scope>p", body); // returns true

   div.matchesSelector("+div", p); // throws SYNTAX_ERR
   div.matchesSelector(":scope+div", p); // returns true

While the :scope functionality doesn't yet exist natively in JQuery, it 
is possible to emulate it:

   var p = $("#1");
   !!$("body").find(">p").filter(p).length

This returns true if p is a child of body, or false otherwise.  Using 
implied :scope, this could be handled simply by:

   p.matches(">p", body);

We could certainly do that to handle this case better, but I'm not 
convinced we need a new method distinct from the existing 
matchesSelector() method, for three reasons:

1. The most common case without reference nodes is handled by the
    methods as implemented today.

2. The shorter method name should be used for the most common case.

3. Implying :scope really only makes sense where explicit reference
    elements are provided.  Otherwise, :scope would only match the
    context node itself and would result in false always being returned
    for the common case, which is highly undesirable.

e.g.
   elm.matches(".foo"); // Should not imply :scope

In this case, if we did have two distinct methods, and the new method 
only implied :scope under certain conditions, then .matches() and 
.matchesSelector() would be nearly identical.  They would only differ 
when reference nodes are provided, and then only with respect to the 
implied or explicit :scope.

i.e.
Assume matches() implies :scope under certain conditions, and 
matchesSelector() doesn't.

elm.matches(".foo") // No implied :scope, equivalent to
elm.matchesSelector(".foo");

elm.matches(":scope .foo", ref) // Explicit :scope, equivalent to
elm.matchesSelector(":scope .foo", ref);

elm.matches(".foo", ref) // implies ":scope .foo", not equivalent to
elm.matchesSelector(".foo", ref); // No implied :scope.

The existence of two nearly identical methods, which differ in only one 
small case would likely be confusing for authors and not provide any 
real benefit.  (This is an issue we can't avoid with querySelector vs. 
find though)

I therefore believe we should simply rename matchesSelector() to 
matches() and introduce the desired implied-:scope functionality in a 
way that supports the common case, as well as the reference node case.

Given that the implied :scope behaviour needs to be made available in 
the .find() methods, it would be possible to make it available for 
matches() too.  So the most reasonable approach here is to imply :scope 
according to the same rules described above for document.findAll() (i.e. 
starts with a combinator or ref nodes were passed and no explicit :scope).

---

6. *Proposed IDL*

interface NodeSelector {
   Element   find(DOMString selectors, optional Element refElement);
   Element   find(DOMString selectors, sequence<Node>? refNodes);

   ???       findAll(DOMString selectors, optional Element refElement);
   ???       findAll(DOMString selectors, sequence<Node>? refNodes);
};
Document implements NodeSelector;
DocumentFragment implements NodeSelector;
Element implements NodeSelector;

This extends the same interface that the existing querySelector 
methods use, which will make the methods available on elements, 
documents and fragments.

Open Issues:

1. The return type for findAll is yet to be decided. It may be the
    proposed NodeArray, a regular Array or something else.

2. These new methods for Element may be split out to a separate
    interface that omits the refElement and refNodes parameters.

3. Do we need both find() and findAll(), or should we only have a
    single new method that returns a collection?

Additionally, matchesSelector() will simply be renamed to matches().

---

7. *Summary of Proposed Spec Changes*

For Document and DocumentFragment, the refElement and refNodes 
parameters are handled according to the existing algorithm to determine 
contextual reference nodes currently in the specification.  This 
algorithm will be modified so that when the context node is document, 
:scope does not match the root element.

For Element, the refElement and refNodes parameters are effectively 
ignored and the algorithm to determine contextual reference nodes will 
be modified to always return the element itself for these methods.
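
To illustrate the two paragraphs above, here is a minimal sketch of how 
the set of nodes matched by :scope might be chosen.  It is not the 
algorithm from the specification: nodes are plain stand-in objects with 
a "kind" field, and the function name is hypothetical.

```javascript
// Stand-ins for DOM nodes: plain objects whose `kind` is "document",
// "fragment" or "element". This is not a real DOM.
function contextualReferenceNodes(context, ref) {
  // Element methods: refElement/refNodes are ignored and :scope always
  // matches the element the method was invoked on.
  if (context.kind === "element") return [context];
  // Document and fragment methods without a reference argument:
  // nothing matches :scope, not even the root element.
  if (ref === null || ref === undefined) return [];
  // A single reference element, or a (possibly empty) collection.
  return ref.kind === "element" ? [ref] : Array.from(ref);
}

const doc = { kind: "document" };
const body = { kind: "element", name: "body" };
const div = { kind: "element", name: "div" };

console.log(contextualReferenceNodes(doc).length);              // 0
console.log(contextualReferenceNodes(doc, [body, div]).length); // 2
console.log(contextualReferenceNodes(body, div)[0].name);       // body
```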

Open Issue: Should this change affect Element.querySelector() too, or 
leave it as currently specified?

For all interfaces, these new find(), findAll() methods, and the renamed 
matches() method must have automatic :scope prepending, subject to the 
rules for prepending :scope, outlined below.

The findAll() method must return all matching elements from anywhere in 
the document. The find() method must return the first matching element 
in document order.

Open Issue: Should findAll("") and find("") throw SYNTAX_ERR or return 
empty collection and null, respectively?

Note:
* Element.findAll(":matches(:scope, *)"); will match all elements in
   the document, equivalent to document.querySelectorAll("*");

* Similarly, Element.findAll(":not(:scope)"); will match all elements
   excluding the :scope element.


Notes for Implementers:

   For Element methods, in cases where the selectors:

   1. Only include descendant, sibling or child combinators, and
   2. Do not include explicit :scope inside a functional pseudo-class,
      e.g. :not(:scope), :matches(:scope, *)
   implementations may optimise to only search descendants and/or
   siblings, rather than the whole document.
   Otherwise, it's possible for the selector to match any element in
   the entire document, possibly including the :scope element itself.

---

8. *Proposed Rules for Prepending :scope*

Given a selector list as input to the method, trim whitespace and then 
for each complex selector, run the first step that applies:

(Note: if the selector list is "", then there are 0 complex selectors in 
the list and the following doesn't run)

| 1. If the complex selector begins with any combinator other
|    than the descendant combinator (>, +, ~ or /attr/), then
|    prepend :scope immediately before the combinator.
|
| 2. Otherwise, if there are no contextual reference nodes, do not
|    prepend :scope.
|
| 3. Otherwise, if any compound selector includes a functional
|    pseudo-class that accepts a selector as its parameter, and which
|    contains the :scope pseudo-class anywhere within it, then do not
|    prepend :scope.
|    e.g. ":matches(:scope)", ":not(:scope)"
|
| 4. Otherwise, if the complex selector includes :scope within any
|    compound or simple selector, then do not prepend :scope.
|    e.g. ":scope", "div:scope.foo", "div :scope p"
|
| 5. Otherwise, prepend :scope and a descendant combinator.

Finally, return the modified list of complex selectors.
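
The steps above can be sketched in script roughly as follows.  This is 
an approximation working on strings rather than a parsed selector: the 
function names and the refSupplied flag are hypothetical, steps 3 and 4 
are collapsed into a single substring test for ":scope", and backslash 
escapes are not handled.

```javascript
// A selector starting with >, + or ~, or with a reference combinator
// such as /for/.
const LEADING_COMBINATOR = /^(?:[>+~]|\/[^\/\s]+\/)/;

// Split a selector list on top-level commas, leaving commas inside
// parentheses, brackets or quoted strings alone.
function splitSelectorList(list) {
  const parts = [];
  let depth = 0, quote = null, current = "";
  for (const ch of list) {
    if (quote) {
      if (ch === quote) quote = null;
    } else if (ch === '"' || ch === "'") {
      quote = ch;
    } else if (ch === "(" || ch === "[") {
      depth++;
    } else if (ch === ")" || ch === "]") {
      depth--;
    } else if (ch === "," && depth === 0) {
      parts.push(current.trim());
      current = "";
      continue;
    }
    current += ch;
  }
  if (current.trim() !== "") parts.push(current.trim());
  return parts;
}

// refSupplied (hypothetical flag): true when a reference element or a
// collection, even an empty one, was passed to the method.
function prependScope(selectorList, refSupplied) {
  return splitSelectorList(selectorList.trim()).map((complex) => {
    if (LEADING_COMBINATOR.test(complex)) return ":scope" + complex; // step 1
    if (!refSupplied) return complex;                                // step 2
    if (complex.includes(":scope")) return complex;                  // steps 3, 4
    return ":scope " + complex;                                      // step 5
  });
}

console.log(prependScope(">p, >div", true)); // [":scope>p", ":scope>div"]
console.log(prependScope("p", true));        // [":scope p"]
console.log(prependScope("html", false));    // ["html"]
console.log(prependScope("", true));         // []
```

The result is returned as an array of complex selectors; joining it 
with ", " gives back a selector list string.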

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/
Received on Monday, 12 December 2011 11:11:15 UTC