[selectors-api] Investigating NSResolver Alternatives

Hi,
   Recently, we had a discussion on public-webapi about what should be 
done with the current NSResolver, and whether or not it was causing more 
problems than it's worth.  I realise that there is work going into 
implementing the current NSResolver as currently specced, but no 
implementation has shipped yet, so it's not too late, and the 
implementation experience has been very useful for this research and in 
making a more informed decision.

There are a few alternative proposals that have been discussed, 
including the following:

* Dropping the feature (or bug, if you prefer) entirely
* Deferring it until v2 and having browsers ship without support for
   NSResolver in initial implementations
   - (Note: The spec already allows implementations to ship without such
      support anyway)
* Use the JS Object notation: {"prefix": "uri", ...}
* Requiring a Node with appropriate namespace declarations to be passed.
* Defining a new native object that can have namespace declarations
   added to it by scripts.
   var resolver = new NamespaceResolver();
   resolver.add("prefix", "uri");
* Defining the nsresolver to be a DOMString with various possible syntaxes:
   - a space separated set of "prefix=uri prefix2=uri2"
   - JSON syntax '{"prefix": "uri"}'
   - the @namespace syntax defined in CSS.
     "@namespace 'defaultns'; @namespace prefix 'uri'"
     "@namespace url(defaultns); @namespace prefix url(uri)"


All of these have advantages and disadvantages and I intend to evaluate 
each one.

But first, here are the problems with the current function approach, 
and the use cases that any potential solution needs to address.


* Resolving the default namespace

The DOM 3 Core Node.lookupNamespaceURI required null to be passed in 
order to obtain the default namespace, but it would be better for both 
authors and implementers if this were "" instead.  Browsers differ 
significantly in how they implement Node.lookupNamespaceURI() anyway, so 
perhaps the theoretical compatibility with the DOM 3 Core spec isn't all 
that necessary.


* null/undefined Return Values

The spec currently requires null and undefined return values to be 
treated the same as an empty string.  This was done to make things more 
convenient for authors.  However, Boris has reported this is an 
implementation problem because they are converted to "null" and 
"undefined" as a result of the return value being defined as a DOMString.

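To illustrate the conversion Boris describes, here is a minimal sketch 
in plain ECMAScript.  The helper name normalizeResolverResult is 
hypothetical and not part of any spec; it only contrasts a naive string 
conversion with the treatment the spec currently asks for.

```javascript
// Hypothetical helper, not part of the spec: maps null and undefined to ""
// as the spec currently requires, and stringifies everything else.
function normalizeResolverResult(value) {
  if (value === null || value === undefined) {
    return "";
  }
  return String(value);
}

// A naive DOMString conversion stringifies the value directly.
var naiveNull = String(null);            // "null"
var naiveUndefined = String(undefined);  // "undefined"

// With the extra check, both collapse to the empty string.
var fixedNull = normalizeResolverResult(null);
var fixedUndefined = normalizeResolverResult(undefined);
```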

* Hostile resolver - DOM Modification

Functions could potentially modify the DOM while resolving namespaces, 
which could result in non-interoperable behaviour.  To ensure as much 
interop as possible, the spec could require all prefixes and the default 
NS to be resolved prior to finding any potential match.  But this 
prevents implementations from performing optimisations, such as not 
resolving namespaces in unnecessary cases.

Should the implementation only return matches present in the 
post-modification DOM?  Consider:

<p><svg:svg>...</svg:svg></p>

querySelectorAll("p, svg|svg", resolver);

Suppose the implementation sees that the p matches without having to 
resolve the svg prefix, but then, while the prefix is being resolved to 
see whether the <svg:svg> element matches, the p element is removed from 
the DOM.  Should the p element still be returned?

Possible behaviour to define to handle this:
- Leave it explicitly undefined, allowing for potentially 
non-interoperable behaviour here. (Not ideal)
- Require all prefixes to be resolved first
- Allow an exception to be thrown
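
The "resolve all prefixes first" option could look roughly like the 
following.  This is only a sketch: collectPrefixes and resolveAllFirst 
are hypothetical names, the regular expression is a rough scan rather 
than a real selector parser, and the default namespace is left out.

```javascript
// Collect the distinct namespace prefixes used in a selector string.
// Assumption: a prefix is an identifier followed by "|" that is not part
// of the "|=" attribute operator. Strings and escapes are not handled.
function collectPrefixes(selector) {
  var pattern = /([A-Za-z_][\w-]*)\|(?!=)/g;
  var prefixes = [];
  var match;
  while ((match = pattern.exec(selector)) !== null) {
    if (prefixes.indexOf(match[1]) === -1) {
      prefixes.push(match[1]);
    }
  }
  return prefixes;
}

// Call the resolver exactly once per prefix before any matching starts,
// so that later DOM changes made by the resolver cannot interleave with
// the matching itself.
function resolveAllFirst(selector, resolver) {
  var resolved = new Map();
  collectPrefixes(selector).forEach(function (prefix) {
    resolved.set(prefix, resolver(prefix));
  });
  return resolved;
}
```

The cost is the one noted above: every prefix is resolved even when the 
result turns out not to be needed.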


* Moving nodes between documents

If lookupNamespaceURI() moves a node from an XML document to an HTML 
document using document.adoptNode(), the case sensitivity of selectors 
is unclear.  e.g.

Assume this script is running in an HTML document

var otherDocument = ... // some other XML document.
var foo = otherDocument.getElementById("foo");
function resolver() {
   document.adoptNode(foo); // Move node from otherDocument to this one
   return "";
}

foo.querySelector("p", resolver);

Because foo was initially in an XML document, the selector would have 
been case sensitive.  But since the resolver moved it into an HTML 
document while resolving the default ns, does it now become case 
insensitive?

Possible ways to handle this:
* Require the selector to remain case sensitive
* Allow the implementation to throw an exception; same as the DOM 
Modification issue above


* Hanging

How do we protect against silly authors who do this?

function resolver() {
   while(true);
}

or other things causing infinite loops or infinite recursion.  Loops 
could just run forever, since browsers already need to ensure they don't 
completely hang the UI.  For recursion, browsers will throw a recursion 
exception when the recursion limit is reached.
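
A small sketch of the recursion case, runnable in any ECMAScript 
engine.  The wrapper callResolverSafely is a hypothetical name used only 
for illustration; the exact exception type differs between engines, so 
the sketch does not depend on it.

```javascript
// A deliberately broken resolver that recurses without end. The string
// concatenation keeps the call out of tail position.
function recursiveResolver(prefix) {
  return "" + recursiveResolver(prefix);
}

// Hypothetical wrapper: report whether calling the resolver threw.
function callResolverSafely(resolver, prefix) {
  try {
    return { threw: false, value: resolver(prefix) };
  } catch (e) {
    return { threw: true, error: e };
  }
}

var outcome = callResolverSafely(recursiveResolver, "svg");
```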

* Navigating away from the page

var iframe = document.getElementById("theIframe");
function resolver() {
   // Navigate the iframe away while the query is still running
   iframe.contentWindow.location = "...";
}

iframe.contentDocument.querySelectorAll("p", resolver);

I'm not really sure what should or could happen there.  Should it return 
no results? Does it keep the document around in memory until the query 
finishes and returns the elements that were in the document?  Throw an 
exception?


* Returning inconsistent results

If the browser resolves a prefix multiple times, e.g. if given the 
selector "x|p x|span", what if the resolver returns inconsistent 
results?  Opera currently resolves the prefix more than once, but still 
uses the first value returned anyway.
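
One way to get "first value wins" behaviour is to cache the first 
result per prefix.  The sketch below is not a description of Opera's 
implementation; firstValueWins and flakyResolver are hypothetical names, 
and the example.org URIs are placeholders.

```javascript
// Wrap a resolver so that only the first value returned for each prefix
// is ever used, however many times the prefix is looked up.
function firstValueWins(resolver) {
  var cache = new Map();
  return function (prefix) {
    if (!cache.has(prefix)) {
      cache.set(prefix, resolver(prefix));
    }
    return cache.get(prefix);
  };
}

// An inconsistent resolver: returns a different URI on each call.
var calls = 0;
function flakyResolver(prefix) {
  calls += 1;
  return "http://example.org/ns/" + calls;
}

var stable = firstValueWins(flakyResolver);
var first = stable("x");   // resolved while matching x|p
var second = stable("x");  // resolved again while matching x|span
```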


There are probably a few other issues that I have forgotten about.  But 
given the relatively large amount of time devoted entirely to this 
NSResolver issue, and the relatively minor use cases compared with the 
authors who won't even use namespaces, it's clear that the function is 
more trouble than it's worth, so it really is worth investigating 
alternative solutions.

The use cases, problems and requirements that need to be addressed by 
any possible solution are:

* Relatively easy for authors to use
* Easy for browsers to implement
* Must be able to declare the default namespace
* Must be able to declare prefixes
* Must be able to declare prefixes independently from those used in the 
document
* Must be suitable for use in all implementations, not just ECMAScript
* Must not suffer from the same problems as the function approach
* Easy to define in the spec

These are the potential solutions, and their pros and cons.

* Use the JS Object notation: {"prefix": "uri", ...} (Hash map)

pro: Easy and familiar for authors
pro: Browsers already support it
con: Only suitable for ECMAScript, though other language bindings could 
use their own Map implementation
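
A sketch of how a prefix might be looked up in such an object.  The 
helper lookupInObject is hypothetical, and returning null for an unknown 
prefix is an assumption.  The own-property check matters because plain 
objects inherit names such as "toString" from Object.prototype.

```javascript
// Hypothetical lookup over a plain object used as a prefix map.
function lookupInObject(map, prefix) {
  // Only consider properties set directly on the object, so inherited
  // names like "toString" are not mistaken for declared prefixes.
  if (Object.prototype.hasOwnProperty.call(map, prefix)) {
    return String(map[prefix]);
  }
  return null; // assumption: unknown prefixes resolve to null
}

var namespaces = {
  "svg": "http://www.w3.org/2000/svg",
  "xhtml": "http://www.w3.org/1999/xhtml"
};

var svgUri = lookupInObject(namespaces, "svg");
var bogus = lookupInObject(namespaces, "toString");
```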

* Requiring a Node with appropriate namespace declarations to be passed.

pro: Easy for implementations to obtain the namespaces
con: More complex for authors to create a node and set all necessary 
namespaces
con: Using a Node from the document itself ties the script to use the 
same prefixes as the document.

* Defining a new native object that can have namespace declarations
   added to it by scripts.

   var resolver = new NamespaceResolver();
   resolver.add("prefix", "uri");

pro: Easy for authors to use
pro: Relatively easy for implementers to implement
con: Requires me to specify a new interface
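
To make the idea concrete, here is a script-level emulation of such an 
object.  Only the constructor name and add() come from the proposal 
above; the lookup() method and its null return for unknown prefixes are 
assumptions made for this sketch.

```javascript
// Script emulation of the proposed native object. Declarations are plain
// data, so no author script would need to run during a query.
function NamespaceResolver() {
  this._declarations = new Map();
}

// add() as in the proposal: bind a prefix to a namespace URI.
NamespaceResolver.prototype.add = function (prefix, uri) {
  this._declarations.set(String(prefix), String(uri));
};

// Hypothetical: the proposal does not say how bindings are read back.
NamespaceResolver.prototype.lookup = function (prefix) {
  var key = String(prefix);
  return this._declarations.has(key) ? this._declarations.get(key) : null;
};

var svgResolver = new NamespaceResolver();
svgResolver.add("svg", "http://www.w3.org/2000/svg");
```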

* DOMString: a space separated set of "prefix=uri prefix2=uri2"

pro: Easy for authors
con: Requires new parsing requirements to be specified and implemented
con: Need to reserve a special prefix for declaring the default ns
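
A rough idea of the parsing this would involve.  The function name 
parseSpaceSeparated and the choice to throw SyntaxError on a malformed 
token are assumptions; the sketch does not handle a default namespace, 
since no reserved prefix has been chosen.

```javascript
// Parse "prefix=uri prefix2=uri2" into a Map of prefix -> URI.
function parseSpaceSeparated(declarations) {
  var map = new Map();
  var tokens = declarations.split(/\s+/);
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i] === "") {
      continue; // leading or trailing whitespace
    }
    // Split at the first "=" only, since URIs may contain "=" themselves.
    var eq = tokens[i].indexOf("=");
    if (eq <= 0) {
      throw new SyntaxError("Expected prefix=uri, got: " + tokens[i]);
    }
    map.set(tokens[i].slice(0, eq), tokens[i].slice(eq + 1));
  }
  return map;
}
```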

* DOMString: JSON syntax '{"prefix": "uri"}'

pro: Easy and familiar for authors
pro: Easy for browsers to implement
con: Need to reserve a special prefix for declaring the default ns
con: Quite verbose for authors
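
A minimal sketch of reading such a string, assuming JSON.parse is 
available.  The name parseJsonDeclarations and the validation rules 
(must be an object, all values must be strings) are assumptions, and the 
default namespace is again left out.

```javascript
// Parse '{"prefix": "uri"}' into a Map of prefix -> URI.
function parseJsonDeclarations(text) {
  var parsed = JSON.parse(text); // throws SyntaxError on malformed input
  if (parsed === null || typeof parsed !== "object" || Array.isArray(parsed)) {
    throw new TypeError("Expected a JSON object of prefix/URI pairs");
  }
  var map = new Map();
  Object.keys(parsed).forEach(function (prefix) {
    if (typeof parsed[prefix] !== "string") {
      throw new TypeError("Namespace URI for '" + prefix + "' must be a string");
    }
    map.set(prefix, parsed[prefix]);
  });
  return map;
}
```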

* DOMString: the @namespace syntax defined in CSS.
     "@namespace 'defaultns'; @namespace prefix 'uri'"
     "@namespace url(defaultns); @namespace prefix url(uri)"

pro: Familiar syntax for authors
pro: Syntax and parsing requirements already defined in CSS
pro: Already implemented by browsers
pro: Shifts case sensitivity of prefixes issue from this spec into 
css3-namespaces (makes them case insensitive)
pro: Already handles both prefixes and the default ns
pro: Suitable for non-ECMAScript languages as well
pro: Easy for me to define
con: Slightly verbose for authors, but less verbose than JSON
con: Browsers need to fix their buggy case folding implementations
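
For illustration only, here is a script-level approximation of reading 
the two example strings above.  A real implementation would reuse the 
browser's CSS parser; the name parseNamespaceRules and the shape of the 
returned object are assumptions, and the regular expression ignores 
escapes, quoted url() values, case folding of prefixes, and anything it 
does not recognise.

```javascript
// Extract the default namespace and prefix bindings from a string of
// @namespace rules. Groups: 1 = prefix, 2 = 'uri', 3 = "uri", 4 = url(uri).
function parseNamespaceRules(text) {
  var rule = /@namespace\s+(?:([A-Za-z_][\w-]*)\s+)?(?:'([^']*)'|"([^"]*)"|url\(\s*([^)\s]*)\s*\))\s*;?/g;
  var result = { defaultNamespace: null, prefixes: new Map() };
  var match;
  while ((match = rule.exec(text)) !== null) {
    var uri = match[2] !== undefined ? match[2]
            : match[3] !== undefined ? match[3]
            : match[4];
    if (match[1] === undefined) {
      result.defaultNamespace = uri;      // rule without a prefix
    } else {
      result.prefixes.set(match[1], uri); // rule with a prefix
    }
  }
  return result;
}

var quoted = parseNamespaceRules("@namespace 'defaultns'; @namespace prefix 'uri'");
var viaUrl = parseNamespaceRules("@namespace url(defaultns); @namespace prefix url(uri)");
```

Both forms yield the same declarations: a default namespace of 
"defaultns" and the prefix "prefix" bound to "uri".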


Given the overall simplicity of the @namespace approach for authors, 
implementers and spec writers, I'm in favour of replacing the current 
NSResolver approach in the spec entirely.  I will draft up the 
replacement specification for this shortly and pending any major 
objections, hopefully this will allow us to resolve this issue once and 
for all.

-- 
Lachlan Hunt - Opera Software
http://lachy.id.au/
http://www.opera.com/

Received on Saturday, 12 July 2008 10:24:28 UTC