[Bug 17176] New: Element attributes should not be required to be stored in an ordered list, .innerHTML remains unspecified

https://www.w3.org/Bugs/Public/show_bug.cgi?id=17176

           Summary: Element attributes should not be required to be stored
                    in an ordered list, .innerHTML remains unspecified
           Product: HTML WG
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: normal
          Priority: P2
         Component: HTML5 spec (editor: Ian Hickson)
        AssignedTo: ian@hixie.ch
        ReportedBy: divye_kapoor@hotmail.com
         QAContact: public-html-bugzilla@w3.org
                CC: mike@w3.org, public-html-wg-issue-tracking@w3.org,
                    public-html@w3.org


Summary:
The Element specification [1] requires that the list of attributes be an
_ordered_ list. However, this poses multiple issues with regards to the
treatment of .innerHTML and .outerHTML, cross browser compatibility, anecdotal
user expectations [2][3] and potentially performance. Specifically, the
specification leaves undefined the algorithm used for ordering the keys in this
ordered list. This causes the output of .innerHTML and .outerHTML to be browser
dependent, which in and of itself isn't a bad thing, but it introduces a type of
non-determinism that makes writing a cross browser assertHTMLEquals(...)
function in unit testing frameworks very cumbersome. (Details are mentioned
below.)

Minimal Test Case:
HTML: 
<div id="real"><a id="foo" class="blah" href="#">link</a></div>
<div id="temp"></div>

JS:
function f() {
  var anchorElement = document.getElementById('foo');
  anchorElement.id = 'foochanged';
  return document.getElementById('real').innerHTML;
}

function g() {
  var anchorElement = document.getElementById('foo');
  anchorElement.removeAttribute('id');
  anchorElement.setAttribute('id', 'foochanged');
  return document.getElementById('real').innerHTML;
}

var tempDiv = document.getElementById('temp');
var expectedHTML = '<a id="foochanged" class="blah" href="#">link</a>';
tempDiv.innerHTML = expectedHTML;

Case 1:
assertEquals(tempDiv.innerHTML, f()); 

Case 2:
assertEquals(tempDiv.innerHTML, g()); 


I guess "id" is a bad attribute to choose for this example due to its special
nature and implementation, but any other suitable attribute might suffice.

Discussion:
As per the spec, no normalization is mandatory before serialization to
.innerHTML. (See references and links below for further info.) Therefore, it is
valid for both Case 1 and Case 2 to fail in a conforming browser. However,
because implementations are moving towards using the DOM parsing algorithm to
generate instances of type HTMLElement and then serializing them, it is very
likely that Case 1 will pass but Case 2 won't: Case 1 would simply replace the
id attribute value in place, while Case 2 would do a remove followed by an
append to the attribute list, potentially reordering the attributes and thus
changing the rendered .innerHTML output.
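
The difference between the two cases can be illustrated with a toy model of an
ordered attribute list. This is only a sketch: AttrList and its methods are
hypothetical names (not a browser API), and it assumes that setting an existing
attribute replaces its value in place while setting a missing attribute appends
it to the end of the list.

```javascript
// Toy model of an element's ordered attribute list (hypothetical, not a browser API).
class AttrList {
  constructor(pairs) {
    this.items = pairs.map(([name, value]) => ({ name, value }));
  }

  // Assumption: an existing attribute keeps its position; a new one is appended.
  set(name, value) {
    const existing = this.items.find((a) => a.name === name);
    if (existing) {
      existing.value = value;
    } else {
      this.items.push({ name, value });
    }
  }

  remove(name) {
    this.items = this.items.filter((a) => a.name !== name);
  }

  // Serialize attributes in list order, as a naive serializer might.
  serialize() {
    return this.items.map((a) => `${a.name}="${a.value}"`).join(' ');
  }
}

const initial = [['id', 'foo'], ['class', 'blah'], ['href', '#']];

// Mirrors f(): the value is replaced in place.
const case1 = new AttrList(initial);
case1.set('id', 'foochanged');

// Mirrors g(): remove, then set again, which appends at the end.
const case2 = new AttrList(initial);
case2.remove('id');
case2.set('id', 'foochanged');

console.log(case1.serialize()); // id="foochanged" class="blah" href="#"
console.log(case2.serialize()); // class="blah" href="#" id="foochanged"
```

Under these assumptions both lists hold the same attributes with the same
values, yet they serialize to different strings.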

Motivation:
The motivation for filing this bug is not pedantic speculation but a real world
test case. I have significant amounts of test code of the form:
HTML: 
<body>
<... large complex DOM ...>
</body>

JS:
function testXDoesY() {
   X(a,b,c,d,e);
   assertHTMLEquals("Expected DOM Structure", document.body);
}

function testXDoesZ() {
   X(b,c,d,e,f);
   assertHTMLEquals("Expected DOM Structure", document.body);
}

Notes:
The assertHTMLEquals function used above is the one implemented by the Closure
library:
http://closure-library.googlecode.com/svn-history/r27/trunk/closure/goog/docs/closure_goog_testing_asserts.js.source.html#line465

This is the JsUnit documentation of the function:
http://www.jsunit.net/jsdoc/GLOBALS.html#!s!assertHTMLEquals
Note that the definition of "standardizing" is insufficient because of the
issues pointed out above.


These tests work fine when run using a server side JS container that spoofs
browser manipulation of the DOM in a deterministic manner, but they fail when
run on real browsers using WebDriver tests because FF and Chrome render
.innerHTML with the attributes in different orders, and both are in conformance
with the spec when doing so. This unfortunate reality breaks .innerHTML as a
means of writing JS tests that depend only on state transformations of the DOM
and not on the implementation.

Using the DOM API to validate the HTML structure is extremely cumbersome
because it would mean writing hundreds of brittle asserts (one for each
element, attribute, text node etc.), which would make the tests really opaque
to someone reading them.

An obvious workaround would be to implement an HTML(5) parser in JS, parse the
output of .innerHTML in both cases and then compare the two with the attribute
order ignored, but that is exactly against the intent of the DOM Parsing spec
(since such a parser has version skew, doesn't support any of the browser
goodies and is extremely heavyweight).

Another alternative would be to create a Document instance, use loadXML, then
walk the tree and, for each node, collect, sort and compare the attributes.
But this is ugly, works only for XHTML and does the same work twice for each
assert (DOM -> String -> PseudoDOM), and IMHO should not be encouraged. See [3]
for the kinds of hacks needed to support this type of functionality in IE.


Suggested wording:
The Element specification [1] should require the following:
(a) The list of attributes on an Element is an _unordered, indexed_ list.
(b) When rendering the .innerHTML/.outerHTML string, two distinct DOM elements
with the same structure and attributes MUST render to the same .innerHTML
string (with a deterministic ordering of attributes)

(a) is a required fix in wording to reflect the nature of the API since
Element.attributes only requires the ability to be indexed and does not specify
order as per the current spec.
(b) addresses the issue of determinism in the output of .innerHTML. The
resultant consequences are discussed in Trade Offs.

Trade Offs:

* Keeping the list sorted at all times requires a maintenance cost to be paid
at parse time. In return, setAttribute(...), hasAttribute(...) and
getAttribute(...) become O(log N) functions instead of the O(N) functions
implied today by the spec. However, given the skew between the attributes
actually accessed by JS and the attributes parsed during page load, this is
unlikely to be an exciting prospect.

* Re-ordering the attributes lazily during the evaluation of .innerHTML.
Suggested wording (a) allows the UA to reorder the attributes when .innerHTML
is accessed, but that breaks for loops that iterate over the attributes of an
element and access .innerHTML inside the loop body, since the indices may
shift mid-iteration.
e.g.
for (var i = 0; i < a.attributes.length; i++) {
   a.innerHTML;
}

* The third and IMHO the better option is to take a slight performance hit
while rendering .innerHTML by actually sorting the attributes before appending
them to the string. This retains the desirable property of determinism in the
output in a cross platform manner without imposing too much of a performance
penalty. 
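
The third option can be sketched roughly as follows. This is an illustration
only, not the serialization algorithm from the spec: serializeSorted is a
hypothetical helper, sorting by attribute name with plain string comparison is
an assumption (this report does not fix a sort key), and escaping, void
elements and namespaces are deliberately left out. It reads only nodeType,
nodeName, nodeValue, attributes and childNodes, so it is exercised here with
plain objects standing in for DOM nodes.

```javascript
// Hypothetical helper: serialize a DOM-like node with attributes sorted by name.
// Simplified sketch: no escaping, no void elements, no namespaces.
function serializeSorted(node) {
  if (node.nodeType === 3) {
    // Text node: emit its value as-is.
    return node.nodeValue;
  }
  // Copy the attributes so the element's own list is never reordered.
  const attrs = Array.from(node.attributes)
    .map((a) => ({ name: a.name, value: a.value }))
    .sort((x, y) => (x.name < y.name ? -1 : x.name > y.name ? 1 : 0));
  const attrText = attrs.map((a) => ` ${a.name}="${a.value}"`).join('');
  const tag = node.nodeName.toLowerCase();
  const children = Array.from(node.childNodes).map(serializeSorted).join('');
  return `<${tag}${attrText}>${children}</${tag}>`;
}

// Plain-object stand-ins for two anchors whose attribute lists differ only in order.
const text = { nodeType: 3, nodeValue: 'link' };
const replacedInPlace = {
  nodeType: 1,
  nodeName: 'A',
  attributes: [
    { name: 'id', value: 'foochanged' },
    { name: 'class', value: 'blah' },
    { name: 'href', value: '#' },
  ],
  childNodes: [text],
};
const removedThenSet = {
  nodeType: 1,
  nodeName: 'A',
  attributes: [
    { name: 'class', value: 'blah' },
    { name: 'href', value: '#' },
    { name: 'id', value: 'foochanged' },
  ],
  childNodes: [text],
};

console.log(serializeSorted(replacedInPlace));
// <a class="blah" href="#" id="foochanged">link</a>
console.log(serializeSorted(replacedInPlace) === serializeSorted(removedThenSet));
// true
```

Because the sort works on a copy, the order seen through Element.attributes is
left untouched, which avoids the loop problem described for the lazy
re-ordering option.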


Related information that I found useful while composing this bug report:
* The resolution of https://www.w3.org/Bugs/Public/show_bug.cgi?id=11204
required that .innerHTML and .outerHTML be built on the DOM Parsing
algorithms defined here: http://html5.org/specs/dom-parsing.html

* The actual serialization algorithm is defined here:
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm
"While the exact order of attributes is UA-defined, and may depend on factors
such as the order that the attributes were given in the original markup, the
sort order must be stable, such that consecutive invocations of this algorithm
serialize an element's attributes in the same order."
(Note: There is no mention of other elements with the same DOM structure being
serialized to the same string.)

* The definition of Element that indicates that the attribute list must be
ordered:
http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
"Elements also have an ordered attribute list. Unless explicitly given when an
element is created, its attribute list is empty. An element has an attribute A
if A is in its attribute list."

* The IDL description of Element:
http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm

* The DOMConfiguration object's canonicalize definition:
http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/DOM3-Core.html#core-DOMConfiguration
"Canonicalize the document according to the rules specified in [Canonical XML].
Note that this is limited to what can be represented in the DOM. In particular,
there is no way to specify the order of the attributes in the DOM."

References:
[1]
http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
[2]
http://stackoverflow.com/questions/1591841/how-do-i-get-html-attribute-order-to-be-consistent-when-testing-in-javascript
[3]
http://stackoverflow.com/questions/7474710/can-i-load-an-entire-html-document-into-a-document-fragment-in-internet-explorer

--
Apologies for the length of the bug report. Also, this is my first time at
filing bugs with the W3C so if I can do something better for the next time,
please do let me know.


Received on Saturday, 26 May 2012 01:02:22 UTC