- From: <bugzilla@jessica.w3.org>
- Date: Fri, 25 May 2012 07:58:35 +0000
- To: public-html-bugzilla@w3.org
https://www.w3.org/Bugs/Public/show_bug.cgi?id=17176

Summary: Element attributes should not be required to be stored in an ordered list, .innerHTML remains unspecified
Product: HTML WG
Version: unspecified
Platform: All
OS/Version: All
Status: NEW
Severity: normal
Priority: P2
Component: HTML5 spec (editor: Ian Hickson)
AssignedTo: ian@hixie.ch
ReportedBy: divye_kapoor@hotmail.com
QAContact: public-html-bugzilla@w3.org
CC: mike@w3.org, public-html-wg-issue-tracking@w3.org, public-html@w3.org

Summary:

The Element specification [1] requires that the list of attributes be an _ordered_ list. However, this poses multiple issues with regard to the treatment of .innerHTML and .outerHTML, cross-browser compatibility, anecdotal user expectations [2][3] and potentially performance.

Specifically, the specification leaves undefined the algorithm used for ordering the keys in this ordered list. This causes the output of .innerHTML and .outerHTML to be browser dependent, which in and of itself isn't a bad thing, but it introduces a type of non-determinism that makes writing a cross-browser assertHTMLEquals(...) function in unit testing frameworks very cumbersome.
(Details are given below.)

Minimal Test Case:

HTML:

  <div id="real"><a id="foo" class="blah" href="#">link</a></div>
  <div id="temp"></div>

JS:

  function f() {
    var anchorElement = document.getElementById('foo');
    anchorElement.id = 'foochanged';
    return document.getElementById('real').innerHTML;
  }

  function g() {
    var anchorElement = document.getElementById('foo');
    anchorElement.removeAttribute('id');
    anchorElement.setAttribute('id', 'foochanged');
    return document.getElementById('real').innerHTML;
  }

  var tempDiv = document.getElementById('temp');
  var expectedHTML = '<a id="foochanged" class="blah" href="#">link</a>';
  tempDiv.innerHTML = expectedHTML;

Case 1: assertEquals(tempDiv.innerHTML, f());
Case 2: assertEquals(tempDiv.innerHTML, g());

(Each case is meant to be run against a fresh copy of the HTML above, since f() changes the id that g() looks up.)

I guess "id" is a bad attribute to choose for this example due to its special nature and implementation, but any other suitable attribute might suffice.

Discussion:

As per the spec, no normalization is mandatory before serialization to .innerHTML. (See the references and links below for further info.) Therefore, it is valid for both Case 1 and Case 2 to fail in a conforming browser. However, due to the drive to use the DOM parsing algorithm to generate instances of type HTMLElement and then serialize them, it is very likely that Case 1 will pass but Case 2 won't: Case 1 would simply have the id attribute value replaced, while Case 2 would do a remove and an append on the attribute list, potentially reordering the attributes and thus changing the rendered .innerHTML output.

Motivation:

The motivation for filing this bug is not pedantic speculation but a real-world test case. I have significant amounts of test code of the form:

HTML:

  <body>
    <... large complex DOM ...>
  </body>

JS:

  function testXDoesY() {
    X(a, b, c, d, e);
    assertHTMLEquals("Expected DOM Structure", document.body);
  }

  function testXDoesZ() {
    X(b, c, d, e, f);
    assertHTMLEquals("Expected DOM Structure", document.body);
  }

Notes:

The assertHTMLEquals function here is the one implemented by the Closure library:
http://closure-library.googlecode.com/svn-history/r27/trunk/closure/goog/docs/closure_goog_testing_asserts.js.source.html#line465

This is the JsUnit documentation of the function:
http://www.jsunit.net/jsdoc/GLOBALS.html#!s!assertHTMLEquals

Note that the definition of "standardizing" is insufficient because of the issues pointed out above.

These tests work fine when run in a server-side JS container that spoofs browser manipulation of the DOM in a deterministic manner, but they fail when run on real browsers using WebDriver tests, because FF and Chrome render .innerHTML with the attributes in different orders, and they conform to the spec when doing so.

This unfortunate reality breaks .innerHTML as a means of writing JS tests that depend not on the implementation but only on state transformations of the DOM. Using the DOM API to validate the HTML structure is extremely cumbersome, because it would mean writing hundreds of brittle asserts (one for each element, attribute, text node etc.), which would make the tests really opaque to someone reading them.

An obvious workaround would be to implement an HTML(5) parser in JS, parse the output of .innerHTML in both cases and then compare the two with the attribute order ignored. That, however, runs exactly against the intent of the DOM Parsing spec: such a parser suffers from version skew, supports none of the browser goodies and is extremely heavyweight.
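To make "compare the two with the attribute order ignored" concrete, here is a minimal sketch of such a comparison. It is not the Closure or JsUnit implementation; the names sortedAttributes, sameTreeIgnoringAttributeOrder, el and text are hypothetical, and the el/text helpers are plain-object stand-ins for DOM nodes so that the sketch can run outside a browser. It assumes only the standard nodeType, nodeName, nodeValue, attributes and childNodes properties, and it handles only element, text and comment nodes.

```javascript
// Sketch only: compares two node trees while ignoring attribute order.

// Collect an element's attributes as name=value strings in sorted order.
function sortedAttributes(node) {
  var attrs = node.attributes || [];
  var pairs = [];
  for (var i = 0; i < attrs.length; i++) {
    pairs.push(attrs[i].name + '=' + JSON.stringify(attrs[i].value));
  }
  return pairs.sort();
}

// True when both trees have the same shape, text and attribute sets.
function sameTreeIgnoringAttributeOrder(a, b) {
  if (a.nodeType !== b.nodeType || a.nodeName !== b.nodeName) return false;
  // Non-element nodes (text, comments) are compared by their value.
  if (a.nodeType !== 1) return a.nodeValue === b.nodeValue;
  if (sortedAttributes(a).join(' ') !== sortedAttributes(b).join(' ')) {
    return false;
  }
  var ac = a.childNodes || [];
  var bc = b.childNodes || [];
  if (ac.length !== bc.length) return false;
  for (var i = 0; i < ac.length; i++) {
    if (!sameTreeIgnoringAttributeOrder(ac[i], bc[i])) return false;
  }
  return true;
}

// Hypothetical plain-object stand-ins for DOM nodes (assumption, not a DOM API).
function el(name, attrs, children) {
  return { nodeType: 1, nodeName: name, attributes: attrs, childNodes: children || [] };
}
function text(value) {
  return { nodeType: 3, nodeName: '#text', nodeValue: value };
}

var first = el('A', [
  { name: 'id', value: 'foochanged' },
  { name: 'class', value: 'blah' },
  { name: 'href', value: '#' }
], [text('link')]);

var second = el('A', [
  { name: 'class', value: 'blah' },
  { name: 'href', value: '#' },
  { name: 'id', value: 'foochanged' }
], [text('link')]);

console.log(sameTreeIgnoringAttributeOrder(first, second)); // true
```

Even this small version shows the cost being complained about: every test framework has to carry its own tree walk instead of comparing two strings.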
Another alternative would be to create a Document instance, use loadXML, then walk the tree and, for each node, collect the attributes, sort them and compare them. But this is ugly, works only for XHTML and does the same work twice for each assert (DOM -> String -> PseudoDOM), and IMHO should not be encouraged. See [3] for the kinds of hacks needed to support this type of functionality in IE.

Suggested wording:

The Element specification [1] should require the following:

(a) The list of attributes on an Element is an _unordered, indexed_ list.
(b) When rendering the .innerHTML/.outerHTML string, two distinct DOM elements with the same structure and attributes MUST render to the same .innerHTML string (with a deterministic ordering of attributes).

(a) is a required fix in wording to reflect the nature of the API, since Element.attributes only requires the ability to be indexed and does not specify order as per the current spec.
(b) addresses the issue of determinism in the output of .innerHTML. The resultant consequences are discussed in Trade Offs.

Trade Offs

* Keeping the attribute list sorted at all times requires a maintenance cost to be paid at parse time. In return, setAttribute(...), hasAttribute(...) and getAttribute(...) become O(log N) functions instead of the O(N) functions mandated today by the spec. However, given the skew between the attributes actually accessed by JS and the attributes parsed during page load, this is unlikely to be an exciting prospect.
* Re-ordering the attributes lazily during the evaluation of .innerHTML. Suggested wording (a) allows the UA to reorder the attributes when .innerHTML is accessed, but that breaks for loops that iterate over the attributes of an element and access .innerHTML along the way, e.g. for (var i = 0; i < a.attributes.length; i++) { a.innerHTML; }
* The third and IMHO the better option is to take a slight performance hit while rendering .innerHTML by actually sorting the attributes before appending them to the string.
  This retains the desirable property of determinism in the output in a cross-platform manner without imposing too much of a performance penalty.

Related information that I found useful while composing this bug report:

* The resolution of https://www.w3.org/Bugs/Public/show_bug.cgi?id=11204 required that .innerHTML and .outerHTML be built on the DOM Parsing algorithms defined here:
  http://html5.org/specs/dom-parsing.html

* The actual serialization algorithm is defined here:
  http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm
  "While the exact order of attributes is UA-defined, and may depend on factors such as the order that the attributes were given in the original markup, the sort order must be stable, such that consecutive invocations of this algorithm serialize an element's attributes in the same order."
  (Note: There is no mention of other elements with the same DOM structure being serialized to the same string.)

* The definition of Element that indicates that the attribute list must be ordered:
  http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
  "Elements also have an ordered attribute list. Unless explicitly given when an element is created, its attribute list is empty. An element has an attribute A if A is in its attribute list."

* The IDL description of Element:
  http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#html-fragment-serialization-algorithm

* The DOMConfiguration object's canonicalize definition:
  http://www.w3.org/TR/2003/WD-DOM-Level-3-Core-20030609/DOM3-Core.html#core-DOMConfiguration
  "Canonicalize the document according to the rules specified in [Canonical XML]. Note that this is limited to what can be represented in the DOM. In particular, there is no way to specify the order of the attributes in the DOM."
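As a rough illustration of the third trade-off (sorting attributes only at serialization time), the sketch below models an element's ordered attribute list with a plain array. Everything here is hypothetical: AttributeList, serializeStartTag, escapeAttr and makeAnchorAttrs are invented names, and the "replace in place, otherwise append" behaviour is an assumption about how a UA might maintain the list, not something taken from the spec. The escaping is also simplified to & and " only.

```javascript
// Hypothetical stand-in for an element's ordered attribute list.
function AttributeList(pairs) {
  this.items = pairs.map(function (p) {
    return { name: p[0], value: p[1] };
  });
}
AttributeList.prototype.indexOf = function (name) {
  for (var i = 0; i < this.items.length; i++) {
    if (this.items[i].name === name) return i;
  }
  return -1;
};
// Assumed behaviour: an existing attribute keeps its slot, a new one is appended.
AttributeList.prototype.set = function (name, value) {
  var i = this.indexOf(name);
  if (i >= 0) this.items[i].value = value;
  else this.items.push({ name: name, value: value });
};
AttributeList.prototype.remove = function (name) {
  var i = this.indexOf(name);
  if (i >= 0) this.items.splice(i, 1);
};

// Simplified attribute-value escaping (only & and ").
function escapeAttr(value) {
  return String(value).replace(/&/g, '&amp;').replace(/"/g, '&quot;');
}

// Serialize a start tag. With sorted=true the attributes are emitted in
// name order, using a copy so the live list is never reordered.
function serializeStartTag(tagName, list, sorted) {
  var items = list.items.slice();
  if (sorted) {
    items.sort(function (a, b) {
      return a.name < b.name ? -1 : a.name > b.name ? 1 : 0;
    });
  }
  var out = '<' + tagName;
  items.forEach(function (a) {
    out += ' ' + a.name + '="' + escapeAttr(a.value) + '"';
  });
  return out + '>';
}

function makeAnchorAttrs() {
  return new AttributeList([['id', 'foo'], ['class', 'blah'], ['href', '#']]);
}

// Case 1: the value is replaced in place.
var case1 = makeAnchorAttrs();
case1.set('id', 'foochanged');

// Case 2: remove followed by set, which appends under the assumption above.
var case2 = makeAnchorAttrs();
case2.remove('id');
case2.set('id', 'foochanged');

console.log(serializeStartTag('a', case1, false)); // <a id="foochanged" class="blah" href="#">
console.log(serializeStartTag('a', case2, false)); // <a class="blah" href="#" id="foochanged">
console.log(serializeStartTag('a', case1, true) === serializeStartTag('a', case2, true)); // true
```

Sorting a copy rather than the live list is what separates this option from the lazy re-ordering option: a loop indexing into the attribute list while serializing still sees the list in its original order.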
References:

[1] http://dvcs.w3.org/hg/domcore/raw-file/tip/Overview.html#concept-element-attribute
[2] http://stackoverflow.com/questions/1591841/how-do-i-get-html-attribute-order-to-be-consistent-when-testing-in-javascript
[3] http://stackoverflow.com/questions/7474710/can-i-load-an-entire-html-document-into-a-document-fragment-in-internet-explorer

--
Apologies for the length of the bug report. Also, this is my first time filing bugs with the W3C, so if I can do something better next time, please do let me know.

--
Configure bugmail: https://www.w3.org/Bugs/Public/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the QA contact for the bug.
Received on Friday, 25 May 2012 07:58:43 UTC