[whatwg] DOMTokenList is unordered but yet requires sorting from Ian Hickson on 2009-07-28 (public-whatwg-archive@w3.org from July 2009)

From: Ian Hickson <ian@hixie.ch>
Date: Tue, 28 Jul 2009 00:17:14 +0000 (UTC)
Message-ID: <Pine.LNX.4.62.0907280007270.23663@hixie.dreamhostps.com>

On Sun, 12 Jul 2009, Jonas Sicking wrote:
> >>
> >> Oh, I have foreseen that. Is it really necessary to remove duplicates? 
> >> I imagine DOMTokenList to be similar to what can be achieved with 
> >> String.split(), but then it would just be more duplicate 
> >> functionality.
> >
> > If we don't remove duplicates, then things like the .toggle() method 
> > could have some quite weird effects.
> 
> Such as?

Such as .length changing by more than 1 after a call to .toggle().
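
To make that concrete, here is a minimal sketch in Python (not the spec's algorithm). The helper names `tokens` and `toggle` are hypothetical, and `str.split()` stands in for the spec's space-character splitting, which is a simplification. It assumes a toggle that removes every occurrence of a token that is already present.

```python
def tokens(value: str) -> list[str]:
    """Split an attribute value on whitespace, keeping duplicates."""
    return value.split()


def toggle(value: str, token: str) -> str:
    """Hypothetical toggle: drop every occurrence if present, else append."""
    current = tokens(value)
    if token in current:
        return " ".join(t for t in current if t != token)
    return " ".join(current + [token])


before = "a b a"             # duplicate "a" left in place
after = toggle(before, "a")  # removes both copies of "a"
print(len(tokens(before)), len(tokens(after)))  # prints "3 1"
```

With duplicates kept, a single toggle takes the length from 3 to 1.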


> I definitely think it'd be worth avoiding the code complexity and perf 
> hit of having the implementation remove duplicates if they appear in the 
> class attribute given how extremely rare duplicates are.

Fair enough. I've made DOMTokenList not remove duplicates.


On Mon, 13 Jul 2009, Sylvain wrote:
> 
> This is a bit unrelated, but when looking at the DOMTokenList 
> implementation, I had an idea about an alternative algorithm that could 
> be easier to implement and could also be described more simply in the 
> spec. The disadvantage is that the DOMTokenList methods mutating the 
> underlying string wouldn't preserve existing whitespace (which the 
> current algorithms try hard to do).
> 
> The idea is that any DOMTokenList method that mutates the underlying string
> would do:
>  - split the attribute into unique tokens (preserving order).
>  - add or remove the token according to the method called.
>  - rebuild the attribute string by concatenating tokens together (with a
> single space).
> 
> At first, this may look inefficient (if implemented naively).
> But I guess that implementations will usually keep both the attribute string
> and a list of tokens in memory, so they wouldn't have to tokenize the string
> on every mutation. There is a small performance hit during attribute
> tokenization: the list of tokens would need to keep only unique tokens. But
> after that, the DOMTokenList methods are very simple: length/item() don't need
> to take care of duplicates, add/remove/toggle are simple list manipulation
> (the attribute string could be lazily generated from the token list when
> needed).
> 
> To summarize:
> pros: simpler spec algorithms, simpler implementation
> cons: less whitespace preservation, small perf hit during tokenization
> 
> I don't know if I'm missing something. Does this sound reasonable?

It ends up being not much simpler since you still have to deal with direct 
changes to the underlying string, as far as I can tell.
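
For reference, the three steps proposed in the quoted message (split into unique tokens, add or remove, rejoin with single spaces) could be sketched roughly as follows. This is Python for illustration only; `unique_tokens`, `add` and `remove` are hypothetical names, and `str.split()` is assumed to be a close enough stand-in for attribute tokenization.

```python
def unique_tokens(value: str) -> list[str]:
    """Tokenize, keeping only the first occurrence of each token, in order."""
    # dict preserves insertion order, so fromkeys() dedupes without reordering
    return list(dict.fromkeys(value.split()))


def add(value: str, token: str) -> str:
    """Return the attribute string with the token added (if missing)."""
    toks = unique_tokens(value)
    if token not in toks:
        toks.append(token)
    return " ".join(toks)  # original whitespace is not preserved


def remove(value: str, token: str) -> str:
    """Return the attribute string with the token removed."""
    return " ".join(t for t in unique_tokens(value) if t != token)


print(repr(add("  a   b a ", "c")))   # prints 'a b c'
print(repr(remove("a  b\tc", "b")))   # prints 'a c'
```

The sketch works on the string each time. An implementation that caches the token list would still need to re-tokenize whenever the attribute is set directly.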


On Mon, 13 Jul 2009, Jonas Sicking wrote:
> 
> I do agree that the spec seems to go extraordinarily far to not touch 
> whitespace. Normalizing whitespace when parsing is a bad idea, but once 
> the user modifies the DOMTokenList, I don't see a lot of value in 
> maintaining whitespace exactly as it was.
> 
> Ian: What is the reason for the fairly complicated code to deal with 
> removals? At least in Gecko it would be much simpler to just regenerate 
> the string completely. That way generating the string-value could just 
> be dropped on modifications, and regenerated lazily when requested.

In general, I try to be as conservative as possible in making changes to 
the DOM. Are the algorithms really as complicated as you're making out? 
They seem pretty trivial to me.
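
For comparison, the lazy regeneration approach described in the quoted message might look roughly like the sketch below. It is Python for illustration and says nothing about how Gecko actually does it. The class name `LazyTokenList` and its members are hypothetical, and `str.split()` is assumed as the tokenizer.

```python
class LazyTokenList:
    """Token list whose string value is dropped on mutation, rebuilt on demand."""

    def __init__(self, value: str = "") -> None:
        self._tokens = value.split()
        self._value: str | None = value  # None means "stale, regenerate"

    @property
    def value(self) -> str:
        if self._value is None:
            # Regenerate lazily; original whitespace is lost at this point.
            self._value = " ".join(self._tokens)
        return self._value

    @value.setter
    def value(self, new_value: str) -> None:
        # Direct change to the underlying string: keep it verbatim, re-tokenize.
        self._value = new_value
        self._tokens = new_value.split()

    def remove(self, token: str) -> None:
        if token in self._tokens:
            self._tokens = [t for t in self._tokens if t != token]
            self._value = None  # drop the cached string


tl = LazyTokenList("  a   b  ")
print(repr(tl.value))  # prints '  a   b  ' (parsing alone does not normalize)
tl.remove("b")
print(repr(tl.value))  # prints 'a'
```

Whitespace survives until the first modification through the list. After that the string is rebuilt with single spaces, which is the behaviour change under discussion.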

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Monday, 27 July 2009 17:17:14 UTC