[whatwg] Use cases for Node.getElementById from Simon Pieters on 2008-12-08 (public-whatwg-archive@w3.org from December 2008)

From: Simon Pieters <simonp@opera.com>
Date: Mon, 08 Dec 2008 07:11:41 +0100
Message-ID: <op.ultx5rlkidj3kv@zcorpandell.linkoping.osa>
On Sun, 07 Dec 2008 04:09:01 +0100, Calogero Alex Baldacchino  
<alex.baldacchino at email.it> wrote:

> I'm reading it :-)
>
> And I have a few questions. First, is it meant as the reference DOM Core  
> for HTML 5 only, or in general (for other kinds of markup too)?

In general.


> The 'children' attribute on the Element interface, being an  
> HTMLCollection instance, suggests me the former might be the answer;  
> otherwise, either the reference to a specific document DOM interface, or  
> (in the case such interface were moved into Web DOM Core) the reference  
> to a specific dom in the name of the interface might perhaps be  
> problematic (formally, at least).

Is it the name "HTMLCollection" that is the problem?


> I guess such attribute has been declared on the Element interface  
> instead of the HTMLElement one because actually this is the most common  
> implementation in current browsers.

Right. Also because it seems useful for not just HTML.


> Anyway, let me suggest (just as a hint, after all a working draft is the  
> right phase to explore any alternative) something like an  
> ElementCollection interface with the same properties of HTMLCollection,  
> making the latter just inheriting from the former as if it were an alias  
> (the same way DocumentFragment inherits from Node). On any browser  
> implementing 'children' as an HTMLCollection (without any hierarchy),  
> this shouldn't be a problem for scripts, since a script language usually  
> provides runtime inferred types; for languages with strong types (and  
> perhaps here we're moving from scripts to plugins), the access strategy  
> may be implementation specific but, as far as the hierarchy of  
> interfaces (ElementCollection -> HTMLCollection) does not change the  
> properties of an object implementing the HTMLCollection, that shouldn't  
> be a lot to work around. For instance, a Java applet (as well as any  
> other object implementing LiveConnect) should work fine using the  
> JSObject without any modify, while a direct access to the DOM would need  
> a DOMServiceProvider implementation (I'm not aware of any granting  
> access to the 'children' attribute, or better, to any non-W3C DOM  
> properties, but I guess as soon as your proposal became a recommendation  
> at least Sun would update such in Java APIs); for such purpose,  
> suggesting that any object provided by the user agent as implementing  
> either interface should be wrapped by an object also implementing the  
> other, for backward compatibility, might be enough (anyway, this is no  
> more than a hint, a very early feedback).

This seems like adding complexity for political reasons.


> I see the Element interface no more contains methods to handle Attr  
> nodes: since those are described as not being child nodes of an Element,  
> in W3C specifications, there will be any other way to handle attributes  
> as nodes, the 'nature' of Attr nodes is going to change, or is there a  
> too little use (and/or support) of them, such that the Attr interface  
> might be quite close to its 'end of life'?

I'm not sure what to do with attributes. I'd like to drop support for  
attribute nodes (being moved around, etc), if possible, but keep the  
.attributes list and be able to use .value etc on each attribute.


> Apart from that, I've also noted the 'isId' attribute has been removed  
> from Attr;

Right, it hasn't been implemented in the top 4 browsers and it seems like  
a not-so-useful feature to have.


> I was thinking just to that when I've read, in HTML 5 spec, that "This  
> specification doesn't preclude an element having multiple IDs, if other  
> mechanisms (e.g. DOM Core methods) can set an element's ID in a way that  
> doesn't conflict with the id attribute".

It says this, AIUI, because other specs do make it possible, not because  
it's a good idea that it is possible. Personally I think it should not be  
possible (specifically I think 'id' should be like 'xml:id' is and all  
other ways to get an ID-like attribute should be dropped).


> For this purpose, either the 'isId' property of an Attr node, or a  
> mechanism to set an Element's attribute as an alternative ID (or both)  
> might be helpful (anyway, having more then one unique identifier to  
> handle for each element|| in a document might cause an increase in  
> duplicated IDs).

It's not clear to me why it would be helpful.


> The above takes me to the '.getElementsByClassName()' method: if it were  
> to be moved from HTML 5 spec to Web DOM Core API, and if the latter is  
> meant as some kind of replacement for W3C DOM level 3, perhaps, for  
> generality sake, such method might be defined as referring to a property  
> named CLASS (along the same lines as ID), pointing out that such  
> property might not be binded to an attribute named 'class' (just to make  
> the spec ready in case the need to support such sort of document arose  
> in the near future, without having to change web dom core, or to derive  
> a new version, only for this reason).

That's how it's defined in HTML5 already.


> But now let's come to your questions (sorry for the digression,  
> sometimes I can't help starting this way...)
>
>>
>>> But the term 'Subtree' arises a problem with HTML 5: actually, the id  
>>> attribute is defined as the element unique ID in the *subtree* whithin  
>>> which the element is found. That is, the term subtree refers to a  
>>> whole document tree, but also to a disconnected subtree handled by a  
>>> script (and I haven't yet understood if such definition refers to a  
>>> document fragment containing nodes detached by any document, or a  
>>> whole document without a browsing context).
>>
>> AIUI, it could also be a disconnected element.
>>
>
> And I've suggested, in another mail, to clarify it, i.e. telling a  
> subtree is either a whole document (to make clearer that '<body><div  
> id="the_id" >....</div> <div> ... <div id="the_id" /> </div> </body>' is  
> always illegal, regardless the possibility to separate two different  
> subtrees in the same document where the id 'the_id' is unique),

AIUI, the 'subtree' in HTML5 means the tree from the top-most ancestor.  
The spec could be clearer about this.


> or a group of one or more 'related' disconnected nodes (i.e. a node  
> removed from a document with its descendants, a cloned node, a newly  
> created one, and so on, that is, in practice, about any tree of nodes  
> without an ownerDocument).

Don't nodes always have an ownerDocument?


> However, how is the ID uniqueness to be related to current APIs with  
> respect to a disconnected subtree? I mean, if such were relevant for any  
> method, such as a .getElementById variant handling disconnected  
> elements, the "uniqueness rule" would be quite self-explaining: either  
> such a method fails in front of duplicated IDs, or it returns the very  
> first match (or does whatever else is established to degrade  
> gracefully). But if the subtree traversal/id searching implementation is  
> up to a script programmer, the 'id' attribute may be seen as any other  
> attribute, and the programmer may opt for considering the uniqueness of  
> IDs as non relevant to his/her code in 'off-document' manipulations,  
> until the subtree is inserted into a document. Thus, maybe, IDs  
> uniqueness should be related to anything being under direct control of  
> current APIs (such as documents).

Are you arguing that HTML5 should remove the requirement about unique IDs  
for nodes not in a document?


> For the rest, my considerations on [whatever].getElementById() were  
> general, not referred to a concrete use case (partly being driven by the  
> above thoughts),

Ok.


> and, for what concerns the email you've replied, started from the  
> following:
>
> On 12/2/2008 4:07 PM, timeless wrote:
>> On Tue, Dec 2, 2008 at 10:39 AM, Aaron Leventhal<aaronlev at moonset.net>   
>> wrote:
>>
>>> Maybe there is a deeper problem if copy&  paste doesn't work right  
>>> because
>>> of IDs?
>>>
>>> Or maybe there should be a node.getDescendantById() method?
>>>
>>
>> maybe, but not with that name.
>>
>>   Results 1 - 10 of about 4,480,000 for Descendent [definition]. (0.22  
>> seconds)
>>   Results 1 - 10 of about 8,370,000 for Descendant [definition]. (0.41  
>> seconds)
>>
>> the wikipedia links are confusing enough
>>
>> http://en.wikipedia.org/wiki/Descendant links to:
>> http://en.wiktionary.org/wiki/descendent
>> which has an also link to http://en.wiktionary.org/wiki/descendant
>> which has a 'US' audio file
>>
>> So the web says that '-dant' is favored 2:1 over '-dent', which is a
>> fairly bad margin considering the spelling errors we've seen in
>> html/http.
>>
>> I'd sooner see Node.getElementById and risk *the confusion of it
>> returning fewer nodes than Document.getElementById.*
>>
>>
>
> I guessed the confusion he was concerning might be caused by the general  
> confidence people have with .getElementById working on the whole  
> document, so that someone might think, after discovering somehow there  
> is a getElementById method on every Node (or better, on every element in  
> an HTML document), such being working on the whole document as well,  
> thus expecting more non null results than a Node.getElementById method  
> might return, in general (people anyway should be confident with  
> Document.getElementsByTagName working on the whole document while  
> Element.getElementsByTagName working on an element descendants... but  
> without a crystal ball to guess it, who knows...). Thus, if the above  
> were the case, I've suggested something like  
> document.getElementById("the_id", startNode), to focus (perhaps!) the  
> attention of the average programmer (the same who might have thought  
> 'anElement.getElementById' being the same as 'document.getElementById')  
> on the different behaviour, by 'forcing' the indication of a new  
> argument specifying where to start the search (that is, something  
> totally new to the programmer).
>
> I agree that encouraging duplicate IDs is not any good idea, and  
> whatever method looking for IDs in a document subtree might take to  
> that. I really don't like the bare idea of a duplicated ID -- it's a  
> somewhat conflicting logic -- and I'm tempted to say a duplicate should  
> always break correct execution, especially when the duplicate comes from  
> an 'error' in scripting (as when inserting a cloned node without caring  
> of side-effects), because, at first glance, I'd regard a duplicate ID as  
> whatever else bug making a stand-alone application crashing because of  
> side-effects (of course, I'm not telling the browser should crash!),  
> and, after all, a gracefully degrading .getElementById (returning the  
> very first matching element) cannot prevent side-effects. But I really  
> understand, also, duplicate IDs are a common problem asking for a  
> solution, or at least a standard graceful degradation (such as the one  
> stated for document.getElementById(elementId)).
>
>  From this point of view, and also having in mind a disconnected subtree  
> to deal with through the API, maybe something like  
> document.getElementById(elementId, rootElement) might make sense if it  
> worked _only_ on disconnected subtrees, by first checking whether the  
> passed node (or the first descendent not being a document fragment node)  
> is effectively disconnected (not in the document), then returning the  
> matching element (if any), or null if the traversed subtree is instead  
> in the document tree (or even ending with an exception, to encourage a  
> greater attention and immediately tell the reason of the failure --  
> returning null might be ambiguous, since that's the same result when no  
> matching element is found).
>
> But such method might be confusing, I agree (perhaps might confusion be  
> avoided by calling the method getExternalElementById?), and might also  
> provide a quite easy way to look for duplicate IDs (e.g. by creating a  
> fake document and calling that method on a subtree of another,  
> manipulated document -- but such would be a bit tricky, and if I do  
> something tricky, I should know that's not the better way to achieve  
> some result).

I guess the best way to avoid confusion is to not add the feature at all.  
:-)


> ---
>
> Now, let me fly back again off topic on Web DOM Core. Sometime elements  
> are conceived to be somewhat leaf elements, conceptually not being  
> enabled to have any sort of descendants (this is the case for <br> or  
> <base>, i.e., in HTML); yet they're Elements, and Nodes (and  
> HTMLElements), thus having attributes and methods allowing, in theory,  
> some weird manipulation.

Right. <br> can have children by using XHTML or by DOM manipulation.


> I don't know what every browser currently does (honestly, I've never  
> tried -- this is a recent doubt of mine),

Opera, Firefox and Safari don't throw. IE does.


> but I guess the result might vary from one UA to another, and something  
> inappropriate might happen (in any case, there is no standard way to  
> block such). Perhaps, a new (readonly) property might be defined on Node  
> telling the node is a "definitive leaf", so that whatever  
> property/method might be declared as inaccessible/failing (e.g. throwing  
> an exception) if the 'isLeaf' property were true -- such an attribute  
> cannot be neither an Element, nor an HTMLElement, nor an HTMLElement  
> derived interface property, since there are methods on the Node  
> interface involved too, and as well the nodeType attribute wouldn't cope  
> well with this, because that should work for instances of the same type,  
> or regardless the type -- in other words, when trying and changing the  
> descendant hierarchy of a node with a true value for such an attribute,  
> the result should be an illegal hierarchy. Do you think such might be  
> worth to be considered?

I don't understand why it would be useful.


> (before I forget it, current definition of legal hierarchy seems to cut  
> out some legal cases, such as, for instance, an Element with no Text  
> child nodes, and the alike -- perhaps, should it be converted into a  
> list of illegal hierarchies? the resulting list would cover fewer cases  
> than a complete list of legal scenarios - if I haven't misunderstood it).

Um. Yeah. Actually it rules out everything (a node can't be a Document and  
a Text node at the same time, for instance). I think i'll move the  
checking into each algorithm that's adding stuff to the tree instead (in  
due course).


Thanks for the feedback,
-- 
Simon Pieters
Opera Software
Received on Sunday, 7 December 2008 22:11:41 UTC