- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 19 Feb 2009 01:41:38 +0000 (UTC)
On Wed, 3 Dec 2008, Calogero Alex Baldacchino wrote:
>
> My concern is, a character-by-character comparison between an id value
> and a fragment identifier may fail several ways. What for href="#foo bar "
> and id="foo bar "? Actual rules would strip the trailing space only
> for the href, so the matching would fail (but we might survive broken
> links). Escaping both, then comparing would succeed, as well as first
> escaping then unescaping the href value before comparing (should it be
> pointed out, somewhere, that a fragment identifier must be unescaped
> before comparing to an id or a name? is it and I've missed it? - having
> space characters in the unreserved production means they don't need to be
> escaped, but does it mean also they must be decoded from their
> pct-production, after parsing and for resolving?).

The behavior specced now may change, but as it stands now unescaping is
defined for fragment-identifier-to-id="" matching. In general, though,
the behaviour is constrained by what IE does and more to the point by
what is needed by content that depends on what IE does.

(You sent another couple of e-mails on the topic; I understand --
mostly -- the points you make therein, and would like to refer you to
the recent thread on the topic:

   http://lists.w3.org/Archives/Public/public-html/2009Feb/thread.html#msg407

...where the same issues were discussed with more concrete reference to
actual implementations and constraints placed on us by legacy content.)

> > What terminology would you prefer rather than "subtree"? (We can't say
> > document, since we are also trying to define conformance rules for
> > disconnected subtrees handled from scripts.)
>
> Uhm, it may depend on what kinds of manipulations you have in mind, whether
> the disconnected subtree must be anyway a whole document to fulfil the
> uniqueness rule, and perhaps also on what the subtree concept might be turned
> into by future DOM Core versions, so maybe just a clarification on what a
> subtree is with respect to both the document (as a tree) and the scripts
> handling possibilities might be enough, instead of searching a new
> terminology, just to 'scope' the id visibility. I mean, if the ID matching is
> relevant for scripts accessing the matching element through the
> getElementById() method, actually a document tree is always overlapping the
> concept of subtree, and a disconnected subtree must be a document without a
> browsing context; otherwise, if other DOM manipulations are involved the
> concept of subtree may change, for instance a script might implement its own
> scanning routine, treating an id attribute as any other attribute and leading
> to the concept that any non-leaf node may be the root of a subtree (that is
> identifying a subtree with any possible document fragment); furthermore, a
> possible future version of DOM Core interfaces might move the getElementById
> method to the Node interface, leading to the same result. Thus, a generic
> definition of 'subtree' (or no definition, or a definition relying upon a
> specific DOM feature or on script handling) might result in a variable concept
> with a variable scope for the ID uniqueness, but might make sense in a working
> draft until at least a first definition of the Web DOM Core specification, or
> waiting for any reason arising to restrict or enlarge the concept; otherwise,
> if that's been stated with a large consensus that a subtree is always a
> document tree, the term might be changed into the expression "a document, with
> or without a browsing context", or (equivalently) be defined as "a document
> subtree having a node of type document as its root" (to cover the case of
> dynamically created documents). Otherwise, if a subtree can be either a whole
> document, or a document subtree detached from its owner document (i.e. a node
> removed from a document with its descendants, or a tree of nodes whose
> ownerDocument property is not defined or null), it might be defined just as
> such, leaving the term 'subtree' wherever it is now (but would such a
> manipulation be consistent with the - authoring - uniqueness rule when the
> subtree is inserted into an actual document?).

My brain got lost partway through reading the above, so I apologise if I
missed a key point you were making. Anyway, the spec now has the term
"home subtree", which is defined in more detail than "subtree" was
before. I hope this helps.
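
As a rough illustration of scoping id uniqueness to a subtree rather than
to a document, here is a minimal Python sketch. It assumes, purely for
illustration, that the relevant subtree is the one rooted at a node's
topmost ancestor, whether or not that ancestor belongs to a document; that
reading is an assumption and not a quotation of the spec's "home subtree"
definition. Node, subtree_root and duplicate_ids are hypothetical names,
not DOM APIs.

```python
class Node:
    """Hypothetical stand-in for a DOM element (not a real DOM API)."""

    def __init__(self, tag, id_value=None):
        self.tag = tag
        self.id_value = id_value
        self.parent = None
        self.children = []

    def append(self, child):
        # Attach child to this node and return it for chaining.
        child.parent = self
        self.children.append(child)
        return child


def subtree_root(node):
    """Walk up to the topmost ancestor (assumed root of the subtree)."""
    while node.parent is not None:
        node = node.parent
    return node


def iter_subtree(root):
    """Yield root and all of its descendants in document order."""
    yield root
    for child in root.children:
        yield from iter_subtree(child)


def duplicate_ids(node):
    """Return the set of id values used more than once in node's subtree."""
    seen, duplicates = set(), set()
    for current in iter_subtree(subtree_root(node)):
        if current.id_value is None:
            continue
        if current.id_value in seen:
            duplicates.add(current.id_value)
        seen.add(current.id_value)
    return duplicates
```

Under this assumption, a disconnected subtree and a document can each use
id="a" without conflict, and the conflict only appears once the subtree
is inserted into the document, which is the case the question above
raises.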
On Sat, 13 Dec 2008, Nils Dagsson Moskopp wrote:
> On Friday, 12.12.2008, at 20:36 +0100, Calogero Alex Baldacchino
> wrote:
> >
> > The above (but the 'double check' I was suggesting) is about the way
> > Firefox (2.x and 3.0.4) behaves (both href="#foo%20bar" and, in a
> > different page, href="./example.html#foo%20bar" match id="foo bar"),
> > while IE7 and Opera 9.x perform an exact comparison, and show, in the
> > address bar, an url with eventual blank spaces, thus applying the
> > relaxation allowed by URL parsing rules, but not conforming to RFC
> > 3986, as a complete URI string.
>
> Whenever I copypaste an URI from the address bar to any other program, I
> am severely annoyed by this, especially when spaces (delimiters !) are
> part of the fake-URI. A chat or office program, for example, is unable
> to highlight the fake-URI anymore, (how could it ?), also pasting it
> into source code can create all kind of validation errors. And whenever
> I get a bastardized URI via chat or mail, only a part of it is
> clickable.
>
> Can someone from the web browser faction please state if there is any
> data to support breaking RFC-compatibility ? Because as I see it, it's
> something that makes it appear nicer, but breaks whenever URIs are to be
> transferred / communicated.

Note that pages that rely on this behaviour (either in the linking or
the targeting) are non-conforming. There are pages that depend on weird
behavior here, as noted in the thread I mentioned near the top of this
e-mail, but it may be that we can change the actual rules a bit to
handle this better.

> Getting to the problem mentioned here, the robustness principle says
> that id="foo bar" should be accepted, but nevertheless invalid - because
> a fragment with a space can never be part of an URI. So IMHO, any
> program should strive to accept broken URIs if they are unambiguous
> (which they are here, because the address can hold only one URI at a
> time), but never output them.

Agreed.
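
To make the two behaviours discussed in this thread concrete, here is a
minimal Python sketch contrasting an exact comparison with a comparison
after percent-decoding, plus an "accept broken input, never output it"
helper. It is a sketch under stated assumptions, not the algorithm from
the spec or from any browser: fragment_matches_id and fragment_for_output
are hypothetical names, the fragment is assumed to be passed without its
leading "#", and the set of characters left unescaped is one reading of
the RFC 3986 fragment production.

```python
from urllib.parse import quote, unquote

# Characters allowed unescaped in a fragment besides unreserved ones
# (assumption: sub-delims plus ":", "@", "/" and "?").
FRAGMENT_SAFE = "/?:@!$&'()*+,;="


def fragment_matches_id(fragment, element_id, decode=True):
    """Compare a fragment (without the leading '#') to an id value.

    decode=True  percent-decodes the fragment before comparing.
    decode=False performs an exact, character-by-character comparison.
    """
    candidate = unquote(fragment) if decode else fragment
    return candidate == element_id


def fragment_for_output(fragment):
    """Accept a lenient fragment but emit it percent-encoded."""
    return quote(unquote(fragment), safe=FRAGMENT_SAFE)
```

With decoding, "foo%20bar" matches id="foo bar"; with the exact
comparison it does not, while a literal "foo bar" does. Passing either
spelling through fragment_for_output gives the escaped form, so a space
is accepted on input but not written back out.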
-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'
Received on Wednesday, 18 February 2009 17:41:38 UTC