- From: Calogero Alex Baldacchino <alex.baldacchino@email.it>
- Date: Wed, 03 Dec 2008 22:01:43 +0100
Ian Hickson wrote:
> On Wed, 3 Dec 2008, Calogero Alex Baldacchino wrote:
>
>> But, isn't it worth spending a word everywhere in the spec to tell when
>> it's a quirk for backward compatibility, which might go away in the
>> future, and when it's not, because that's not needed?
>
> None of the implementation requirements in HTML5 will go away in the
> future. We will always have to define how implementations are to handle
> all inputs, today, tomorrow, and 100 years from now. Authors aren't going
> to stop writing invalid documents, unfortunately; and even if they did,
> the documents that exist today aren't going anywhere. (One of the goals of
> the HTML5 project is to document how someone in 2100 AD, or even 21000 AD,
> should handle Web pages of today, so that today's heritage isn't lost.)

Ok, and agreed. Due to the nature of the web (and of web authors'
practices), a strict conformance requirement (such as it might be for a C
compiler) will never be a good idea.

>> I mean, if you allow spacing characters inside an id value, as a parsing
>> rule, you can face something like '<div id="foo bar" >', that is an id
>> consisting of more than one token. Is it good to leave it untouched?
>> Yes? Ok, but what does it mean for CSS, since there is a reference to it
>> as one reason to allow space characters? That is, can a browser handle an
>> id selector starting with the '#' character and being broken by a blank
>> space?
>
> Sure:
>
>    #foo\ bar { ... }
>
> ...would match an element with id="foo bar".

Right, now I remember... sorry for my mess...

>> Now, let's say, instead, that a user agent, conforming with HTML 5
>> specifications, must cut off any token after the first one (I know
>> actually "foo bar" is taken as is), that is <div id="foo bar"> becomes
>> <div id="foo "> and <div id=" foo "> is valid too.
>> In such a case, skipping any spaces too, and stating the same behaviour
>> for strings passed to .getElementById() could be nice as a graceful
>> degradation for documents non-conforming with the rule "the value [of an
>> id attribute] must not contain any space characters", but such might fail
>> with CSS selectors such as 'div[id="foo bar"]'.
>
> I don't follow you there. What problem are you trying to solve?

Just trying to explain why I was suggesting such a behaviour (= stripping
space characters) in my first message about that. I was wrongly ignoring
the case of id="foo bar" and only considering id=" foo ", though I was not
confusing authoring and parsing rules (even if I admit I sometimes have
strict conformance in mind). If the latter were the only "naughty boy" out
there, stripping spaces might have made some sense (though perhaps not the
best choice without touching other things that may be out of scope).

>> Perhaps a compromise, if acceptable for backward compatibility, might be:
>> - when the id value must be compared to a fragment identifier, strip any
>> trailing space characters; if the match fails, escape any other space
>> characters both in the id value and in the fragid and try again;
>
> Why not just do what we do now, and treat the attribute as-is?
>
>> - when an attribute is defined to hold an url and its value has spaces in
>> its path/query/fragment, escape them before resolving the url (not sure
>> if needed);
>
> Again, aren't the current rules for handling URLs as defined in HTML5
> enough?

Maybe the first is wrong, and I'm still unsure of the second. My concern
is that a character-by-character comparison between an id value and a
fragment identifier may fail in several ways. What about href="#foo bar "
and id="foo bar "? The current rules would strip the trailing space only
for the href, so the matching would fail (but we might survive broken
links).
Escaping both, then comparing, would succeed, as would first escaping and
then unescaping the href value before comparing (should it be pointed out
somewhere that a fragment identifier must be unescaped before being
compared to an id or a name? Is it already, and I've missed it? Having
space characters in the unreserved production means they don't need to be
escaped, but does it also mean they must be decoded from their pct-encoded
form after parsing and for resolving?). Likewise, stripping the trailing
spaces in both cases would succeed, but would fail when comparing
id="foo bar " with href="#foo bar%20" (which is a valid url according to
the current parsing rules), even with escaping rules (in this case the id
value's trailing space must stay there). And what about id="foo%20bar" in
http://foo.example.org/foo.html and href="#foo bar" on the same page, or on
a page having the same base URL, or a base element with
href="http://foo.example.org/foo.html"?

My point is, since comparisons for matching purposes happen after URL
parsing and resolution, and the id value is not involved in those steps,
character-by-character comparisons may fail without a prior normalization
of both the fragment identifier and the id value (or one of them). However,
if the above is already solved by the parsing and resolving rules and I've
misunderstood the spec, I withdraw all of it and apologize. Or, perhaps,
must a valid url with a valid fragment, which is equivalent to but not
exactly matching an id value, be considered a broken link?

>> Anyway, if the id value is also a fragment identifier, which might have
>> space characters (since parsing rules prescribe to add such characters
>> to the unreserved production), does the (authoring) rule "the value must
>> not contain any space characters" make sense?
>
> Sure, why wouldn't it make sense?
> If IDs have spaces in them, you can't refer to them from space-separated
> lists of IDs, so to avoid authoring problems, authors will want to be told
> when they accidentally use spaces.

I'll try to make that point a bit clearer, since the reference to url
parsing rules was wrong - the question is a different one. It comes from
the double nature of the id attribute as both an ID and a fragment
identifier: according to RFC 3986, unless I have misunderstood anything
there, after dividing a URI into its components, pct-triplets may be safely
decoded (and should be, to correctly interpret each component). Thus
"%20foo%20bar%20" and " foo bar " are equivalent and both valid as
conforming dereferenced <fragment-identifier> components (while only the
former is conforming as part of a complete URI, since for RFC 3986 spaces
are not 'unreserved'), but the latter is a non-conforming ID according to
the rule "an id value must not contain any space characters", which is
somewhat of a restriction on fragment-identifier conformance. As long as
conforming user agents leave it as is, that's not a concern; anyway,
formally, is it something to be solved or pointed out somehow in the spec?
When a validator or an authoring tool finds something like

   <!-- The following section is a review of Los Angeles inside an article
        about California - just to create a context for the example -->
   <section id="Los Angeles" > ... </section>

shall it only report the id value as mistaken, or does it also have to say
that it's a valid fragment identifier if the author is setting the id as an
anchor?

> What terminology would you prefer rather than "subtree"? (We can't say
> document, since we are also trying to define conformance rules for
> disconnected subtrees handled from scripts.)
Uhm, it may depend on what kinds of manipulations you have in mind, on
whether the disconnected subtree must anyway be a whole document to fulfil
the uniqueness rule, and perhaps also on what the subtree concept might be
turned into by future DOM Core versions. So maybe a clarification of what a
subtree is, with respect to both the document (as a tree) and the script
handling possibilities, might be enough to 'scope' the id visibility,
instead of searching for new terminology.

I mean, if the ID matching is relevant for scripts accessing the matching
element through the getElementById() method, currently a document tree
always overlaps the concept of subtree, and a disconnected subtree must be
a document without a browsing context. Otherwise, if other DOM
manipulations are involved, the concept of subtree may change: for
instance, a script might implement its own scanning routine, treating an id
attribute as any other attribute and leading to the concept that any
non-leaf node may be the root of a subtree (that is, identifying a subtree
with any possible document fragment). Furthermore, a possible future
version of the DOM Core interfaces might move the getElementById method to
the Node interface, leading to the same result.

Thus, a generic definition of 'subtree' (or no definition, or a definition
relying upon a specific DOM feature or on script handling) might result in
a variable concept with a variable scope for ID uniqueness, but might make
sense in a working draft at least until a first definition of the Web DOM
Core specification, or until some reason arises to restrict or enlarge the
concept. Otherwise, if it has been stated with a large consensus that a
subtree is always a document tree, the term might be changed into the
expression "a document, with or without a browsing context", or
(equivalently) be defined as "a document subtree having a node of type
document as its root" (to cover the case of dynamically created documents).
Otherwise, if a subtree can be either a whole document, or a document
subtree detached from its owner document (i.e. a node removed from a
document with its descendants, or a tree of nodes whose ownerDocument
property is not defined or null), it might be defined just as such, leaving
the term 'subtree' wherever it is now (but would such a manipulation be
consistent with the - authoring - uniqueness rule when the subtree is
inserted into an actual document?).

> The getElementById() method will be defined more precisely than the vague
> wording in the DOM specs. I believe Simon Pieters is working on that.

I acknowledge this.

> CSS doesn't search for a single match for IDs, it just looks for whether
> an element matches the selector or not. So it doesn't care if there are
> duplicates. But anyway, CSS is out of scope for this mailing list.

I agree, and just wondered whether it may or may not be a concern for
consistent manipulation through both the DOM and CSS, but I can't think of
a concrete example where such a concern might arise that isn't a side
effect of bad programming, which is out of scope for both CSS and the DOM.
I also acknowledge that it might be in the scope of Web DOM Core, since
it's been established that it's out of scope for the HTML-specific DOM
(which doesn't define any basic element properties and access methods, but
just HTML-specific ones; I find this consistent with the choice to define
some stand-alone interfaces instead of always inheriting from the basic
counterparts).
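To make the fragment-matching cases discussed earlier in this message concrete, here is a minimal Python sketch. It is only an illustration of the comparison being asked about, not what HTML5 prescribes: the function name find_target, the stripping of surrounding whitespace from the href, and the "compare as-is, then retry with the fragment percent-decoded" order are all assumptions made for the example. The id values are never modified.

```python
from urllib.parse import unquote

# Hypothetical helper, not taken from any spec: the matching order
# (as-is first, then percent-decoded) is an assumption for illustration.
def find_target(href, ids):
    """Return the id in `ids` that the fragment of `href` points to, or None."""
    # Assumption: surrounding whitespace is stripped from the href only,
    # as the message says the current rules do.
    href = href.strip(" \t\n\f\r")
    # Everything after the first '#' is the fragment identifier.
    _, _, fragment = href.partition("#")
    # First attempt: character-by-character comparison, as-is.
    if fragment in ids:
        return fragment
    # Second attempt: decode pct-triplets in the fragment, leave ids untouched.
    decoded = unquote(fragment)
    if decoded in ids:
        return decoded
    return None

# href="#foo bar%20" reaches id="foo bar " once the fragment is decoded.
print(find_target("#foo bar%20", ["foo bar "]))
# href="#foo bar " loses its trailing space, so id="foo bar " is missed.
print(find_target("#foo bar ", ["foo bar "]))
# id="foo%20bar" is never decoded, so href="#foo bar" does not reach it.
print(find_target("#foo bar", ["foo%20bar"]))
```

Under these assumptions the first call finds the element while the other two return None, which is the kind of mismatch the question about prior normalization is getting at.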
Received on Wednesday, 3 December 2008 13:01:43 UTC