- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Fri, 3 Aug 2007 15:42:45 +0300
- To: Sam Ruby <rubys@us.ibm.com>
- Cc: public-html@w3.org
On Aug 2, 2007, at 18:16, Sam Ruby wrote:

> Since the workgroup demands use cases for any proposed new feature,
> I will provide one up front: this feature’s use case is to enable
> features without use cases.
...
> FBML isn’t intended to be directly processed by browsers, but that
> shouldn’t preclude it from being processed by other HTML5 tools,
> everything from sanitizers to conformance checkers to pretty
> printers, to search engines.

Is it the assumption that HTML5 so extended would be served on the public network in ways that would routinely expose the extension markup to browsers? If the extensions are intended to be processed by non-browser tools inside a walled garden such as Facebook, wouldn't XHTML5 plus namespaced extensions work?

> XML permits an alternate syntax, namely default namespaces. In
> certain circles, such a syntax is very popular. Regrettably,
> allowing such a syntax would pose problems for back level user
> agents, and therefore must be disallowed in the HTML5 “custom format”.

However, such an approach might well work for bringing specific well-known XML vocabularies with distinct subtrees to the text/html serialization: specifically SVG and MathML, with the namespace mapping scope established by <svg> and <math> as subtree roots. When it comes to extending text/html into an alternative infoset serialization for a broader range of possible infosets, I'd prefer to optimize for enabling those two well-known namespaces rather than for private extensions. (Not to suggest that the two goals are mutually exclusive; I'm just suggesting that well-known vocabularies are preferable to private vocabularies in a non-walled garden.)

> The notion of using attributes to define namespaces, and the specific
> syntax for declaring same, however, can be directly lifted from
> XML. The syntax is xmlns:x in an enclosing scope.
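To make the SVG/MathML suggestion above concrete, here is a minimal sketch, assuming a hypothetical helper `assign_namespaces` that receives already-tokenized, well-nested start/end tags as plain tuples. An <svg> or <math> start tag seen in HTML context switches the namespace for its whole subtree, and popping the root restores the parent's namespace. The namespace URIs are the standard XHTML, SVG and MathML ones; everything else (the token shape, the lack of error recovery, ignoring <math> inside <svg>) is a simplification for illustration, not how any real parser works.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# In this sketch, these element names act as subtree roots that establish a namespace.
SUBTREE_ROOTS = {"svg": SVG_NS, "math": MATHML_NS}


def assign_namespaces(tokens):
    """Return (tag, namespace) pairs for every start tag.

    tokens: list of ("start", name) / ("end", name) tuples, assumed well nested.
    Hypothetical helper with no error recovery.
    """
    stack = []   # namespace of each currently open element
    result = []
    for kind, name in tokens:
        if kind == "start":
            parent_ns = stack[-1] if stack else HTML_NS
            # Only an <svg>/<math> start tag in HTML context switches namespace;
            # descendants inherit whatever namespace their parent has.
            if parent_ns == HTML_NS and name in SUBTREE_ROOTS:
                ns = SUBTREE_ROOTS[name]
            else:
                ns = parent_ns
            stack.append(ns)
            result.append((name, ns))
        else:
            stack.pop()  # closing the root drops back to the parent's namespace
    return result
```

Because no prefix or xmlns attribute is involved, a legacy browser sees nothing but unknown tags, which is the property that makes the fixed-vocabulary case easier than general extensibility.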
Judging from how Namespaces in XML are practiced, a two-to-four-letter prefix seems to reduce the probability of conflicts sufficiently, and the extra indirection through URIs, which reduces the probability further, is often more of an annoyance than a useful feature. But even though Namespaces in XML are distinctly non-ideal, maintaining mappability to namespace-aware XML is probably more desirable than simplifying the HTML side while breaking mappability. :-/ (I see that your prefix registry proposal alleviates this problem in one way but makes the predictability of the alleviation depend on when the parser's prefix registry snapshot was taken.)

It has been vaguely mentioned that Opera has experimented with introducing XML-like namespace syntax to HTML parsing. I think it would be useful to hear what they learned and what obstacles made them back off. I believe there are some "Breaking the Web" issues lurking here. Moreover, researching the impact of any given colon-related syntax when exposed to IE seems to be of utmost importance.

> Messy details
> -------------
>
> I don’t pretend that these are exhaustive, but they should seed an
> interesting set of discussions:

 * Should the tokenizer do ASCII case folding when scanning a name until it hits a colon (effectively making prefixes ASCII-case-insensitive)? Or should each name be scanned without case folding and case-folded conditionally later?

> * The notion of “enclosing element” is problematic in the face
> of adoption agency algorithms and the like. The prudent thing to
> do is to define any case where reparenting would change the meaning
> of any element to be a (recoverable) error. This would affect very
> few users or documents. It would be a bitch to code in a
> conformance checker, but that’s not the spec writer’s concern. :-)

Reparenting is already an error. OTOH, making the namespace scoping work in that case is more of a concern for implementations other than conformance checkers.
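The two tokenizer options in the case-folding question can be contrasted in a short sketch. Both functions are hypothetical illustrations, not proposed spec text. Option 1 folds while scanning and stops folding at the first colon, so prefixes become ASCII-case-insensitive while local names keep their case. Option 2 keeps the name verbatim and decides later; the rule used here (an exact match with an in-scope prefix keeps the name as written, anything else is folded whole like a legacy parser would) is an assumption chosen only to show that the decision then depends on parser state.

```python
def ascii_lower(s: str) -> str:
    """Lowercase A-Z only; str.lower() would also fold non-ASCII letters."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def fold_until_colon(name: str) -> str:
    """Option 1: fold while scanning and stop folding at the first colon.

    A name without a colon is folded entirely, as HTML tag names are today.
    """
    head, sep, tail = name.partition(":")
    return ascii_lower(head) + sep + tail


def fold_later(name: str, declared_prefixes: set) -> str:
    """Option 2: scan verbatim, then decide on folding once context is known.

    Hypothetical rule: a prefix that exactly matches an in-scope declaration
    keeps the name as written; anything else is folded whole, as legacy
    parsers fold names such as <o:p>.
    """
    prefix, sep, _local = name.partition(":")
    if sep and prefix in declared_prefixes:
        return name
    return ascii_lower(name)
```

With Option 1, `FB:Name` and `fb:Name` end up as the same name while `fb:name` stays distinct. With Option 2, the same source text `FB:Name` yields different names depending on which prefixes happen to be declared, which is more flexible but harder to predict.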
(Conformance checkers have the luxury of being able to opt to treat errors as fatal. I figured that being fatal on reparenting is more user-friendly in the conformance checking case than reparenting and then running higher-level checks on reparented tree parts, which would leave the reported errors and the superficial appearance of the source not matching at all.)

This should probably be addressed in such a way that the namespace scope can piggy-back on the stack of open elements that the tree builder maintains per the current algorithm definition, and then accept whatever counterintuitiveness may arise in error situations.

> * You might think that this proposal wouldn’t change how text
> nodes or comments were processed, but there is one case that merits
> consideration. The default processing by existing user agents is
> to render text nodes even when they are enclosed in unknown
> markup. In some cases, this may not be desirable. The XML CDATA[]
> syntax is treated as a comment by HTML parsers, so this may be used
> to “cloak” such text regions. For this to work, however, HTML5
> compliant parsers would have to treat such constructs as text, but
> only when enclosed by an extension element. Again, a more
> complicated parse state machine is necessary in order to preserve
> backwards compatibility and extensibility.

I don't think I understand what behavior you are suggesting here.

> Implications
> ------------

 * As an unintended consequence, an extension mechanism like this could make it possible to declare the HTML namespace with a prefix and thereby get legacy UAs and extension-mechanism-aware UAs to treat core elements differently. (This can be a good thing or a bad thing.)

> * This proposal does, however, increase the size of the
> profile of XHTML that can be reasonably handled by HTML5 parsers.
> I or others could voluntarily choose to restrict ourselves to that
> profile, but are not compelled to do so. In my case, a typical
> page would only increase in size by at most a few dozen bytes to
> conform.
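The earlier suggestion that namespace scope piggy-back on the tree builder's stack can be sketched as a stack of prefix-to-URI frames kept in lockstep with the stack of open elements. This is a minimal sketch under stated assumptions: the class `ScopeStack` is hypothetical, the caller is assumed to call `push` and `pop` exactly when the tree builder pushes and pops elements (including implied closes), and the optional `registry` argument stands in for the central registry of default prefixes quoted later in this thread, whose contents are not defined anywhere. The URIs in the usage example are placeholders.

```python
class ScopeStack:
    """Prefix-to-URI scopes tied to a tree builder's stack of open elements."""

    def __init__(self, registry=None):
        self.frames = []                # one dict of prefix -> URI per open element
        self.registry = registry or {}  # hypothetical central registry of default prefixes

    def push(self, attributes):
        """Call whenever the tree builder pushes an element; records xmlns:x declarations."""
        frame = {}
        for name, value in attributes.items():
            if name.startswith("xmlns:"):
                frame[name[len("xmlns:"):]] = value
        self.frames.append(frame)

    def pop(self):
        """Call whenever the tree builder pops an element (end tag or implied close)."""
        self.frames.pop()

    def resolve(self, prefix):
        """Innermost declaration wins; fall back to the registry; None if unknown."""
        for frame in reversed(self.frames):
            if prefix in frame:
                return frame[prefix]
        return self.registry.get(prefix)
```

Because scopes follow the tree builder rather than the source text, an element that the adoption agency algorithm reparents or reopens ends up with whatever declarations are on the stack at that point, which is exactly the kind of counterintuitive result in error situations that this approach accepts. A real implementation would also have to push frames for any elements the tree builder re-creates.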
I still think that you and your site are special cases, and what works for you may not work for the masses of the ViewSourceClan[1] who don't have sufficient spec-lawyering proficiency.

> * To make HTML5 more robust, it may make sense to define a
> central registry of default prefixes. This would likely be
> controversial, but would effectively address a common problem.
> Such prefixes would, of course, be overridable in any document; the
> intent of this is to handle the case where somebody copy/pastes a
> document fragment without the enclosing namespace declaration.

Makes sense. However, the barrier for registration should be really, really low for this to work. (As in much, much lower than for registering anything with the IANA.)

[1] http://intertwingly.net/wiki/pie/ViewSourceClan (for the benefit of other list members)

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 3 August 2007 12:42:59 UTC