Re: Distributed Extensibility from Henri Sivonen on 2007-08-03 (public-html@w3.org from August 2007)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Fri, 3 Aug 2007 15:42:45 +0300
To: Sam Ruby <rubys@us.ibm.com>
Cc: public-html@w3.org
Message-Id: <62FBC3FA-8C71-486E-8C1D-B7E68A0D12E1@iki.fi>

On Aug 2, 2007, at 18:16, Sam Ruby wrote:

> Since the workgroup demands use cases for any proposed new feature,  
> I will provide one up front: this feature’s use case is to enable  
> features without use cases.
...
> FBML isn’t intended to be directly processed by browsers, but that  
> shouldn’t preclude it from being processed by other HTML5 tools,  
> everything from sanitizers to conformance checkers to pretty  
> printers, to search engines.

Is it the assumption that HTML5 so extended would be served on the  
public network in ways that would routinely expose the extension  
markup to browsers? If the extensions are intended to be processed by  
non-browser tools in the context of a walled garden such as Facebook,  
wouldn't XHTML5 plus namespaced extensions work?

> XML permits an alternate syntax, namely default namespaces.  In  
> certain circles, such a syntax is very popular.  Regrettably,  
> allowing such a syntax would pose problems for back level user  
> agents, and therefore must be disallowed in the HTML5 “custom format”.

However, such an approach might well work for bringing specific well- 
known XML vocabularies with distinct subtrees to the text/html  
serialization, specifically SVG and MathML with namespace mapping  
scope established by <svg> and <math> as subtree roots. When it comes  
to extending text/html to be an alternative infoset serialization for  
a broader range of possible infosets, I'd prefer to optimize for  
enabling those two well-known namespaces instead of optimizing for  
private extensions. (Not to suggest that the two goals were mutually  
exclusive--just suggesting that well-known vocabularies are  
preferable over private vocabularies in a non-walled garden.)
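
To make the subtree-root idea concrete, here is a rough sketch in Python of how a tree builder might assign namespaces when <svg> and <math> establish the namespace mapping scope. The function name, the event format, and the rule that a subtree root always switches the namespace are my assumptions for illustration, not anything specified; error recovery is ignored entirely.

```python
HTML_NS = "http://www.w3.org/1999/xhtml"
SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Assumed rule: these element names switch the namespace for their subtree.
SUBTREE_ROOTS = {"svg": SVG_NS, "math": MATHML_NS}


def assign_namespaces(events):
    """Return (namespace, name) for every start tag in a well-nested
    stream of ("start", name) / ("end", name) events."""
    stack = [HTML_NS]  # namespace in effect at each nesting level
    result = []
    for kind, name in events:
        if kind == "start":
            # A subtree root sets a new scope; anything else inherits.
            ns = SUBTREE_ROOTS.get(name, stack[-1])
            stack.append(ns)
            result.append((ns, name))
        else:
            stack.pop()
    return result


events = [
    ("start", "p"),
    ("start", "svg"), ("start", "circle"), ("end", "circle"), ("end", "svg"),
    ("start", "math"), ("start", "mi"), ("end", "mi"), ("end", "math"),
    ("end", "p"),
]
namespaced = assign_namespaces(events)
```

No prefixes or xmlns attributes are needed in the source for this to work, which is the point: the element name alone carries the scope.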

> The notion using attributes to define namespaces, and the specific  
> syntax for declaring same, however, can be directly lifted from  
> XML. The syntax is xmlns:x in an enclosing scope.

Judging from how Namespaces in XML are practiced, a two-to-four-letter  
prefix seems to reduce the probability of conflicts sufficiently, and  
the indirection to URIs, meant to reduce the probability further, is  
often more of an annoyance than a useful feature. But even though  
Namespaces in XML are distinctly non-ideal, maintaining mappability to  
namespace-aware XML is probably more desirable than simplifying the  
HTML side while breaking mappability. :-/ (I see that your prefix  
registry proposal alleviates this problem in one way but makes the  
predictability of the alleviation depend on when the parser's prefix  
registry snapshot was taken.)
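
A small Python sketch of the snapshot concern, assuming in-document declarations override registry defaults as the proposal describes. The registry contents, the "fb" prefix, and the example.com URI are made up for illustration; only the SVG namespace URI is real.

```python
# Hypothetical registry snapshots baked into two parser releases.
REGISTRY_OLD = {"svg": "http://www.w3.org/2000/svg"}
REGISTRY_NEW = dict(REGISTRY_OLD, fb="http://example.com/ns/fbml")


def resolve_prefix(prefix, declared, registry):
    """Resolve a prefix to a namespace URI.

    `declared` holds the xmlns:x bindings in scope in the document;
    they take precedence over the registry. Returns None if unbound.
    """
    if prefix in declared:
        return declared[prefix]
    return registry.get(prefix)


# A pasted fragment without its enclosing declaration:
old_result = resolve_prefix("fb", {}, REGISTRY_OLD)  # unbound
new_result = resolve_prefix("fb", {}, REGISTRY_NEW)  # bound by the registry

# With an explicit declaration, both parsers agree:
explicit = {"fb": "http://example.org/other"}
old_explicit = resolve_prefix("fb", explicit, REGISTRY_OLD)
new_explicit = resolve_prefix("fb", explicit, REGISTRY_NEW)
```

The same undeclared fragment gets a namespace in one parser and none in the other, so the author cannot predict the outcome without knowing which snapshot the consumer shipped with.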

It has been vaguely mentioned that Opera has experimented with  
introducing XML-like namespace syntax to HTML parsing. I think it  
would be useful to hear what they learned and what obstacles made  
them back off. I believe there are some "Breaking the Web" issues  
lurking here.

Moreover, researching the impact of any given colon-related syntax  
when exposed to IE seems to be of utmost importance.

> Messy details
> -------------
>
> I don’t pretend that these are exhaustive, but they should seed an  
> interesting set of discussions:

  * Should the tokenizer do ASCII case folding when scanning a name  
until it hits a colon (effectively making prefixes ASCII-case- 
insensitive)? Or should each name be scanned without case folding and  
case-folded conditionally later?
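
The two options can be sketched in Python roughly as follows. The condition used in the second option (leave the name alone if its prefix, as written, is bound) is only my guess at what "conditionally" could mean; the function names are hypothetical.

```python
def ascii_lower(s):
    """Lowercase A-Z only, leaving non-ASCII characters untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def fold_until_colon(name):
    """Option 1: fold while scanning, stop folding at the first colon.

    Prefixes become ASCII-case-insensitive; local names keep their case.
    A name without a colon is folded entirely.
    """
    prefix, sep, local = name.partition(":")
    return ascii_lower(prefix) + sep + local


def fold_later(name, bound_prefixes):
    """Option 2: scan verbatim, then fold conditionally.

    Assumed condition: a name whose prefix (exactly as written) is bound
    in scope is left untouched; every other name is folded entirely.
    """
    prefix, sep, _ = name.partition(":")
    if sep and prefix in bound_prefixes:
        return name
    return ascii_lower(name)


a = fold_until_colon("FB:Name")      # prefix folded, local name preserved
b = fold_later("FB:Name", {"FB"})    # bound as written, left untouched
c = fold_later("FB:Name", {"fb"})    # not bound as written, folded entirely
```

The first option keeps the tokenizer free of namespace state; the second needs to know the bindings in scope before it can decide, which pushes the decision into the tree builder.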

>     * The notion of “enclosing element” is problematic in the face  
> of adoption agency algorithms and the like.  The prudent thing to  
> do is to define any case where reparenting would change the meaning  
> of any element to be a (recoverable) error.  This would affect very  
> few users or documents.  It would be a bitch to code in a  
> conformance checker, but that’s not the spec’s writer’s concern.  :-)

Reparenting is already an error. OTOH, making the namespace scoping  
work in that case is more of a concern for implementations other than  
conformance checkers. (Conformance checkers have the luxury of being  
able to opt to treat errors as fatal. I figured that being fatal on  
reparenting is more user-friendly in the conformance checking case  
than reparenting and then doing higher-level checks on reparented  
tree parts and having the errors and the superficial appearance of  
the source not match at all.)

This should probably be addressed in such a way that the namespace  
scope can piggy-back on the stack that the tree builder maintains per  
the current algorithm definition--and then accept whatever counter- 
intuitiveness may arise in error situations.
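
As a rough Python sketch of what piggy-backing on the tree builder's stack could look like: each entry on the stack of open elements carries the bindings declared on its start tag, and lookup walks the stack from the innermost element outwards. The class and function names and the example.com URI are hypothetical.

```python
class OpenElement:
    """An entry on the tree builder's stack of open elements."""

    def __init__(self, name, declarations=None):
        self.name = name
        # Prefix-to-URI bindings declared on this element's start tag.
        self.declarations = declarations or {}


def lookup(stack, prefix):
    """Resolve a prefix against the stack, innermost element first."""
    for element in reversed(stack):
        if prefix in element.declarations:
            return element.declarations[prefix]
    return None


stack = [
    OpenElement("html"),
    OpenElement("div", {"x": "http://example.com/ns/x"}),
    OpenElement("b"),
]
before = lookup(stack, "x")  # resolved through the open <div>

# Error recovery removes the <div> from the stack while <b> stays open.
del stack[1]
after = lookup(stack, "x")   # the binding is gone for later tokens
```

The second lookup is the kind of counter-intuitiveness I mean: the source still appears to have the declaration in scope, but the stack no longer does.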

>      * You might think that this proposal wouldn’t change how text  
> nodes or comments were processed, but there is one case that merits  
> consideration.  The default processing by existing user agents is  
> to render text nodes even when they are enclosed in unknown  
> markup.  In some cases, this may not be desirable.  The XML CDATA[]  
> syntax is treated as a comment by HTML parsers, so this may be used  
> to “cloak” such text regions.  For this to work, however, HTML5  
> compliant parsers would have to treat such constructs as text, but  
> only when enclosed by an extension element.  Again, a more  
> complicated parse state machine is necessary in order to preserve  
> backwards compatibility and extensibility.

I think I don't understand what behavior you are suggesting here.

> Implications
> ------------

  * As an unintended consequence, an extension mechanism like this  
could make it possible to declare the HTML namespace with a prefix  
and hack different core element treatment in legacy UAs and extension  
mechanism-aware UAs. (This can be a good thing or a bad thing.)

>     * This proposal does, however, increase the size of the  
> profile of XHTML that can be reasonably handled by HTML5 parsers.   
> I or others could voluntarily choose to restrict ourselves to that  
> profile, but are not compelled to do so.  In my case, a typical  
> page would only increase in size by at most a few dozen bytes to  
> conform.

I still think that you and your site are special cases and what works  
for you may not work for the masses of the ViewSourceClan[1] who  
don't have sufficient spec lawyering proficiency.

>     * To make HTML5 more robust, it may make sense to define a  
> central registry of default prefixes.  This would likely be  
> controversial, but would effectively address a common problem.   
> Such prefixes would, of course, be overridable in any document; the  
> intent of this is to handle the case where somebody copy/pastes a  
> document fragment without the enclosing namespace declaration.

Makes sense. However, the barrier for registration should be really,  
really low for this to work. (As in much, much lower than for  
registering anything with the IANA.)

[1] http://intertwingly.net/wiki/pie/ViewSourceClan (for the benefit  
of other list members)
-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Friday, 3 August 2007 12:42:59 UTC