Re: Distributed Extensibility from Sam Ruby on 2007-08-03 (public-html@w3.org from August 2007)

From: Sam Ruby <rubys@us.ibm.com>
Date: Fri, 03 Aug 2007 12:59:51 -0400
To: Henri Sivonen <hsivonen@iki.fi>
CC: public-html@w3.org
Message-ID: <46B35F07.6050603@us.ibm.com>

Henri Sivonen wrote:
> On Aug 2, 2007, at 18:16, Sam Ruby wrote:
> 
>> Since the workgroup demands use cases for any proposed new feature, I 
>> will provide one up front: this feature’s use case is to enable 
>> features without use cases.
> ...
>> FBML isn’t intended to be directly processed by browsers, but that 
>> shouldn’t preclude it from being processed by other HTML5 tools, 
>> everything from sanitizers to conformance checkers to pretty printers, 
>> to search engines.
> 
> Is it the assumption that HTML5 so extended would be served on the 
> public network in ways that would routinely expose the extension markup 
> to browsers? If the extensions are intended to be processed by 
> non-browser tools in the context of a walled garden such as Facebook, 
> wouldn't XHTML5 plus namespaced extensions work?

Perhaps, for the six of us or so who seem capable of consistently 
producing well-formed XML.  But what about Francis here:

https://secure.mysociety.org/cvstrac/fileview?f=mysociety/pb/phplib/pbfacebook.php&v=1.1

Note: I don't care for the use of pejorative terms like walled gardens 
here.  I will readily concede that the term is accurate, but it is a 
distraction.  People should be "freely extensible by anybody" (I lifted 
those words straight from Atom's Roadmap).

While we should encourage extensions, we should recognize that users 
will screw things up.  While I can't conceive of anything that would 
validate the PHP script I referenced above, I can imagine conformance 
checkers that validate its output: checkers that validate not only 
HTML structure, but also that a/@href attributes are URIs (despite 
being defined in a separate document) and that fb:default elements 
have fb:switch elements as their parents (again, despite being defined 
in a separate document).
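
To make the two checks above concrete, here is a minimal sketch.  It 
assumes the output has already been turned into well-formed XML (which 
sidesteps the hard part), it uses a made-up namespace URI for the fb 
prefix, and its "is this a URI" test is only a crude whitespace check 
standing in for real URI validation.  The names check, FB and sample 
are mine, for illustration only.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI for the "fb" prefix; not the real FBML one.
FB = "http://example.com/fbml"
FB_DEFAULT = "{%s}default" % FB
FB_SWITCH = "{%s}switch" % FB


def check(markup):
    """Return a list of problems found in well-formed markup."""
    root = ET.fromstring(markup)
    # ElementTree keeps no parent pointers, so build a child -> parent map.
    parent_of = {child: parent for parent in root.iter() for child in parent}
    problems = []
    for el in root.iter():
        href = el.get("href")
        if el.tag == "a" and href is not None:
            # Crude stand-in for URI validation: reject empty values and
            # values containing whitespace.
            if not href or any(ch.isspace() for ch in href):
                problems.append("a/@href is not a URI: %r" % href)
        if el.tag == FB_DEFAULT:
            parent = parent_of.get(el)
            if parent is None or parent.tag != FB_SWITCH:
                problems.append("fb:default outside of fb:switch")
    return problems


sample = (
    '<div xmlns:fb="http://example.com/fbml">'
    '<a href="not a uri">x</a>'
    '<fb:switch><fb:default>ok</fb:default></fb:switch>'
    '<fb:default>stray</fb:default>'
    '</div>'
)

for problem in check(sample):
    print(problem)
```

The point of the sketch is only that rules from separate documents (URI 
syntax, FBML content models) can be applied to the same tree in one pass.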

>> XML permits an alternate syntax, namely default namespaces.  In 
>> certain circles, such a syntax is very popular.  Regrettably, allowing 
>> such a syntax would pose problems for back level user agents, and 
>> therefore must be disallowed in the HTML5 “custom format”.
> 
> However, such an approach might well work for bringing specific 
> well-known XML vocabularies with distinct subtrees to the text/html 
> serialization, specifically SVG and MathML with namespace mapping scope 
> established by <svg> and <math> as subtree roots. When it comes to 
> extending text/html to be an alternative infoset serialization for a 
> broader range of possible infosets, I'd prefer to optimize for enabling 
> those two well-known namespaces instead of optimizing for private 
> extensions. (Not to suggest that the two goals were mutually 
> exclusive--just suggesting that well-known vocabularies are preferable 
> over private vocabularies in a non-walled garden.)

We should design for everybody, and verify for those that we care about.

As somebody who authors SVG in VIM, I'm comfortable with what I 
proposed.  I will also note that these requirements (at least for SVG) 
are consistent with the following profile:

http://www.w3.org/TR/2002/WD-XHTMLplusMathMLplusSVG-20020809/

As to MathML, we still need to decide whether or not to grandfather in 
that vocabulary.

>> Messy details
>> -------------
>>
>> I don’t pretend that these are exhaustive, but they should seed an 
>> interesting set of discussions:
> 
>  * Should the tokenizer do ASCII case folding when scanning a name until 
> it hits a colon (effectively making prefixes ASCII-case-insensitive)? Or 
> should each name be scanned without case folding and case-folded 
> conditionally later?

Good catch.  I'm going to add it to my original source page on my weblog.
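
For the record, here is one reading of the two options in the question, 
as a small sketch.  The function names are hypothetical, and the rule 
used for "conditionally later" (fold only names that have no prefix) is 
my assumption, not something either of us has specified.

```python
import string

# ASCII-only folding: non-ASCII letters are deliberately left alone.
_FOLD = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def ascii_lower(s):
    return s.translate(_FOLD)


def fold_while_scanning(name):
    """Option 1: fold up to the first colon, so prefixes become
    ASCII-case-insensitive; the local part is kept as written."""
    prefix, colon, local = name.partition(":")
    if not colon:
        return ascii_lower(name)  # no prefix: fold the whole name
    return ascii_lower(prefix) + ":" + local


def fold_later(name):
    """Option 2: scan the name unchanged, then fold conditionally.
    Assumed condition: only unprefixed names are folded."""
    if ":" in name:
        return name
    return ascii_lower(name)


for raw in ("DIV", "FB:Name"):
    print(raw, fold_while_scanning(raw), fold_later(raw))
```

Under option 1, FB:Name and fb:Name name the same element; under option 
2 as sketched, they do not.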

>>      * You might think that this proposal wouldn’t change how text 
>> nodes or comments were processed, but there is one case that merits 
>> consideration.  The default processing by existing user agents is to 
>> render text nodes even when they are enclosed in unknown markup.  In 
>> some cases, this may not be desirable.  The XML CDATA[] syntax is 
>> treated as a comment by HTML parsers, so this may be used to “cloak” 
>> such text regions.  For this to work, however, HTML5 compliant parsers 
>> would have to treat such constructs as text, but only when enclosed by 
>> an extension element.  Again, a more complicated parse state machine 
>> is necessary in order to preserve backwards compatibility and 
>> extensibility.
> 
> I think I don't understand what behavior you are suggesting here.

Time for a concrete example:

http://intertwingly.net/svg/410.svg

If I don't use CDATA, the "410" shows up in IE.  If I do use CDATA, 
nothing shows up in IE.

We need a way to say "if you can't handle this extension, don't show 
anything (or perhaps, show a fallback defined separately)".

Now, if we allow this use of CDATA, we need it to actually show up as a 
text node in the DOM as opposed to a comment node when used in precisely 
this circumstance.
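
A rough sketch of the behavior I have in mind follows.  It is a toy 
tokenizer, not a proposal for the real parse state machine: it assumes 
that any element whose name contains a colon is an extension element, 
and it ignores ordinary comments, void elements, and attribute values 
containing ">".  The names TOKEN, is_extension and tokenize are 
hypothetical.

```python
import re

# One alternative per token kind; CDATA must be tried before tags.
TOKEN = re.compile(
    r"<!\[CDATA\[(?P<cdata>.*?)\]\]>"
    r"|</(?P<close>[^\s>]+)\s*>"
    r"|<(?P<open>[^\s>/!]+)[^>]*>"
    r"|(?P<text>[^<]+)",
    re.S,
)


def is_extension(name):
    # Assumption for this sketch: a prefixed name marks an extension.
    return ":" in name


def tokenize(markup):
    """Return (kind, data) pairs for text, comment and CDATA content."""
    stack = []  # names of currently open elements
    out = []
    for m in TOKEN.finditer(markup):
        if m.group("cdata") is not None:
            inside = any(is_extension(n) for n in stack)
            # CDATA is text inside an extension element, a comment elsewhere.
            out.append(("text" if inside else "comment", m.group("cdata")))
        elif m.group("close"):
            name = m.group("close")
            if name in stack:
                # Pop up to and including the matching open element.
                while stack.pop() != name:
                    pass
        elif m.group("open"):
            if not m.group(0).endswith("/>"):
                stack.append(m.group("open"))
        elif m.group("text"):
            out.append(("text", m.group("text")))
    return out


print(tokenize("<p><![CDATA[a]]></p><x:ext><![CDATA[410]]></x:ext>"))
```

Back level user agents keep treating both CDATA sections as comments and 
render nothing; only the second one would become a text node in the DOM 
under this rule.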

>> Implications
>> ------------
> 
>  * As an unintended consequence an extension mechanism like this could 
> make it possible to declare the HTML namespace with a prefix and hack 
> different core element treatment in legacy UAs and extension 
> mechanism-aware UAs. (This can be a good thing or a bad thing.)

Let's simply declare that to be an error and be done with it.  I'm also 
going to update my page with this one.
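
A conformance checker could flag this with very little code.  The sketch 
below assumes the checker already has an element's attributes as a 
plain name-to-value mapping; the function name is hypothetical.  The 
namespace URI is the standard XHTML one.

```python
XHTML_NS = "http://www.w3.org/1999/xhtml"


def prefixed_html_declarations(attrs):
    """Return the prefixes that are bound to the HTML namespace."""
    return [
        name[len("xmlns:"):]
        for name, value in attrs.items()
        if name.lower().startswith("xmlns:") and value == XHTML_NS
    ]


attrs = {
    "xmlns": XHTML_NS,                      # default declaration: not flagged
    "xmlns:h": XHTML_NS,                    # prefixed HTML namespace: error
    "xmlns:fb": "http://example.com/fbml",  # made-up extension namespace
}
print(prefixed_html_declarations(attrs))
```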

- Sam Ruby
Received on Friday, 3 August 2007 17:00:06 UTC