Re: ISSUE-41 (Dave Orchard): Decentralized extensibility from Sam Ruby on 2008-05-12 (public-html@w3.org from May 2008)

From: Sam Ruby <rubys@us.ibm.com>
Date: Sun, 11 May 2008 22:06:14 -0400
To: Henri Sivonen <hsivonen@iki.fi>
Cc: HTML Issue Tracking WG <public-html@w3.org>, public-html-request@w3.org
Message-ID: <OF8B33AB02.478CEB3E-ON85257447.000A3B70-85257447.000B8E99@us.ibm.com>
Henri Sivonen wrote on 05/11/2008 01:09:06 PM:
>
> On May 8, 2008, at 18:20 , HTML Issue Tracking Issue Tracker wrote:
>
> > ISSUE-41 (Dave Orchard): Decentralized extensibility
> >
> > http://www.w3.org/html/wg/tracker/issues/
> >
> > Raised by: David Orchard
> > On product:
> >
> > The HTML5 specification does not have a mechanism to allow
> > decentralized parties to create their own languages, typically XML
> > languages, and exchange them in HTML5 text/html serializations.
> > This would allow languages such as SVG, MathML, FBML and a host of
> > others to be included.  At one point, an editor's version of the
> > HTML5 specification contained a subset and reformulation of SVG and
> > MathML.  Tim Berners-Lee described this incorporation of SVG and
> > MathML without namespaces as horrific and the issue raiser
> > completely concurs with him.
> >
> > This issue limits the ability of non-HTML5 working groups to define
> > languages as the languages must be "brought into" the HTML5
> > language.  This dramatically increases the scope of HTML5 and
> > decreases the ability to modularize development of orthogonal
> > languages.
> >
> > In the end, the problem could result in the text/html serialization
> > rules becoming the standard serialization rules for XML languages,
> > replacing XML itself.  This could occur if every decentralized
> > language has a choice between the XML serialization, the text/html
> > serialization or both.  In many cases, the language may choose the
> > text/html serialization.
>
>
> I discussed this with Dave in Dublin this past week. Here's a write-up
> of my points (with elaborations) for WG members who weren't attending
> the same dinner:
>
> I think there are three kinds of HTML extensions:
>   1) Extensions to the feature set of browser engines. Example: <canvas>
>   2) Extensions that are meant to be used in documents consumed by
> browsers but the extensions have value even when not acted on by
> browsers. Example: hCal.
>   3) Extensions that are not meant to be used in documents consumed by
> browsers. Example: FBML.
>
> I think cases #1 and #3 are easier to tackle than #2, so I discuss #1
> and #3 first.
>
> Case #1 -- Extensions to the feature set of browser engines
>
> Extensions to the feature set of browser engines quite obviously
> require modifications to the engines. Since modifications to the
> engines are needed anyway, modifications to the parser could be made
> as well. Moreover, to implement <canvas>, SVG, MathML, Web Forms 2.0 or
> ARIA, the amount of work needed on code above the parser layer by far
> exceeds the amount of work that would go into tweaking the parser if
> the extension mechanism for this class of extensions weren't generic
> enough not to require parser changes. Therefore, I think trying to
> establish a framework that guaranteed that we'd never have to tweak
> parsing would be the wrong optimization. However, when an extension
> can be done without tweaking the parser, not tweaking the parser is
> preferable (hence, aria-foo).
>
> Even if parsing is tweaked, it is crucial not to tweak parsing in a
> way that interferes with existing Web content too much. This
> dramatically limits what syntax extensions can use.
>
> Extensions to the feature set of browser engines, if successful, stop
> being "extensions" when considered at a future date. From a future
> point of view, the features have become part of the core feature set.
> Also, the time from being considered an "extension" to being
> considered "in the core" is likely shorter than the time from thereon
> until the sunset of that part of Web technology. Once a feature has
> become part of the core, Web authors using the feature shouldn't have
> to care where it came from. Moreover, they shouldn't have to deal with
> cruft like namespace URIs or prefixes. I think having <canvas> is much
> better than having <apple:canvas>.
>
> As a case study, <canvas> isn't flawless. So what went wrong? At
> first, Apple made it a void element but then others wanted to make it
> a container. Apple then had to make an incompatible change to their
> implementation to align with others. This could have been avoided by
> allowing the design to be reviewed by interested parties before
> shipping a product with the feature.
>
> It should also be noted that the number of parties who effectively
> have the power to extend the browser engine feature set is fairly
> small. (I'd say 4 at the moment.) A small number of parties can take
> names from a single pool on a first-come-first-served basis. This
> seems repulsive to some, as it raises the question of name squatting.
> However, in practice, to claim a name on the Web is to successfully
> implement uses for that name. URI-based extensibility isn't immune to
> this phenomenon: if browsers implement certain behavior for a given
> namespace but a WG that claims the namespace in a de jure way writes a
> contradictory spec, the spec is pretty toothless in practice.
>
> Conclusion:
> This extension case doesn't need a technical mechanism. It needs a
> social mechanism that allows would-be extenders to bring their intent
> to extend forward for community review early in the process. (The
> mechanism can be posting to public-html or to the WHATWG mailing list.)
>
> (I don't find adding MathML and SVG parsing to text/html as special
> cases horrific.)
>
> Case #3 -- Extensions that are not meant to be used in documents
> consumed by browsers
>
> When documents aren't meant for browsers, we can get rid of a lot of
> backwards-compatibility baggage. On the other hand, these kinds of
> extensions are likely to be more abundant than extensions to browser
> engines, because product-specific templating systems outnumber
> browsers. Furthermore, templating systems may want to be able to put
> individual HTML elements in contexts that are incompatible with
> text/html parsing. Consider for example templating syntax between
> <table> and its <tr>s.
>
> Since these cases are likely to outnumber browsers and to be more
> frequent than extensions to browser engines, being able to use
> off-the-shelf software without modification becomes a more attractive
> optimization point while at the same time the parser doesn't need to
> be Web-compatible.
>
> Conclusion:
>
> It seems to me that mixing XHTML5 with product-specific elements and
> attributes in XML is a good fit for this extension case. No new
> extension mechanism is needed, since Namespaces in XML are available.
>
> Objection: XML is too hard because it's Draconian.
> Answer: Then we need non-Draconian XML5.
> Follow-up objection: Want one syntax instead of two (HTML5 and XML5).

Actually, I believe that the answer above is arguing for THREE syntaxes.
XHTML5 is not going away.  Nor is it going to usurp the web.  So we either
need Case #3 addressed by text/html, OR by a new syntax that has most of
the characteristics of HTML5.  Note: I am *not* arguing for three syntaxes;
I am stating that EITHER this needs to be addressed by text/html OR we need
three syntaxes.

> Answer: Too bad but text/html backwards compatibility and being able
> to represent all XML 1.0 infosets (needed for replacing XML) are
> conflicting goals.
>
> Proof of conflict:
>   1) text/html compatibility requires the parser to infer an <html>
> root.
>   2) There are XML vocabularies whose root is not <html>, such as SVG.
>   3) Q.E.D.

I believe that's overstating the point.  OpenID is an example of a protocol
for which discovery requires a root <html> element (I kid you not).
Limiting distributed extensibility to documents which start with an <html>
element (for example) would not be an onerous requirement.  Most of the
web has such elements, and for the rest, adding one would not be
difficult.  Contrast this with the amount of effort that would be required
for a typical page to become (and remain!) well-formed XML.

> Case #2 -- Extensions that are meant to be used in documents consumed
> by browsers but the extensions have value even when not acted on by
> browsers
>
> This is the hard case. The would-be extenders are more numerous than
> sources of browser engines. The would-be extenders can't change the
> way HTML is parsed.
>
> I don't feel as sure about presenting a solution for this case as I do
> for the other two cases. I will, however, observe that regardless of
> whether URI-based extensibility is used, a pattern of using attributes
> is emerging in order not to interfere with element parsing:
> microformats without URI-based extensibility and RDFa with URI-based
> extensibility.
>
> No conclusion, but this does suggest to me that the answer should be:
>
> Put these kinds of extensions in attributes. (New attributes, magic
> values or both.)

I would be more convinced if this had proven to be sufficient for
SVG.  Or MathML.  Or FBML.  Note: I am *not* challenging your assertion
that treating SVG and/or MathML as if they were native to HTML is not
"horrific".  I am merely stating that while a limitation to attributes may
be sufficient for some vocabularies, it is not clear to me that this is
true for all vocabularies.

> I'm not convinced that URI-based extensibility or similar is needed.
> To me it seems that it's sufficient for each extension to have a
> gestalt that is recognizable with adequate probability. It seems
> unlikely that a valid hCal entry would occur by random chance. (rel='me'
> occurring by random chance seems a bit more likely, but I don't really
> worry about it.)
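The "recognizable gestalt" criterion can be made concrete with a small heuristic. The following is a minimal sketch, assuming the hCalendar convention that an event is an element with class "vevent" containing descendants with classes "dtstart" and "summary". It uses Python's standard-library html.parser; the names HCalSniffer and looks_like_hcal are hypothetical, and because html.parser does not perform HTML5 tree construction, the sketch assumes reasonably well-nested markup.

```python
from html.parser import HTMLParser

# Hypothetical helper: a rough "gestalt" check for hCalendar events.
# Assumption: an entry counts as hCal if an element with class "vevent"
# contains descendants with classes "dtstart" and "summary".
# Assumption: markup is well nested (html.parser does no tree repair).
class HCalSniffer(HTMLParser):
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self):
        super().__init__()
        self.stack = []      # one flag per open element: True if it opened a vevent
        self.current = None  # property classes seen inside the open vevent
        self.events = []     # property sets of completed vevent elements

    def handle_starttag(self, tag, attrs):
        classes = set()
        for name, value in attrs:
            if name == "class" and value:
                classes.update(value.split())
        # Record hCal properties found on descendants of an open vevent.
        if self.current is not None:
            self.current.update(classes & {"dtstart", "summary"})
        opens_event = "vevent" in classes and self.current is None
        if opens_event:
            self.current = set()
        if tag not in self.VOID:
            self.stack.append(opens_event)

    def handle_endtag(self, tag):
        if tag in self.VOID or not self.stack:
            return
        # Closing the element that opened the vevent finishes that event.
        if self.stack.pop() and self.current is not None:
            self.events.append(self.current)
            self.current = None


def looks_like_hcal(markup: str) -> bool:
    sniffer = HCalSniffer()
    sniffer.feed(markup)
    sniffer.close()
    return any({"dtstart", "summary"} <= props for props in sniffer.events)


sample = ('<div class="vevent"><abbr class="dtstart" title="2008-05-08">'
          'May 8</abbr><span class="summary">Example event</span></div>')
print(looks_like_hcal(sample))                               # → True
print(looks_like_hcal('<p class="summary">Not an event</p>'))  # → False
```

Requiring several co-occurring class values is what makes an accidental match unlikely; a single value such as rel='me' has no such structure and is correspondingly easier to hit by chance.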
>
> --
> Henri Sivonen
> hsivonen@iki.fi
> http://hsivonen.iki.fi/

- Sam Ruby
Received on Monday, 12 May 2008 02:31:06 UTC