Re: ISSUE-41 (Dave Orchard): Decentralized extensibility from Henri Sivonen on 2008-05-12 (public-html@w3.org from May 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Mon, 12 May 2008 10:12:57 +0300
To: Sam Ruby <rubys@us.ibm.com>
Cc: HTML Issue Tracking WG <public-html@w3.org>
Message-Id: <A3F7DF58-3361-4754-A637-44F91B40A544@iki.fi>

On May 12, 2008, at 05:06 , Sam Ruby wrote:

> > Conclusion:
> >
> > It seems to me that mixing XHTML5 with product-specific elements and
> > attributes in XML is a good fit for this extension case. No new
> > extension mechanism is needed, since Namespaces in XML are available.
> >
> > Objection: XML is too hard because it's Draconian.
> > Answer: Then we need non-Draconian XML5.
> > Follow-up objection: Want one syntax instead of two (HTML5 and XML5).
>
> Actually, I believe that the answer above is arguing for THREE  
> syntaxes.  XHTML5 is not going away.  Nor is it going to usurp the  
> web.  So we either need Case #3 addressed by text/html, OR by a new  
> syntax that has most of the characteristics of HTML5.  Note: I am  
> *not* arguing for three syntaxes, I am stating that EITHER this  
> needs to be addressed by text/html OR we need three syntaxes.
>
Fair enough.

I will then argue that we shouldn't let case #3 inconvenience  
authoring for browsers, so if we end up with a two-way choice of  
adding a third syntax or making the browser-targeted syntax crufty, I  
think we should add a third syntax instead of making targeting the  
multi-vendor Web platform crufty in order to enable the same syntax
to target specific non-browser products.

> > Answer: Too bad but text/html backwards compatibility and being able
> > to represent all XML 1.0 infosets (needed for replacing XML) are
> > conflicting goals.
> >
> > Proof of conflict:
> >   1) text/html compatibility requires the parser to infer an <html>
> > root.
> >   2) There are XML vocabularies whose root is not <html>, such as SVG.
> >   3) Q.E.D.
>
> I believe that's overstating the point.  OpenID is an example of a  
> protocol for which discovery requires a root <html> element (I kid  
> you not).  Limiting distributed extensibility to documents which  
> start with a <html> element (for example) would not be an onerous  
> requirement.  Most of the web has such elements, and for the rest,  
> adding such would not be difficult.  Contrast this with the amount  
> of effort that would be required for a typical page to become (and  
> remain!) well formed XML.
>
This part of my email wasn't well developed and doesn't look good in
isolation from the discussion I had with Dave. Sorry.

My point is that having a single markup format for both Web content  
(text/html) and for server-to-server system integration (XML) isn't a  
good idea in *practice*, because if you bend the definition of XML far  
enough to actually work for text/html legacy content, it will no  
longer work for XML legacy vocabularies. (On the face of it, it's  
appealing in *theory*, though.)

So we are stuck with (at least) two kinds of parsers and trying to  
push the number of kinds of off-the-shelf parsers to one is misguided.  
We are already getting to the point where the two kinds of parsers can  
provide a common interface to other software components, though.

> > Case #2 -- Extensions that are meant to be used in documents consumed
> > by browsers but the extensions have value even when not acted on by
> > browsers
> >
> > This is the hard case. The would-be extenders are more numerous than
> > sources of browser engines. The would-be extenders can't change the
> > way HTML is parsed.
> >
> > I don't feel as sure about presenting a solution for this case as I am
> > for the other two cases. I will, however, observe that regardless of
> > whether URI-based extensibility is used, a pattern of using attributes
> > is emerging in order not to interfere with element parsing: without
> > URI-based extensibility being microformats and with URI-based
> > extensibility being RDFa.
> >
> > No conclusion, but this does suggest to me that the answer should be:
> >
> > Put this kind of extensions in attributes. (New attributes, magic
> > values or both.)
>
> I would be more convinced if this were to have proven to be  
> sufficient for SVG.  Or MathML.  Or FDML.  Note: I am *not*  
> challenging your assertion that treating SVG and/or MathML as if  
> they were native to HTML is not "horrific".  I am merely stating  
> that while a limitation to attributes may be sufficient for some  
> vocabularies, it is not clear to me that this is true for all  
> vocabularies.
>
Sure, putting stuff in attributes is more crufty than using element  
names for some identifiers. However, as a matter of value judgment, I  
consider case #1 (extending the browser feature set) a more important  
case than this case #2, so they don't need to lead to equally  
convenient syntax. SVG and MathML are instances of Case #1, so we  
don't need to optimize this case for them. (Some will argue--and I
have argued in TPAC hall discussions, surprisingly without facing
outrage--that this case #2 is in many cases a threat to
interoperability, so we shouldn't seek to optimize for it.)

There is a construction that allows any vocabulary of elements and
attributes to be cast as attributes only (with one element supplied by
a host language): you make one more attribute that holds what you'd
normally put in the element name. ARIA does this with 'role'.  
Microformats do this with 'class'. RDFa does this with 'property'.
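
To make the construction concrete, here is a minimal sketch in Python.
It is an illustration only, not taken from any spec: the function name
cast_to_attributes, the choice of <span> as the host element, and the
sample vocabulary are all assumptions made for the example. It moves
each element name into an attribute (here 'class', microformat-style)
and keeps the original attributes as they are.

```python
import xml.etree.ElementTree as ET


def cast_to_attributes(elem, host_tag="span", name_attr="class"):
    """Return a copy of elem in which every element is a host_tag element
    and the original element name is stored in the name_attr attribute.

    Hypothetical helper for illustration. If the source element already
    has an attribute called name_attr, it is overwritten.
    """
    out = ET.Element(host_tag, dict(elem.attrib))  # keep existing attributes
    out.set(name_attr, elem.tag)                   # element name becomes a value
    out.text = elem.text
    out.tail = elem.tail
    for child in elem:
        out.append(cast_to_attributes(child, host_tag, name_attr))
    return out


# Illustrative input using element names instead of attributes.
source = ET.fromstring(
    "<vcard><fn>Henri Sivonen</fn><url>http://hsivonen.iki.fi/</url></vcard>"
)
result = cast_to_attributes(source)
print(ET.tostring(result, encoding="unicode"))
```

Passing name_attr="role" or name_attr="property" gives the ARIA-like
or RDFa-like shape of the same idea; the sketch ignores everything
else those specifications define.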

Sure, this is crufty compared to using an element name, so we should  
avoid this pattern in the core of the Web platform (Case #1), although  
ARIA is using it due to practical legacy concerns. But if you start  
comparing the microformat way with Namespaces in XML, it's not so  
crufty after all: Namespaces in XML require an additional attribute,  
too, and the contents of that attribute are way more crufty. RDFa, of  
course, is doubly crufty, because it has both the cruft of  
microformats and the cruft of Namespaces.

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Monday, 12 May 2008 07:13:36 UTC