Re: Are new void elements really a good idea? from Henri Sivonen on 2008-08-31 (public-html@w3.org from August 2008)

From: Henri Sivonen <hsivonen@iki.fi>
Date: Sun, 31 Aug 2008 18:59:03 +0300
To: HTML WG <public-html@w3.org>
Message-Id: <D8D8B981-63AB-497F-8B83-6D73E297BF45@iki.fi>
On Aug 30, 2008, at 21:24, Julian Reschke wrote:

> I'll try to explain the problem first from the point of view of  
> someone trying to create HTML *programmatically*. When I say  
> *programmatically*, I don't mean printf-style code, just  
> concatenating lines of output, but generic code that can  
> serialize some kind of object representation of the document, such  
> as (gasp) DOM, or code that is event-driven as a SAX output handler.

One of the benefits of such a setup is that all the serialization code  
is in one place--typically in one source file / class. The  
maintainability argument motivating this architecture becomes moot if  
one refuses to maintain the serializer.

> This is the approach used in XSLT 1.0, and also in common Java  
> serializers that just use the XSLT's modules. I haven't checked, but  
> I wouldn't be surprised if other libraries worked the same way.

Do you have pointers to specific libraries (besides Saxon that you  
already mentioned and, obviously, Xalan) that we should consider?

> This means that anytime a new void element is introduced into HTML,  
> all these implementations need to be updated. Not good.

That depends on what the cost of updating the serializer is relative  
to making other changes required for using new HTML language features  
in a given system. When the serializer isn't buried under TrAX, I  
would expect this cost to be relatively low: amending a list of void  
elements in one source file / compilation unit.
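The Java sketch below makes the "one list in one source file" point concrete. The class and method names are invented for illustration. It is not Saxon's HTMLEmitter or any other real serializer's code. The base list is the set of HTML 4.01 empty elements, and eventsource (the void element under discussion in this thread) is the example one-line addition.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical serializer fragment: the set of void elements lives in one place,
// so supporting a newly specified void element is a one-line change.
public final class VoidElements {

    private static final Set<String> VOID_ELEMENTS = Set.of(
            // HTML 4.01 empty elements
            "area", "base", "basefont", "br", "col", "frame", "hr", "img",
            "input", "isindex", "link", "meta", "param",
            // The one-line amendment for a new void element:
            "eventsource");

    // HTML element names are case-insensitive, so normalize before the lookup.
    public static boolean isVoid(String name) {
        return VOID_ELEMENTS.contains(name.toLowerCase(Locale.ROOT));
    }

    // Returns the end tag to emit, or "" for void elements, which never take one.
    public static String endTag(String name) {
        return isVoid(name) ? "" : "</" + name + ">";
    }

    public static void main(String[] args) {
        System.out.println("<eventsource src=\"foo\">" + endTag("eventsource"));
        System.out.println("<video src=\"foo\">" + endTag("video"));
    }
}
```

The design choice here is that the rest of the emitter only ever asks `isVoid`, so nothing outside the set literal has to change when the language grows.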

In the case you mention, Saxon, the fix would be trivial in software  
if HTML5 output mode were replacing the HTML 4 output mode: It would  
be one additional line per new void element in one static initializer  
(in net.sf.saxon.event.HTMLEmitter). The problem is not patching the  
software here. In the case of Saxon, the problem is patching it *and*  
keeping it compliant with a spec that is harder to change than  
software (the XSLT spec), or else offering the new behavior as yet  
another option alongside the XSLT standard "html" output method.

Then there's the issue that patching libraries can be too costly even  
if source is in theory available. For example, normal developers  
wouldn't patch the Xalan serializer as shipped by Sun as part of the  
default TrAX implementation. In the cases you named (DOM and SAX in  
Java and Saxon if we assume it's used through TrAX) liberally licensed  
code is already available for replacing legacy serializers: The  
Validator.nu HTML Parser comes with a SAX serializer that outputs  
UTF-16 to a Writer or UTF-8 to an OutputStream  
(nu.validator.htmlparser.sax.HtmlSerializer). This class can be used  
with Saxon or Xalan when wrapped in javax.xml.transform.sax.SAXResult.  
For the DOM, the Validator.nu HTML Parser comes with  
nu.validator.htmlparser.dom.Dom2Sax which can (among other things) be  
used for wrapping nu.validator.htmlparser.sax.HtmlSerializer for use  
with DOM trees.
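The following is a rough sketch of the SAXResult wiring. It assumes only the JDK's bundled TrAX implementation, meaning whatever `TransformerFactory.newInstance()` returns. MiniHtmlSerializer is a hypothetical, deliberately tiny stand-in for a real SAX-based HTML serializer such as nu.validator.htmlparser.sax.HtmlSerializer. It only shows where such a ContentHandler plugs in. It is not that class's code, and it does not reproduce its API.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public final class SaxResultDemo {

    // Hypothetical, minimal SAX ContentHandler that writes HTML syntax.
    // Simplifications: abbreviated void list, no doctype, and script/style
    // content is escaped like any other text.
    static final class MiniHtmlSerializer extends DefaultHandler {
        private static final Set<String> VOID_ELEMENTS = Set.of(
                "area", "base", "br", "col", "hr", "img", "input", "link", "meta", "param");

        private final StringBuilder out = new StringBuilder();

        // Non-namespace-aware producers may leave localName empty; fall back to qName.
        private static String name(String localName, String qName) {
            return (localName == null || localName.isEmpty()) ? qName : localName;
        }

        private static String escape(String s) {
            return s.replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(name(localName, qName));
            for (int i = 0; i < atts.getLength(); i++) {
                out.append(' ').append(name(atts.getLocalName(i), atts.getQName(i)))
                   .append("=\"").append(escape(atts.getValue(i))).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String n = name(localName, qName);
            // Void elements never get an end tag; all other elements always do,
            // even when they had no children.
            if (!VOID_ELEMENTS.contains(n)) {
                out.append("</").append(n).append('>');
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(escape(new String(ch, start, length)));
        }

        @Override
        public String toString() {
            return out.toString();
        }
    }

    // Runs a TrAX identity transform whose result is the SAX handler above.
    public static String serialize(String xml) throws Exception {
        MiniHtmlSerializer serializer = new MiniHtmlSerializer();
        Transformer identity = TransformerFactory.newInstance().newTransformer();
        identity.transform(new StreamSource(new StringReader(xml)), new SAXResult(serializer));
        return serializer.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(serialize("<p>a<br/><script src=\"x.js\"/></p>"));
    }
}
```

Because SAX has no notion of an "empty element", the handler sees the same start/end event pair for `<br/>` and `<script/>`. It writes `<br>` without an end tag and `<script src="x.js"></script>` with one, based purely on its own void list.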

On Aug 30, 2008, at 23:01, Lachlan Hunt wrote:

> Authoring tools shouldn't be considered to be immutable.

Julian does have a legitimate point, though. The question is what we  
value more: a more elegant language, or sparing HTML serializer  
developers a little patching and their users an upgrade to a new  
version.

> <eventsource src="foo"/> is allowed.  Isn't that sufficient?

No, since the whole point is that the serializer needs to know which  
elements are void elements. (You wouldn't want an HTML serializer to  
turn a script element with no children into <script/>.)
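The difference between "has no children" and "is void" can be shown with two hypothetical helper methods. The names are invented here and the element list is abbreviated. The first method applies the XML-style rule of self-closing every empty element. The second decides by element name alone.

```java
import java.util.Set;

public final class EmptyVsVoid {

    // Abbreviated, illustrative list of HTML void elements.
    private static final Set<String> VOID_ELEMENTS = Set.of("br", "hr", "img", "input", "link", "meta");

    // XML-style rule: any element without children becomes a self-closing tag.
    static String byContent(String name) {
        return "<" + name + "/>";
    }

    // HTML rule: the element name alone decides; emptiness is irrelevant.
    static String byName(String name) {
        return VOID_ELEMENTS.contains(name) ? "<" + name + ">" : "<" + name + "></" + name + ">";
    }

    public static void main(String[] args) {
        for (String name : new String[] {"br", "script"}) {
            System.out.println(byContent(name) + "  vs  " + byName(name));
        }
    }
}
```

In text/html, an HTML5 parser ignores the slash in `<script/>` and treats it as a plain start tag. The rest of the document would then be swallowed as script content until a `</script>` turns up. That is why the serializer has to know the void list rather than infer it from the tree.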

On Aug 31, 2008, at 12:37, Lachlan Hunt wrote:
> Provide a way in the authoring tool, XSLT in this case, for authors to
> declare which elements should be serialised as void elements.

That would be excessive, since the set of HTML void elements expands  
very seldom. (The innerHTML getter doesn't let you specify what  
elements are void, either.)

> The inability to control how an element is serialised seems like a
> limitation in XSLT that should probably be fixed in XSLT, rather than
> maintaining that it should place constraints on HTML5's syntax.

For XML technologies, such control would be micromanagement of  
semantically meaningless syntactic sugar, so none of XSLT, SAX or the  
DOM provides it. I don't think this is a bug in XSLT, SAX or the DOM.

In the context of HTML, for a given standard level of HTML, there's  
exactly one configuration that isn't bogus. It's a feature that the  
serializer relieves the application programmer from managing this.

The problem here is that the serializer needs an update when HTML  
gains more void elements. I'm inclined to think that the problem isn't  
too severe.

On Aug 31, 2008, at 14:22, Julian Reschke wrote:

> On the other hand, what, except ideological reasons, stops us from  
> allowing
>
> <tagname></tagname>
>
> as well?


Allowing that as a syntactic *alternative* is bad language design,  
since it misleads language users to think that you can write  
<tagname>foo</tagname> and have "foo" appear as a child of the tagname  
element. (I guess YMMV if this counts as an ideological reason.)

On Aug 31, 2008, at 15:18, Julian Reschke wrote:

> But that advantage needs to be weighed against the cost of breaking  
> existing libraries, and the ability to evolve the language without  
> having to rewrite code all over.

I agree.

However, I'm inclined to think that when someone writes <video>  
element emission support for a CMS, for example, throwing in a new  
off-the-shelf serializer or adding a few entries to the void element list  
of an existing serializer is not a big deal compared to the other  
implementation work. (Unless, of course, the app writer wants to do  
wrong things like use a non-UTF-8 encoding for output, which a new  
lean serializer might not support. :-)

-- 
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 31 August 2008 15:59:45 UTC