Re: Are new void elements really a good idea? from Julian Reschke on 2008-08-31 (public-html@w3.org from August 2008)

From: Julian Reschke <julian.reschke@gmx.de>
Date: Sun, 31 Aug 2008 18:38:33 +0200
To: Henri Sivonen <hsivonen@iki.fi>
CC: HTML WG <public-html@w3.org>
Message-ID: <48BAC909.1040108@gmx.de>

Henri Sivonen wrote:
> 
> On Aug 30, 2008, at 21:24, Julian Reschke wrote:
> 
>> I'll try to explain the problem first from the point of view of
>> someone trying to create HTML *programmatically*. When I say
>> *programmatically*, I don't mean printf-style code, just concatenating
>> lines of output, but generic code that can serialize some kind of
>> object representation of the document, such as (gasp) DOM, or code
>> that is event-driven as a SAX output handler.
> 
> One of the benefits of such a setup is that all the serialization code 
> is in one place--typically in one source file / class. The 
> maintainability argument motivating this architecture becomes moot if 
> one refuses to maintain the serializer.

Only sort of.

First of all, it's not as if this code needs to be maintained only
*once*; it needs maintenance every time the set of void elements
changes. I would call this a stupid design of the target language.

>> This is the approach used in XSLT 1.0, and also in common Java 
>> serializers that just use the XSLT's modules. I haven't checked, but I 
>> wouldn't be surprised if other libraries worked the same way.
> 
> Do you have pointers to specific libraries (besides Saxon that you 
> already mentioned and, obviously, Xalan) that we should consider?

Well, any implementation of the Java XML stack. So whatever is in the
JDKs (different in 1.4 and 1.5, btw), for instance. And independent
implementations: for instance, SAP happens to maintain its own XML stack
(don't ask me why), or used to. Oracle also has its own XML stack. So
it's not as if only one single piece of code needs to be upgraded.

I would expect the situation is the same in Microsoft land, with at 
least two independent XSL implementations, both in use.
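
For the JDK case, a minimal sketch of how one could check what the bundled
serializer does: it runs an identity transform through whatever TrAX
implementation the running JDK ships and serializes the same tree with the
"html" and the "xml" output methods. The class and method names are made up
for this example, and `source` is only a stand-in for a newly introduced void
element. What the "html" method emits for an element it does not know depends
on the element list built into that serializer, so no output is promised for
it here.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlOutputMethodDemo {

    /** Identity-transforms the XML text and serializes it with the given output method. */
    static String serialize(String xml, String method) {
        try {
            // newTransformer() without a stylesheet yields an identity transform.
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.METHOD, method);
            t.setOutputProperty(OutputKeys.INDENT, "no");
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // "br" is void in HTML 4; "source" stands in for a new void element.
        String doc = "<p><br/><source src=\"a.ogg\"/></p>";
        System.out.println("html: " + serialize(doc, "html"));
        System.out.println("xml:  " + serialize(doc, "xml"));
    }
}
```

Comparing the two lines shows which elements the shipped serializer treats as
void, and therefore which ones would need a JDK update.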

>> This means that anytime a new void element is introduced into HTML, 
>> all these implementations need to be updated. Not good.
> 
> That depends on what the cost of updating the serializer is relative to 
> making other changes required for using new HTML language features in a 
> given system. When the serializer isn't buried under TrAX, I would 
> expect this cost to be relatively low: amending a list of void elements 
> in one source file / compilation unit.

I didn't say that it's *hard* to do each of the upgrades. But it needs
to be done, the code needs to be tested, released, deployed. In some
cases, the code is not open-source.

So, how about going to Sun and asking them to do this kind of
maintenance for the currently shipping JDKs?
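
To make the maintenance point concrete, here is a minimal sketch of a DOM
serializer that hard-codes the void element list. It is not taken from any of
the libraries discussed; the class and method names are invented, and only
the list itself (the elements declared EMPTY in HTML 4.01) comes from the
spec. The element `source` again stands in for a newly added void element.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;
import org.xml.sax.InputSource;

public class VoidListSerializer {

    /** Elements declared EMPTY in HTML 4.01. */
    static final Set<String> HTML4_VOID = Collections.unmodifiableSet(new HashSet<>(Arrays.asList(
            "area", "base", "basefont", "br", "col", "frame", "hr",
            "img", "input", "isindex", "link", "meta", "param")));

    static void serialize(Node node, Set<String> voidElements, StringBuilder out) {
        switch (node.getNodeType()) {
            case Node.ELEMENT_NODE: {
                String name = node.getNodeName();
                out.append('<').append(name);
                NamedNodeMap atts = node.getAttributes();
                for (int i = 0; i < atts.getLength(); i++) {
                    Node a = atts.item(i);
                    out.append(' ').append(a.getNodeName()).append("=\"")
                       .append(escape(a.getNodeValue(), true)).append('"');
                }
                out.append('>');
                if (voidElements.contains(name)) {
                    return; // void element: start tag only, no end tag
                }
                serializeChildren(node, voidElements, out);
                out.append("</").append(name).append('>');
                break;
            }
            case Node.TEXT_NODE:
                out.append(escape(node.getNodeValue(), false));
                break;
            default:
                // Document node and anything else: just descend.
                serializeChildren(node, voidElements, out);
        }
    }

    private static void serializeChildren(Node node, Set<String> voidElements, StringBuilder out) {
        for (Node c = node.getFirstChild(); c != null; c = c.getNextSibling()) {
            serialize(c, voidElements, out);
        }
    }

    private static String escape(String s, boolean attribute) {
        String r = s.replace("&", "&amp;").replace("<", "&lt;");
        return attribute ? r.replace("\"", "&quot;") : r.replace(">", "&gt;");
    }

    /** Parses XML text into a DOM tree and serializes it as HTML. */
    static String toHtml(String xml, Set<String> voidElements) {
        try {
            Node doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            StringBuilder out = new StringBuilder();
            serialize(doc, voidElements, out);
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<p>a<br/><source src=\"a.ogg\"/></p>";
        // prints <p>a<br><source src="a.ogg"></source></p>
        System.out.println(toHtml(doc, HTML4_VOID));

        Set<String> extended = new HashSet<>(HTML4_VOID);
        extended.add("source"); // the "one additional line" per new void element
        // prints <p>a<br><source src="a.ogg"></p>
        System.out.println(toHtml(doc, extended));
    }
}
```

The change itself is one line. The cost I'm talking about is that every
deployed copy of such a list has to receive that line.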

> In the case you mention, Saxon, the fix would be trivial in software if 
> HTML5 output mode were replacing the HTML 4 output mode: It would be one 
> additional line per new void element in one static initializer (in 
> net.sf.saxon.event.HTMLEmitter). The problem is not patching the 
> software here. In the case of Saxon, the problem is patching it *and* 
> keeping it compliant to a spec that is less easy to change than software 
> (the XSLT spec) or offering it as yet another option alongside the XSLT 
> standard "html" output method.

Exactly.

So the alternative would be to get the W3C to release errata for XSLT 
1.0 and 2.0, adding to the list of void elements. (Dan C., how does that 
sound from the POV of the staff contact???)

> Then there's the issue that patching libraries can be too costly even if 
> source is in theory available. For example, normal developers wouldn't 
> patch the Xalan serializer as shipped by Sun as part of the default TrAX 
> implementation. In the cases you named (DOM and SAX in Java and Saxon if 
> we assume it's used through TrAX) liberally licensed code is already 
> available for replacing legacy serializers: The Validator.nu HTML Parser 
> comes with a SAX serializer that outputs UTF-16 to a Writer or UTF-8 to 
> an OutputStream (nu.validator.htmlparser.sax.HtmlSerializer). This class 
> can be used with Saxon or Xalan when wrapped in 
> javax.xml.transform.sax.SAXResult. For the DOM, the Validator.nu HTML 
> Parser comes with nu.validator.htmlparser.dom.Dom2Sax which can (among 
> other things) be used for wrapping 
> nu.validator.htmlparser.sax.HtmlSerializer for use with DOM trees.

Yes.
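
A sketch of the wiring, using only JDK classes: the transformer's output goes
to a SAXResult that wraps a ContentHandler, and that handler does the HTML
serialization instead of the built-in "html" output method. The handler below
(`TinyHtmlSerializer`) is a hypothetical, deliberately incomplete stand-in
with a made-up void element list; a real serializer such as the one named in
the quoted text would be passed to the SAXResult constructor in its place,
assuming it implements ContentHandler.

```java
import java.io.StringReader;
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class SaxResultDemo {

    /** Hypothetical stand-in for a replacement HTML serializer. */
    static final class TinyHtmlSerializer extends DefaultHandler {
        // Illustrative subset only, including "source" as a new void element.
        private static final Set<String> VOID =
                new HashSet<>(Arrays.asList("br", "hr", "img", "source"));
        private final StringBuilder out;

        TinyHtmlSerializer(StringBuilder out) {
            this.out = out;
        }

        private static String name(String localName, String qName) {
            return qName.isEmpty() ? localName : qName;
        }

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            out.append('<').append(name(localName, qName));
            for (int i = 0; i < atts.getLength(); i++) {
                String value = atts.getValue(i)
                        .replace("&", "&amp;").replace("<", "&lt;").replace("\"", "&quot;");
                out.append(' ').append(atts.getQName(i)).append("=\"").append(value).append('"');
            }
            out.append('>');
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            String n = name(localName, qName);
            if (!VOID.contains(n)) {
                out.append("</").append(n).append('>'); // void elements get no end tag
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            out.append(new String(ch, start, length)
                    .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
        }
    }

    /** Identity-transforms the XML text into the stand-in serializer via SAXResult. */
    static String transformToHtml(String xml) {
        try {
            StringBuilder out = new StringBuilder();
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.transform(new StreamSource(new StringReader(xml)),
                        new SAXResult(new TinyHtmlSerializer(out)));
            return out.toString();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(transformToHtml("<p>Hi<br/><source src=\"a.ogg\"/></p>"));
    }
}
```

This keeps the legacy serializer out of the picture entirely, but it only
helps those who are allowed to add such a component to their stack.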

> Julian does have a legitimate point, though. The question is, what do we
> value more: the language being more elegant or HTML serializer
> developers having to do a little patching and their users having to
> update to a new version?

I came to this issue because of XSLT 1.0. But as a matter of fact, it -
by definition - applies to all pieces of code that today serialize to
HTML4, and furthermore it also applies to every future change to the set
of void elements.

> ...
>> The inability to control how an element is serialised seems like a
>> limitation in XSLT that should probably be fixed in XSLT, rather than
>> maintaining that it should place constraints on HTML5's syntax.
> 
> For XML technologies, such control would be micromanagement of 
> semantically meaningless syntactic sugar, so none of XSLT, SAX and DOM 
> provide it. I think this is not a bug in XSLT, SAX and the DOM.
> 
> In the context of HTML, for a given standard level of HTML, there's 
> exactly one configuration that isn't bogus. It's a feature that the 
> serializer relieves the application programmer from managing this.
> 
> The problem here is that the serializer needs an update when HTML gains 
> more void elements. I'm inclined to think that the problem isn't too 
> severe.

Again, I'd argue that we should compare the cost we're causing for 
updates with the benefit of having new void elements.

> On Aug 31, 2008, at 14:22, Julian Reschke wrote:
> 
>> On the other hand, what, except ideological reasons, stops us from 
>> allowing
>>
>> <tagname></tagname>
>>
>> as well?
> 
> 
> Allowing that as a syntactic *alternative* is bad language design, since 
> it misleads language users to think that you can write 
> <tagname>foo</tagname> and have "foo" appear as a child of the tagname 
> element. (I guess YMMV if this counts as an ideological reason.)

Actually, I would expect exactly that, but that's because I live in XML 
land most of the time.

> On Aug 31, 2008, at 15:18, Julian Reschke wrote:
> 
>> But that advantage needs to be weighed against the cost of breaking 
>> existing libraries, and the ability to evolve the language without 
>> having to rewrite code all over.
> 
> I agree.
> 
> However, I'm inclined to think that when someone writes <video> element 
> emission support for a CMS for example, throwing in a new off-the-shelf 
> serializer or adding a few entries to the void element list of an 
> existing serializer is not a big deal compared to the other 
> implementation work. (Unless, of course, the app writer wants to do 
> wrong things like use a non-UTF-8 encoding for output, which a new lean 
> serializer might not support. :-)

Keep in mind that not all developers have the freedom just to pick a new 
off-the-shelf component, be it commercial or open source. For instance, 
in certain companies you either stick with what the shipping J2EE 
includes, or you need to write your own serializer (speaking from 
experience here).

BR, Julian
Received on Sunday, 31 August 2008 16:39:21 UTC