- From: Henri Sivonen <hsivonen@iki.fi>
- Date: Sun, 31 Aug 2008 18:59:03 +0300
- To: HTML WG <public-html@w3.org>
On Aug 30, 2008, at 21:24, Julian Reschke wrote:

> I'll try to explain the problem first from the point of view of a
> someone trying to create HTML *programmatically*. When I say
> *programmatically*, I don't mean printf-style code, just
> concatenating lines of output, but generic code can that can
> serialize some kind of object representation of the document, such
> as (gasp) DOM, or code that is event-driven as a SAX output handler.

One of the benefits of such a setup is that all the serialization code
is in one place--typically in one source file / class. The
maintainability argument motivating this architecture becomes moot if
one refuses to maintain the serializer.

> This is the approach used in XSLT 1.0, and also in common Java
> serializers that just use the XSLT's modules. I haven't checked, but
> I wouldn't be surprised if other libraries worked the same way.

Do you have pointers to specific libraries (besides Saxon that you
already mentioned and, obviously, Xalan) that we should consider?

> This means that anytime a new void element is introduced into HTML,
> all these implementations need to be updated. Not good.

That depends on what the cost of updating the serializer is relative
to making other changes required for using new HTML language features
in a given system. When the serializer isn't buried under TrAX, I
would expect this cost to be relatively low: amending a list of void
elements in one source file / compilation unit.

In the case you mention, Saxon, the fix would be trivial in software
if HTML5 output mode were replacing the HTML 4 output mode: It would
be one additional line per new void element in one static initializer
(in net.sf.saxon.event.HTMLEmitter). The problem is not patching the
software here. In the case of Saxon, the problem is patching it *and*
keeping it compliant with a spec that is less easy to change than
software (the XSLT spec) or offering it as yet another option
alongside the XSLT standard "html" output method.
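
To make the "amend a list of void elements" point concrete, here is a
minimal sketch of a DOM serializer whose void-element knowledge lives
in one static set. It is not Saxon's HTMLEmitter or any other real
library's code: the class name VoidAwareSerializer, its methods and
the choice of Set.of are assumptions made for illustration, and
"eventsource" is simply the example element from this thread. Raw-text
handling for script/style content and comment/doctype output are left
out.

```java
import java.util.Locale;
import java.util.Set;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NamedNodeMap;
import org.w3c.dom.Node;

// Hypothetical sketch: all names here are illustrative, not a real library API.
final class VoidAwareSerializer {
    // The HTML 4 empty elements, plus one line for a newly added void element.
    private static final Set<String> VOID_ELEMENTS = Set.of(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param",
        "eventsource"); // the "one additional line" for the thread's example

    // Convenience helper so callers need not handle the checked exception.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }

    static String serialize(Node node) {
        StringBuilder out = new StringBuilder();
        write(node, out);
        return out.toString();
    }

    private static void write(Node node, StringBuilder out) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            out.append(escape(node.getNodeValue(), false));
            return;
        }
        if (node.getNodeType() != Node.ELEMENT_NODE) {
            return; // comments, doctypes etc. are out of scope for this sketch
        }
        Element el = (Element) node;
        String name = el.getTagName().toLowerCase(Locale.ROOT);
        out.append('<').append(name);
        NamedNodeMap attrs = el.getAttributes();
        for (int i = 0; i < attrs.getLength(); i++) {
            Node attr = attrs.item(i);
            out.append(' ').append(attr.getNodeName()).append("=\"")
               .append(escape(attr.getNodeValue(), true)).append('"');
        }
        out.append('>');
        if (VOID_ELEMENTS.contains(name)) {
            return; // void element: start tag only, never an end tag
        }
        for (Node c = el.getFirstChild(); c != null; c = c.getNextSibling()) {
            write(c, out);
        }
        // Non-void elements always get an explicit end tag, even when empty.
        out.append("</").append(name).append('>');
    }

    private static String escape(String s, boolean inAttribute) {
        String r = s.replace("&", "&amp;");
        return inAttribute ? r.replace("\"", "&quot;")
                           : r.replace("<", "&lt;").replace(">", "&gt;");
    }
}
```

With this shape, supporting a further void element is a one-line
change to VOID_ELEMENTS, while an empty script element still comes out
with a separate start and end tag.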

Then there's the issue that patching libraries can be too costly even
if source is in theory available. For example, normal developers
wouldn't patch the Xalan serializer as shipped by Sun as part of the
default TrAX implementation.

In the cases you named (DOM and SAX in Java and Saxon if we assume
it's used through TrAX) liberally licensed code is already available
for replacing legacy serializers:

The Validator.nu HTML Parser comes with a SAX serializer that outputs
UTF-16 to a Writer or UTF-8 to an OutputStream
(nu.validator.htmlparser.sax.HtmlSerializer). This class can be used
with Saxon or Xalan when wrapped in javax.xml.transform.sax.SAXResult.

For the DOM, the Validator.nu HTML Parser comes with
nu.validator.htmlparser.dom.Dom2Sax which can (among other things) be
used for wrapping nu.validator.htmlparser.sax.HtmlSerializer for use
with DOM trees.

On Aug 30, 2008, at 23:01, Lachlan Hunt wrote:

> Authoring tools shouldn't be considered to be immutable.

Julian does have a legitimate point, though. The question is, what do
we value more: the language being more elegant or HTML serializer
developers having to do a little patching and their users having to
update to a new version?

> <eventsource src="foo"/> is allowed. Isn't that sufficient?

No, since the whole point is that the serializer needs to know which
elements are void elements. (You wouldn't want an HTML serializer to
turn a script element with no children into <script/>.)

On Aug 31, 2008, at 12:37, Lachlan Hunt wrote:

> Provide a way in the authoring tool, XSLT in this case, for authors to
> declare which elements should be serialised as void elements.

That would be excessive, since the set of HTML void elements expands
very seldom. (The innerHTML getter doesn't let you specify what
elements are void, either.)
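
The SAXResult wiring mentioned above can be sketched using only JDK
classes. The handler below is a hypothetical stand-in for a
SAX-driven HTML serializer; it is not
nu.validator.htmlparser.sax.HtmlSerializer, and the names
SketchHtmlHandler and transformToHtml are made up for this example.
It assumes the JDK's default TrAX identity transformer and
namespace-free input, and it ignores raw-text elements, comments and
doctypes.

```java
import java.io.StringReader;
import java.util.Set;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXResult;
import javax.xml.transform.stream.StreamSource;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a SAX output handler that knows the void elements.
final class SketchHtmlHandler extends DefaultHandler {
    private static final Set<String> VOID_ELEMENTS = Set.of(
        "area", "base", "basefont", "br", "col", "frame", "hr",
        "img", "input", "isindex", "link", "meta", "param",
        "eventsource"); // example element from this thread

    private final StringBuilder out = new StringBuilder();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        out.append('<').append(nameOf(localName, qName));
        for (int i = 0; i < atts.getLength(); i++) {
            String value = atts.getValue(i).replace("&", "&amp;").replace("\"", "&quot;");
            out.append(' ').append(atts.getQName(i)).append("=\"").append(value).append('"');
        }
        out.append('>');
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String name = nameOf(localName, qName);
        // Void elements get no end tag; everything else always gets one,
        // so an empty script element becomes <script></script>.
        if (!VOID_ELEMENTS.contains(name)) {
            out.append("</").append(name).append('>');
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        out.append(new String(ch, start, length)
            .replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;"));
    }

    private static String nameOf(String localName, String qName) {
        return (qName == null || qName.isEmpty()) ? localName : qName;
    }

    // Runs an identity transform and routes its SAX events into the handler.
    static String transformToHtml(String xml) {
        SketchHtmlHandler handler = new SketchHtmlHandler();
        try {
            TransformerFactory.newInstance().newTransformer()
                .transform(new StreamSource(new StringReader(xml)), new SAXResult(handler));
        } catch (TransformerException e) {
            throw new IllegalStateException(e);
        }
        return handler.out.toString();
    }
}
```

The same SAXResult could be handed to a Transformer built from a
stylesheet instead of the identity transformer; the application code
never says which elements are void, only the handler does.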

> The inability to control how an element is serialised seems like a
> limitation in XSLT that should probably be fixed in XSLT, rather than
> maintaining that it should place constraints on HTML5's syntax.

For XML technologies, such control would be micromanagement of
semantically meaningless syntactic sugar, so none of XSLT, SAX and DOM
provide it. I think this is not a bug in XSLT, SAX and the DOM.

In the context of HTML, for a given standard level of HTML, there's
exactly one configuration that isn't bogus. It's a feature that the
serializer relieves the application programmer from managing this.

The problem here is that the serializer needs an update when HTML
gains more void elements. I'm inclined to think that the problem isn't
too severe.

On Aug 31, 2008, at 14:22, Julian Reschke wrote:

> On the other hand, what, except ideological reasons, stops us from
> allowing
>
> <tagname></tagname>
>
> as well?

Allowing that as a syntactic *alternative* is bad language design,
since it misleads language users to think that you can write
<tagname>foo</tagname> and have "foo" appear as a child of the tagname
element. (I guess YMMV if this counts as an ideological reason.)

On Aug 31, 2008, at 15:18, Julian Reschke wrote:

> But that advantage needs to be weighed against the cost of breaking
> existing libraries, and the ability to evolve the language without
> having to rewrite code all over.

I agree. However, I'm inclined to think that when someone writes
<video> element emission support for a CMS for example, throwing in a
new off-the-shelf serializer or adding a few entries to the void
element list of an existing serializer is not a big deal compared to
the other implementation work. (Unless, of course, the app writer
wants to do wrong things like use a non-UTF-8 encoding for output,
which a new lean serializer might not support. :-)

--
Henri Sivonen
hsivonen@iki.fi
http://hsivonen.iki.fi/
Received on Sunday, 31 August 2008 15:59:45 UTC