- From: Ian Hickson <ian@hixie.ch>
- Date: Thu, 8 Jan 2009 01:12:49 +0000 (UTC)
- To: Martin Atkins <mart@degeneration.co.uk>, Jonas Sicking <jonas@sicking.cc>, Julian Reschke <julian.reschke@gmx.de>
- Cc: public-html@w3.org

On Wed, 7 Jan 2009, Martin Atkins wrote:
>
> It would be ideal if future versions of HTML would be parsable by
> today's parsers, even if they ultimately ignore elements they don't
> understand.
>
> The best example of this is void elements that get parsed as non-void
> by legacy parsers; it is therefore not possible to use new void
> elements without breaking software that employs legacy parsers, since
> the entire tree after the new void element will be incorrect.

On Wed, 7 Jan 2009, Jonas Sicking wrote:
>
> So, sort of restarting this thread again. Here are the problems that
> would be good to solve:
>
> 1. When a new version of HTML6 comes out, it should be possible to
> write a document that uses elements from HTML6, but that parses to the
> same DOM in a browser that both supports HTML6 and HTML5. Ideally such
> a document would also validate as valid HTML6 and HTML5. Note that
> this doesn't mean that *every* document should parse to the same DOM,
> just that it is possible to write one that uses a new element but
> still produces the same DOM in both parsers. So for example it's IMHO
> ok to require that <p> elements are closed and that no tags are
> misnested for the same DOM to be produced.

If you never use optional end tags, the only thing that would cause a DOM
difference that I can think of is void elements.

However, DOM differences would be the least of your problems if the UA
doesn't support the void elements. With flow elements like <section> or
<meter>, you might be able to use the elements even though the UA doesn't
support them because you can style them. But with void elements, the
elements are useless if the UA doesn't support them.

In other words, it basically *doesn't matter* if the DOM is different if
you're using void elements the UA doesn't support. In fact, as far as I
can tell, the only problem would be with round-tripping, which is a
serialisation issue:

> 2. Make it possible to create a generic serializer that takes a DOM
> and produces HTML that parses into the same DOM. Independent of which
> HTML version (>= 5) is used to parse.

As far as I can tell, if you have a conforming document and you're willing
to not omit any of the optional end tags, all you need to have a generic
serialiser is a list of void elements, a list of CDATA elements, a list of
RCDATA elements, and the list of elements that are affected by the
historical pre/textarea implied newline processing. This can be trivially
encoded as four lines in a configuration file.

> 3. Write a generic parser that can be used to parse HTML markup of any
> version (>= 5) into a DOM.

I don't think we'll ever be able to do this. For example, there is no way
I could have predicted how we were going to add <ruby> parsing to the spec
before I added it. This would be possible if we could guarantee that for
all time, all new inventions would always be done in a regular way, but
history has shown that we would be naive to assume this.

> 1 seems very important to me to allow for adoption of new elements.
> I'd hate it if people were forced to use document.write hacks along
> with browser detection to be able to use new elements.

You can use new elements other than void elements easily; void elements
are only useful once the UAs support the feature anyway.

> 2 [and 3] seems important to allow generic tools, such as XSLT or DOM
> to produce [and consume] HTML.

I would strongly encourage people who are using such pipelines to use XML,
and just stick an XML-to-HTML converter on the end of their pipelines (and
an HTML-to-XML converter on the front of their pipelines). These tools
already exist, and they can be updated when HTML is updated.

-- 
Ian Hickson               U+1047E                )\._.,--....,'``.    fL
http://ln.hixie.ch/       U+263A                /,   _.. \   _\  ;`._ ,.
Things that are impossible just take longer.   `._.-(,_..'--(,_..'`-.;.'

Received on Thursday, 8 January 2009 01:13:28 UTC
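
Editorial sketch (not part of the original message): the reply to point 2 says a generic serialiser needs only four lists of element names. A minimal illustration of that idea in Python might look like the following. The `Element` class, the `serialize` function, and the contents of the four sets are assumptions made for this example; they are not taken from the message or from any specification text, and the lists would need to be kept in sync with whichever HTML version is targeted.

```python
from dataclasses import dataclass, field
from html import escape

# The "four lines" of configuration. The element names are illustrative
# assumptions and would be updated as HTML itself is updated.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "wbr"}
CDATA = {"script", "style"}               # text content emitted unescaped
RCDATA = {"textarea", "title"}            # text content escaped, no child tags
LEADING_NEWLINE = {"pre", "textarea", "listing"}  # parser drops one leading newline


@dataclass
class Element:
    """Hypothetical minimal DOM node: children are Element or str."""
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)


def serialize(node, parent=None):
    """Serialise a node, always writing end tags for non-void elements."""
    if isinstance(node, str):
        if parent in CDATA:
            return node                    # raw text must not be escaped
        return escape(node, quote=False)   # escapes &, < and >
    attrs = "".join(f' {k}="{escape(v, quote=True)}"'
                    for k, v in node.attrs.items())
    start = f"<{node.name}{attrs}>"
    if node.name in VOID:
        return start                       # no content, no end tag
    body = "".join(serialize(c, node.name) for c in node.children)
    if node.name in LEADING_NEWLINE and body.startswith("\n"):
        body = "\n" + body                 # compensate for the dropped newline
    return f"{start}{body}</{node.name}>"


doc = Element("div", {}, [
    "a < b",
    Element("br"),
    Element("pre", {}, ["\nx"]),
    Element("script", {}, ["1<2"]),
])
print(serialize(doc))
```

In this sketch RCDATA text is escaped the same way as ordinary text, so the `RCDATA` set is listed only for completeness; a stricter serialiser could use it to reject element children inside `title` or `textarea`.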