Re: Making the HTML language self-describing from Jonas Sicking on 2009-01-08 (public-html@w3.org from January 2009)

From: Jonas Sicking <jonas@sicking.cc>
Date: Wed, 7 Jan 2009 18:41:08 -0800
To: "Ian Hickson" <ian@hixie.ch>
Cc: "Martin Atkins" <mart@degeneration.co.uk>, "Julian Reschke" <julian.reschke@gmx.de>, public-html@w3.org
Message-ID: <63df84f0901071841i5ddb363dp2d2a562ac41cb651@mail.gmail.com>
On Wed, Jan 7, 2009 at 5:12 PM, Ian Hickson <ian@hixie.ch> wrote:
> On Wed, 7 Jan 2009, Martin Atkins wrote:
>>
>> It would be ideal if future versions of HTML would be parsable by today's
>> parsers, even if they ultimately ignore elements they don't understand.
>>
>> The best example of this is void elements that get parsed as non-void by
>> legacy parsers; it is therefore not possible to use new void elements
>> without breaking software that employs legacy parsers, since the entire
>> tree after the new void element will be incorrect.
>
> On Wed, 7 Jan 2009, Jonas Sicking wrote:
>>
>> So, sort of restarting this thread again. Here are the problems that
>> would be good to solve:
>>
>> 1. When a new version of HTML6 comes out, it should be possible to write
>> a document that uses elements from HTML6, but that parses to the same
>> DOM in a browser that both supports HTML6 and HTML5. Ideally such a
>> document would also validate as valid HTML6 and HTML5. Note that this
>> doesn't mean that *every* document should parse to the same DOM, just
>> that it is possible to write one that uses a new element but still
>> produces the same DOM in both parsers. So for example it's IMHO ok to
>> require that <p> elements are closed and that no tags are misnested
>> for the same DOM to be produced.
>
> If you never use optional end tags, the only thing that would cause a DOM
> difference that I can think of is void elements.

Yup, I think that's true.

> However, DOM differences would be the least of your problems if the UA
> doesn't support the void elements. With flow elements like <section> or
> <meter>, you might be able to use the elements even though the UA doesn't
> support them because you can style them. But with void elements, the
> elements are useless if the UA doesn't support them.

That's not true: both <br> and <hr> are void elements, yet both could
be decently implemented using styling in a UA that doesn't support
them. Possibly <spacer> and <wbr> could too.

> In other words, it basically *doesn't matter* if the DOM is different if
> you're using void elements the UA doesn't support.

There are several occasions where this is not true:
1. You can use CSS and/or JS to emulate the element. In addition to
<br>, <hr>, <spacer>, and <wbr>, this is true for <eventsource> (can
be emulated using XHR and JS), <embed> (can be emulated using
<object>), <bgsound> (can be emulated using <audio> or Flash), and
possibly <command> (maybe using JS; I don't know enough about it).

2. The site is ok with having different levels of functionality for
different browsers.

3. The site could deploy a custom plugin or extension and use that to
implement the element.

When none of these applies, the site has no choice but to avoid the
element entirely until all browsers with significant market share
have deployed support for it.

> In fact, as far as I can tell, the only problem would be with
> round-tripping, which is a serialisation issue:

There's also the issue of receiving a DOM that is significantly
different because the parser doesn't know that an element is a void
element. And I don't really see how you could solve round-tripping by
changing the serializer, since the DOM you receive is already
"wrong". Could you elaborate?
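
To make the "wrong DOM" concern concrete, here is a minimal Python
sketch of a toy tree builder whose only configuration is its set of
void elements. It is not the HTML5 parsing algorithm: it ignores
attributes, implied end tags, and error recovery. The element name
<newvoid> is hypothetical and stands in for any void element added in
a later HTML version.

```python
import re

# Matches a bare start tag, a bare end tag, or a run of text.
# Attributes are deliberately not handled in this sketch.
TOKEN = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*>|([^<]+)")


def parse(markup, void_elements):
    """Build a tree of nested lists: [tag, child, ...]; text nodes are str."""
    root = ["#root"]
    stack = [root]  # stack of currently open elements
    for closing, tag, text in TOKEN.findall(markup):
        tag = tag.lower()
        if text:
            stack[-1].append(text)
        elif closing:
            # Close the nearest open element with this name, if there is one.
            for i in range(len(stack) - 1, 0, -1):
                if stack[i][0] == tag:
                    del stack[i:]
                    break
        else:
            node = [tag]
            stack[-1].append(node)
            # A void element never becomes the parent of what follows.
            if tag not in void_elements:
                stack.append(node)
    return root


markup = "<newvoid><p>one</p><p>two</p>"

# A parser that knows <newvoid> is void (hypothetical newer version).
aware = parse(markup, void_elements={"newvoid"})

# A legacy parser that has never heard of <newvoid>.
legacy = parse(markup, void_elements=set())
```

In the first tree the two <p> elements are siblings of <newvoid>; in
the second they end up as its children, so everything after the
unknown element hangs off the wrong parent.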

>> 2. Make it possible to create a generic serializer that takes a DOM and
>> produces HTML that parses into the same DOM. Independent of which HTML
>> version (>= 5) is used to parse.
>
> As far as I can tell, if you have a conforming document and you're willing
> to not omit any of the optional end tags, all you need to have a generic
> serialiser is a list of void elements, CDATA elements, RCDATA
> elements, and the list of elements that are affected by the historical
> pre/textarea implied newline processing.
>
> This can be trivially encoded as four lines in a configuration file.

This is an interesting solution indeed.
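
As a rough illustration of the idea, here is a minimal Python sketch
of a serializer driven entirely by four lists. The tree shape (nested
lists of the form [tag, child, ...]), the CONFIG name, and the
particular element names in each list are assumptions made for the
example, not the actual lists from any spec draft; <newvoid> is a
hypothetical future void element. Attributes are left out for
brevity.

```python
from html import escape

# The "four lines of configuration". Contents are illustrative only.
CONFIG = {
    "void": {"br", "hr", "img", "newvoid"},
    "cdata": {"script", "style"},
    "rcdata": {"textarea", "title"},
    "implied_newline": {"pre", "textarea"},
}


def serialize(node, config=CONFIG, parent=None):
    """Serialize a nested-list tree, always writing explicit end tags."""
    if isinstance(node, str):
        if parent in config["cdata"]:
            return node  # raw text: written out without escaping
        return escape(node, quote=False)  # escapes &, < and >

    tag, children = node[0], node[1:]
    if tag in config["void"]:
        return f"<{tag}>"  # no content and no end tag

    if tag in config["cdata"] or tag in config["rcdata"]:
        # These elements can only hold text; a child element could not
        # survive a round trip through the parser.
        if any(not isinstance(child, str) for child in children):
            raise ValueError(f"<{tag}> may only contain text")

    body = "".join(serialize(child, config, tag) for child in children)
    if tag in config["implied_newline"] and body.startswith("\n"):
        # The parser drops one newline right after the start tag, so
        # emit an extra one to preserve the leading newline.
        body = "\n" + body
    return f"<{tag}>{body}</{tag}>"


doc = ["div", "a < b", ["br"], ["pre", "\nx"]]
html_out = serialize(doc)
```

Supporting a later HTML version would then only mean editing the
lists in CONFIG, not the serializer itself.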

>> 3. Write a generic parser that can be used to parse HTML markup of any
>> version (>= 5) into a DOM.
>
> I don't think we'll ever be able to do this. For example, there is no way
> I could have predicted how we were going to add <ruby> parsing to the spec
> before I added it. This would be possible if we could guarantee that for
> all time, all new inventions would always be done in a regular way, but
> history has shown that we would be naive to assume this.

I agree.

/ Jonas
Received on Thursday, 8 January 2009 02:41:43 UTC