Re: The non-polyglot elephant in the room

On Mon, Jan 21, 2013 at 11:03 PM, Henri Sivonen <hsivonen@iki.fi> wrote:
> On Mon, Jan 21, 2013 at 11:15 PM, David Sheets <kosmo.zb@gmail.com> wrote:
>> I was under the impression that an explicit goal of standardizing the
>> HTML5 parser was so that HTML consumers and producers could rely on a
>> de jure interpretation of nonsensical markup.
>
> Consumers, yes. Logically, it will eventually mean that producers can,
> too, but we’ve tried to avoid advertising that.

Eventually? Programmers already know this and may write their software
with that in mind.

>> While many consider
>> XML's restrictions nonsensical, it is prima facie absurd that
>> champions of HTML5's apologetic parser refuse to consider the subset
>> of HTML5 that is also valid XHTML5 as clearly important to a
>> population of authors.
>
> I don’t think it’s absurd for HTML parser champions to be opposed to
> polyglot, since championing HTML parsing involves asking people to
> possess an HTML parser (everyone already has an XML parser).

HTML parser champions cite liberal interpretation as a primary benefit
of their approach.

A subset of HTML serializations happens to also be well-formed XML due
to this approach.

How are nonsensical HTML serializations acceptable but strict rules
for dual HTML/XML serializations offensive?

> If you
> possess an HTML parser and an XML parser, you don’t need polyglot. If
> you get text/html, you use the HTML parser. If you get
> application/xhtml+xml, you use the XML parser.

If you only possess a single parser and someone hands an (X)HTML
polyglot doc to you, you are guaranteed to be able to parse it
reliably.

With polyglot, only a single bytestream needs publication for
compatible parsing.

With polyglot, little to no server-side participation is required to
publish both HTML and XML.

>> From my perspective, anti-polyglot proponents advocate global
>> text/html interpretation of nearly everything *except* XHTML.
>
> I’m advocating for text/html interpretation of text/html *only*. I am
> advocating against text/html interpretation of application/xhtml+xml
> and vice versa.

There exist bytestreams that have identical meaning in both text/html
and application/xhtml+xml. I don't think anyone is suggesting that
text/html and application/xhtml+xml should completely unify their
semantics beyond what is already clearly aligned and easily
interpretable.

>> XHTML is
>> stricter than HTML and polyglot serializations *should* exist for any
>> DOM
>
> Impossible in the general case without breaking backwards
> compatibility of either HTML or XML. I.e. not worth the trouble.

"General case" is not applicable when we are discussing a subset of
the semantics of HTML and a subset of the semantics of XHTML. Could
you give some examples of DOMs that are expressible in HTML but not
XML and how accommodation of an XML serialization of those DOMs would
necessitate breaking backward compat for HTML?

>> I am genuinely confused by arguments which appear to encourage liberal
>> emission and deride conservative emission.
>
> Do you consider “valid HTML for text/html and valid XHTML for
> application/xhtml+xml” “liberal”?

Is "Hello, <p><div>world</p>!" a valid HTML document? Why is this
document more reasonable to grant definitive semantic meaning than a
document that happens to satisfy both text/html and
application/xhtml+xml?

A serialization that is valid for both text/html and
application/xhtml+xml is quite conservative. Assigning almost any
HTML-ish document a definitive DOM and associated semantics is quite
liberal.
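A minimal sketch of that asymmetry, using Python's standard-library XML parser (xml.etree.ElementTree, backed by expat). The helper name is_well_formed_xml and the sample polyglot document are hypothetical illustrations. The check covers XML well-formedness only; it does not test HTML validity or confirm that an HTML5 parser would build the same DOM, which polyglot conformance also requires.

```python
import xml.etree.ElementTree as ET


def is_well_formed_xml(markup: str) -> bool:
    """Return True if the XML parser (expat, via ElementTree) accepts the markup.

    Hypothetical helper: it checks well-formedness only, not HTML validity
    and not whether an HTML5 parser would produce the same DOM.
    """
    try:
        ET.fromstring(markup)
    except ET.ParseError:
        return False
    return True


# The tag-soup example from the message: text outside any root element and
# misnested <p>/<div> tags. An HTML5 parser still assigns it a DOM.
TAG_SOUP = "Hello, <p><div>world</p>!"

# A small hypothetical polyglot-style document: lowercase tags, explicit
# XHTML namespace, every element explicitly closed.
POLYGLOT = (
    "<!DOCTYPE html>"
    '<html xmlns="http://www.w3.org/1999/xhtml">'
    "<head><title>Hello</title></head>"
    "<body><p>Hello, <em>world</em>!</p></body>"
    "</html>"
)

if __name__ == "__main__":
    print(is_well_formed_xml(TAG_SOUP))  # False
    print(is_well_formed_xml(POLYGLOT))  # True
```

The XML parser rejects the first string outright, while the second is accepted by it and is also ordinary text/html.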

HTML5 > valid HTML5 > polyglot

Regards,

David

P.S. Your mail client appears to be destroying conversation metadata
(To, Cc) which makes it harder to have a conversation now and also
understand it in the future. </irony>

Received on Tuesday, 22 January 2013 23:06:27 UTC