W3C home > Mailing lists > Public > public-html@w3.org > July 2010

Re: i18n Polyglot Markup/NCRs (7th issue)

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Mon, 19 Jul 2010 13:18:50 +0300
To: Henri Sivonen <hsivonen@iki.fi>
Cc: public-html <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Message-ID: <20100719131850252915.98c30af0@xn--mlform-iua.no>
Henri Sivonen, Mon, 19 Jul 2010 01:28:37 -0700 (PDT):
>>>> You may also want to consult bug 9300 [2]. It shows that if we
>>>> want to create a maximum compatibility specification, then decimal
>>>> NCRs are sometimes more IE compatible than hexadecimal ones are.
> What does any version of IE have to do with determining if a given 
> document is polyglot (X)HTML5?

First of all, my comment was to Richard, who suggested that Polyglot 
Markup should "favor" hexadecimal NCRs.
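On the spec side (as opposed to the legacy-parser side), the two NCR 
spellings are interchangeable. A quick sketch of mine (not from the 
thread), using Python's standard html module, with NO-BREAK SPACE 
(U+00A0) as the example character:

```python
import html

# Per both HTML5 and XML 1.0, a decimal NCR and a hexadecimal NCR
# denote the same code point; a preference between them can therefore
# only be about legacy parser behaviour, not about conformance.
decimal_form = html.unescape('&#160;')   # decimal NCR
hex_form = html.unescape('&#xA0;')       # hexadecimal NCR

assert decimal_form == hex_form == '\u00a0'
```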

A possible answer to your question is found in Sam's messages [1][2]. 
He suggests allowing only UTF-8 as the encoding of polyglot markup.

> If the goal is to recount what documents are polyglot, inferences 
> should be made from specs--not from IE behavior (or the behavior of 
> any particular piece of software).

Of course, inferences should be made from specs - the XML spec and 
HTML5. The question is whether one should go even further than specs go.

For instance, it can be argued that a polyglot HTML5 document (in the 
strict spec-inferred sense) is sometimes _more_ HTML-parser compatible 
than an HTML5 document which doesn't use polyglot markup, simply 
because omitting tags that HTML5 permits to be omitted now and then 
reveals browser bugs.
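The tag-omission point can be illustrated with Python's stdlib XML 
parser (my own sketch, not from the thread): the same bytes that are a 
valid HTML5 fragment stop being well-formed XML once end tags are 
omitted, which is exactly what polyglot markup forgoes.

```python
import xml.etree.ElementTree as ET

# HTML5 permits omitting many end tags; the same byte sequence is then
# no longer well-formed XML. Polyglot markup writes every tag out.
html5_style = '<ul><li>one<li>two</ul>'               # valid HTML5 fragment
polyglot_style = '<ul><li>one</li><li>two</li></ul>'  # also well-formed XML

def well_formed(fragment):
    """Report whether an XML parser accepts the fragment as-is."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False
```

Here `well_formed(html5_style)` is False while 
`well_formed(polyglot_style)` is True.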

> If the goal is to recount how to write legacy IE-safe HTML5, the 
> publication shouldn't pretend to be about polyglotness.

I believe that one of the motivations for Polyglot Markup, from Sam's 
point of view, 

> Please, please, don't write another Appendix C that conflates 
> incomplete and vague assertions about legacy browser behavior with 
> purported XML compatibility.

The claim that UTF-8 is the "best" encoding is based on HTML5. (It can 
also be based on Canonical XML, I think.)
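To make the UTF-8-only idea concrete, here is a sketch of mine 
(assuming Python's stdlib parsers stand in for an XML consumer and an 
HTML consumer; the document below is a hypothetical minimal example, 
with the doctype omitted for brevity): the same UTF-8 string is 
consumable on both sides without any encoding negotiation.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# A hypothetical minimal polyglot document, UTF-8 throughout.
doc = ('<html xmlns="http://www.w3.org/1999/xhtml"><head>'
       '<meta charset="UTF-8"/><title>caf\u00e9</title></head>'
       '<body><p>caf\u00e9</p></body></html>')

# The XML side accepts it as-is...
root = ET.fromstring(doc)
xml_text = root.find('.//{http://www.w3.org/1999/xhtml}p').text

# ...and an HTML tokenizer sees the same character data.
class Collector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.text = []
    def handle_data(self, data):
        self.text.append(data)

c = Collector()
c.feed(doc)
```

Both `xml_text` and one of the collected chunks come out as the same 
string, `caf\u00e9`.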

I think UTF-8 can be justified, if we say that Polyglot markup is 
supposed to define _more_ than just the broadest common denominator for 

As for other assertions (than that about UTF-8): we only need to make 
sure that they are not vague and that they are complete, no?

[1] http://www.w3.org/mid/4C3F56AB.7030105@intertwingly.net
[2] http://www.w3.org/mid/4C3F72F9.7070105@intertwingly.net
leif halvard silli
Received on Monday, 19 July 2010 10:19:27 UTC
