- From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
- Date: Thu, 15 Jul 2010 23:57:49 +0400
- To: Edward O'Connor <hober0@gmail.com>
- Cc: Sam Ruby <rubys@intertwingly.net>, Anne van Kesteren <annevk@opera.com>, Richard Ishida <ishida@w3.org>, public-html@w3.org

Edward O'Connor, Thu, 15 Jul 2010 12:04:45 -0700:
> Sam Ruby:
>> We could also go a different way entirely, and say that polyglot documents
>> are a subset of both HTML5 and XHTML5, and the subset that we select
>> is only
>> utf-8.
>
> I think this is the way to go. I suspect 90+% of the usefulness of the
> polyglot spec is its usefulness as a best practice style guide for
> people producing HTML content, and <10% as an additional means for
> allowing people to use XML toolchains. Always using UTF-8 is such a
> best practice.

I agree that it is possible to come to the conclusion that Polyglot
Markup should be based on UTF-8 from two angles: We can say that this
is a spec of its own and, on that basis, that we decide the spec's
rules. Or we can treat Polyglot Markup as a best practice document, and
rule that UTF-8 is the best practice. (We then also make polyglot
syntax as such a kind of best practice, too, I think.)

My attitude when filing bugs etc. against Polyglot Markup has been more
or less based on what Henri described in a bug comment, from memory:
Polyglot Markup should be a common denominator of the XML spec and the
HTML spec. And I find no reason for forbidding UTF-16 in Polyglot
Markup in either the XML or the HTML5 spec.

But even if we say that "we decide the rules", I still think that
Polyglot Markup needs some additional principle in order to justify
UTF-8 as the sole encoding. From memory, HTML5's recommendation of
UTF-8 is related to URLs and form handling. And those are also, I
guess, reasons for preferring UTF-8 in an XHTML document: the
consequences of an invalid character in XHTML are only more draconian
than they are in HTML. Otherwise the issue is the same.

I could think of the following justification, then: Due to the often
more draconian consequences of malformed characters in XHTML, it is
recommended/required to use UTF-8, as UTF-8 diminishes the chances of
malformed characters in forms etc.

In other words: in order to be more compatible with HTML, a polyglot
document served/parsed as XHTML needs to be served as UTF-8, to
diminish the possibility that a polyglot document parsed as XHTML
becomes more inaccessible (due to malformed form input) than the same
document would be when parsed as HTML.

Does this make sense to anyone?
-- 
leif h silli

Received on Thursday, 15 July 2010 19:59:18 UTC