
Re: Polyglot Markup/XML encoding declaration

From: Leif Halvard Silli <xn--mlform-iua@xn--mlform-iua.no>
Date: Wed, 28 Jul 2010 20:17:58 +0300
To: Lachlan Hunt <lachlan.hunt@lachy.id.au>
Cc: HTMLwg <public-html@w3.org>, Eliot Graff <eliotgra@microsoft.com>, public-i18n-core@w3.org
Message-ID: <20100728201758147140.51583146@xn--mlform-iua.no>
Lachlan Hunt, Tue, 27 Jul 2010 13:21:03 +0200:
> On 2010-07-23 16:26, Leif Halvard Silli wrote:
>> Proposal: Polyglot Markup should allow the document encoding to be set
>> via the encoding attribute of the XML declaration. The XML declaration,
>> including the encoding attribute, thus becomes a HTML5 extension,
>> whenever polyglot markup is being consumed as HTML. (See my previous
>> letter to Sam, about the XML declaration as polyglot markup indicator.)
> 
> I object to this because permitting the XML declaration would only 
> serve to pollute the document with unnecessary markup,

A polyglot may be served as XHTML. XML 1.0 does not consider the XML 
declaration unnecessary pollution. There are several things in a 
polyglot that are unnecessary from a purist HTML point of view!

> and to mislead 
> authors about how the encoding of a file is actually determined.

I agree that it was wrong of me to suggest that a file consumed as HTML 
should be able to rely on the XML encoding declaration alone. To remove 
any doubt, I now state more strongly that if the XML encoding 
declaration is used, then the HTML encoding declaration - meta@charset - 
must also be used.

In detail, I propose the following rules:

1) <meta charset="*"/> is recommended, but optional, as in HTML5.
   (Possibly with special rules for UTF-16.)
2) <?xml version="1.0" ?> without encoding declaration is 
   recommended, but optional, as in XML 1.0.
3) <?xml version="1.0" encoding="*" ?> (with encoding declaration)
   is required - *together* with meta@charset! - under the same 
   conditions as in XML 1.0. That is: when no higher protocol 
   (e.g. HTTP) specifies the encoding *and* the document 
   encoding is neither UTF-16 nor UTF-8.
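
As a minimal sketch, rule 3 and the "meta@charset must accompany the XML 
encoding declaration" requirement could be checked mechanically roughly 
as below. The function name, its arguments and the regular expressions 
are illustrative assumptions of this sketch, not taken from any 
specification; the patterns only recognise the exact spellings used in 
the rules above.

```python
import re

# Hypothetical patterns for this sketch only: they match the spellings
# used in the rules above, not every legal XML declaration or meta element.
XML_DECL = re.compile(r'^<\?xml\s+version="1\.0"(?:\s+encoding="([^"]+)")?\s*\?>')
META_CHARSET = re.compile(r'<meta\s+charset="([^"]+)"\s*/>', re.IGNORECASE)


def check_polyglot_encoding(text, actual_encoding, protocol_charset=None):
    """Return a list of violations of the proposed rules (empty if none).

    text             -- the decoded document source
    actual_encoding  -- the encoding the file is really stored in
    protocol_charset -- encoding given by a higher protocol such as HTTP,
                        or None when there is no such information
    """
    problems = []

    xml_match = XML_DECL.match(text)
    xml_encoding = xml_match.group(1) if xml_match else None
    has_meta_charset = META_CHARSET.search(text) is not None

    # Rule 3: the XML encoding declaration is required when no higher
    # protocol gives the encoding and the file is neither UTF-8 nor UTF-16.
    is_utf = actual_encoding.lower() in ("utf-8", "utf-16")
    if protocol_charset is None and not is_utf and xml_encoding is None:
        problems.append("XML encoding declaration required")

    # Whenever the XML encoding declaration is used, meta@charset
    # must be used as well.
    if xml_encoding is not None and not has_meta_charset:
        problems.append("meta charset required with XML encoding declaration")

    return problems
```

For instance, an ISO-8859-1 file with no HTTP charset and only 
<meta charset="iso-8859-1"/> would be flagged as lacking the XML 
encoding declaration, while the same file with both declarations 
passes.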

> There have been many observed instances of otherwise useless markup 
> being used by misled authors in ways that don't actually do anything. 
> Many of these cases have now been made optional or obsolete in HTML5 
> because of the wasted effort they were causing, and so introducing 
> new markup with no real purpose would not be wise.

The XML declaration is already permitted in HTML, under XHTML 1.0, 
Appendix C.

You forgot to say that HTML5 also allows markup that is otherwise 
useless in XHTML because it is useful in HTML - and vice versa.

The XML declaration would not be generally permitted in HTML - it would 
only be permitted in polyglot markup. 
-- 
leif h silli
Received on Thursday, 29 July 2010 12:51:58 UTC
