RE: i18n comments on Polyglot Markup [issue #4]

Henri, I believe it was the raising of this bug that originally alerted the i18n WG to the fact that the spec currently forbids charset=utf-16. The i18n WG is not trying to get around any procedures. There has simply been a disconnect.

Eliot, thanks for your work on this, but I think we're going to have to await the result of that bug before we can finalise this passage.

Btw, I am surmising that the intention here is that the XML declaration must not be used in polyglot documents. If that is the case, you should probably state it clearly here. The declaration is not needed for UTF-8 XML documents, but it's not forbidden in them either.

(That said, I ought to state for the record that I still don't feel particularly comfortable about constraining polyglot documents to require you to add a BOM in UTF-8 documents and remove any XML declarations. I guess I don't yet understand why it's so hard for an HTML parser to recognise an XML declaration for what it is and treat it appropriately, rather than assume that it is a processing instruction. I know that an HTML parser has nothing to do with XML declarations, and so in terms of language purity it doesn't belong, but the HTML5 spec currently recognises xml:lang and xmlns attributes and handles them. Why can't we write the spec to do a similar thing with the XML declaration?)


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

> -----Original Message-----
> From: [mailto:public-i18n-core-] On Behalf Of Richard Ishida
> Sent: 07 October 2010 19:27
> To: 'Henri Sivonen'; 'Eliot Graff'
> Cc:;
> Subject: RE: i18n comments on Polyglot Markup [issue #4]
> [forwarding to public-i18n-core, so they are kept in the loop.  Please reply to
> this email, rather than the previous one.]
> > From: Henri Sivonen []
> > Sent: 04 October 2010 12:53
> > To: Eliot Graff
> > Cc: Richard Ishida;
> > Subject: Re: i18n comments on Polyglot Markup [issue #4]
> > Importance: High
> >
> > > In
> > > addition, the meta tag may be used in the absence of a BOM as long as
> > > it matches the already specified encoding. Note that the W3C
> > > Internationalization (i18n) Group recommends to always include a
> > > visible encoding declaration in a document, because it helps
> > > developers, testers, or translation production managers to check the
> > > encoding of a document visually.
> >
> > I object to the polyglot markup doc saying that things are permitted when
> > HTML5 says they aren't permitted. HTML5 doesn't permit <meta
> > charset="UTF-16">. If the i18n group wishes to change that, the procedurally
> > proper way is to escalate
> > once it has been
> > WONTFIXed (and I expect it to be WONTFIXed)--not to try to get the polyglot
> > markup doc changed ahead of the spec.
> >
> > (Of course, I'd prefer
> > to be WONTFIXed and the i18n group not escalating it.)
> >
> > --
> > Henri Sivonen
> >
> >

Received on Thursday, 7 October 2010 19:40:12 UTC