
RE: i18n comments on Polyglot Markup [issue #4]

From: Richard Ishida <ishida@w3.org>
Date: Thu, 7 Oct 2010 20:39:37 +0100
To: "'Eliot Graff'" <eliotgra@microsoft.com>, "'Henri Sivonen'" <hsivonen@iki.fi>
Cc: <public-html@w3.org>, <public-i18n-core@w3.org>
Message-ID: <014f01cb6657$60c210b0$22463210$@org>
Henri, I believe it was the raising of this bug that originally alerted the i18n WG to the fact that the spec currently forbids charset=utf-16. The i18n WG is not trying to get around any procedures. There has simply been a disconnect.

Eliot, thanks for your work on this, but I think we're going to have to await the result of http://www.w3.org/Bugs/Public/show_bug.cgi?id=10890 before we can finalise this passage.

Btw, I am surmising that the intention here is that the XML declaration must not be used in polyglot documents. If that is the case, you should probably state it clearly here. An XML declaration is not needed for UTF-8 XML documents, but it's not forbidden either.

(That said, I ought to state for the record that I still don't feel particularly comfortable about constraining polyglot documents to require you to add a BOM in UTF-8 documents and remove any XML declarations. I guess I don't yet understand why it's so hard for an HTML parser to recognise an XML declaration for what it is and treat it appropriately, rather than assume that it is a processing instruction. I know that an HTML parser has nothing to do with XML declarations, and so in terms of language purity it doesn't belong, but the HTML5 spec currently recognises xml:lang and xmlns attributes and handles them, so why can't we write the spec to do a similar thing with the XML declaration?)
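
As a rough illustration of the constraint discussed above, the following sketch reports which encoding signals a byte stream carries: a UTF-8 BOM, a leading XML declaration, and a <meta charset> value. The function name inspect_encoding_signals and the 1024-byte search window are assumptions made for this example; it is not taken from the HTML5 spec or the polyglot draft, and the regular expressions are a simplification, not a real HTML or XML parser.

```python
import codecs
import re


def inspect_encoding_signals(data: bytes) -> dict:
    """Hypothetical helper: report encoding signals at the start of a document."""
    # A UTF-8 BOM is the byte sequence EF BB BF at the very start.
    has_bom = data.startswith(codecs.BOM_UTF8)
    body = data[len(codecs.BOM_UTF8):] if has_bom else data

    # An XML declaration, if present, must be the first thing in the document.
    has_xml_decl = re.match(rb"<\?xml\s", body) is not None

    # Assumption: only look near the top of the document for <meta charset>.
    # This is a simplified pattern match, not an HTML parse.
    match = re.search(
        rb'<meta\s+charset\s*=\s*["\']?([A-Za-z0-9_-]+)',
        body[:1024],
        re.IGNORECASE,
    )
    meta_charset = match.group(1).decode("ascii").lower() if match else None

    return {
        "utf8_bom": has_bom,
        "xml_declaration": has_xml_decl,
        "meta_charset": meta_charset,
    }


if __name__ == "__main__":
    sample = codecs.BOM_UTF8 + (
        b'<!DOCTYPE html><html xmlns="http://www.w3.org/1999/xhtml">'
        b'<head><meta charset="utf-8"/></head></html>'
    )
    print(inspect_encoding_signals(sample))
```

A document with a BOM, no XML declaration, and a matching meta element would satisfy the passage as surmised above; a document beginning with <?xml version="1.0" encoding="UTF-8"?> would be flagged by the xml_declaration field.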


Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)


> -----Original Message-----
> From: public-i18n-core-request@w3.org
> [mailto:public-i18n-core-request@w3.org] On Behalf Of Richard Ishida
> Sent: 07 October 2010 19:27
> To: 'Henri Sivonen'; 'Eliot Graff'
> Cc: public-html@w3.org; public-i18n-core@w3.org
> Subject: RE: i18n comments on Polyglot Markup [issue #4]
> [forwarding to public-i18n-core, so they are kept in the loop.  Please reply to
> this email, rather than the previous one.]
> > From: Henri Sivonen [mailto:hsivonen@iki.fi]
> > Sent: 04 October 2010 12:53
> > To: Eliot Graff
> > Cc: Richard Ishida; public-html@w3.org
> > Subject: Re: i18n comments on Polyglot Markup [issue #4]
> > Importance: High
> >
> > > In
> > > addition, the meta tag may be used in the absence of a BOM as long as
> > > it matches the already specified encoding. Note that the W3C
> > > Internationalization (i18n) Group recommends to always include a
> > > visible encoding declaration in a document, because it helps
> > > developers, testers, or translation production managers to check the
> > > encoding of a document visually.
> >
> > I object to the polyglot markup doc saying that things are permitted when
> > HTML5 says they aren't permitted. HTML5 doesn't permit <meta
> > charset="UTF-16">. If the i18n group wishes to change that, the
> > procedurally proper way is to escalate
> > http://www.w3.org/Bugs/Public/show_bug.cgi?id=10890 once it has been
> > WONTFIXed (and I expect it to be WONTFIXed)--not to try to get the
> > polyglot markup doc changed ahead of the spec.
> >
> > (Of course, I'd prefer
> > http://www.w3.org/Bugs/Public/show_bug.cgi?id=10890
> > to be WONTFIXed and the i18n group not escalating it.)
> >
> > --
> > Henri Sivonen
> > hsivonen@iki.fi
> > http://hsivonen.iki.fi/
Received on Thursday, 7 October 2010 19:40:13 UTC
