
RE: i18n comments on Polyglot Markup [issue #4]

From: Richard Ishida <ishida@w3.org>
Date: Thu, 7 Oct 2010 21:07:10 +0100
To: "'Eliot Graff'" <eliotgra@microsoft.com>, "'Henri Sivonen'" <hsivonen@iki.fi>
Cc: <public-html@w3.org>, <public-i18n-core@w3.org>
Message-ID: <015901cb665b$3a463d90$aed2b8b0$@org>
Hmm.  Hold on, perhaps I misunderstood something here...

The spec says:

" Polyglot markup declares character encoding one of two ways:

    * By using the BOM.
    * In the HTTP header of the response [HTTP11], ..." 

I took this to mean that a BOM is required for UTF-8 encoded documents, but I guess that's a misinterpretation. What I now think you're saying, Eliot, is that if you apply a declaration on a polyglot document, then it must be in one of these two ways, but you are leaving unsaid that if you have a UTF-8 document you can leave it without a declaration altogether.  Is that right?
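
To make the cases I'm asking about concrete, here is a minimal sketch, not anything taken from the spec. The function name `declared_encoding` and the sample inputs are my own illustrative assumptions. It only looks at a leading BOM and at the charset parameter of an HTTP Content-Type value, and it returns None for a document that carries no declaration at all, which is the case I'm asking about.

```python
import codecs
from email.message import Message
from typing import Optional


def declared_encoding(body: bytes, content_type: Optional[str] = None) -> Optional[str]:
    """Return the encoding declared by a BOM or an HTTP Content-Type value.

    Hypothetical helper for illustration only; it ignores <meta> elements
    and XML declarations, and does not distinguish UTF-32 BOMs.
    """
    # Case 1: the document starts with a byte order mark.
    if body.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if body.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"

    # Case 2: the HTTP response header carries a charset parameter.
    if content_type:
        header = Message()
        header["Content-Type"] = content_type
        charset = header.get_content_charset()  # lower-cased, or None
        if charset:
            return charset

    # Case 3: no declaration at all.
    return None
```

For example, `declared_encoding(b"<!DOCTYPE html>")` returns None, while the same bytes with `"text/html; charset=UTF-8"` passed as the second argument give `"utf-8"`.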

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/




> -----Original Message-----
> From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of Richard Ishida
> Sent: 07 October 2010 20:40
> To: 'Eliot Graff'; 'Henri Sivonen'
> Cc: public-html@w3.org; public-i18n-core@w3.org
> Subject: RE: i18n comments on Polyglot Markup [issue #4]
> 
> Henri, I believe it was raising this bug that originally alerted the i18n WG to
> the fact that the spec currently forbids charset=utf-16. The i18n WG is not
> trying to get around any procedures. There has simply been a disconnect.
> 
> Eliot, thanks for your work on this, but I think we're going to have to await
> the result of http://www.w3.org/Bugs/Public/show_bug.cgi?id=10890 before
> we can finalise this passage.
> 
> Btw, I am surmising that the intention here is that the XML declaration must
> not be used in polyglot documents.  I think that if that is the case, you should
> probably state it clearly here.  It's not needed for utf-8 XML documents, but
> it's not forbidden either.
> 
> (That said, I ought to state for the record that I still don't feel particularly
> comfortable about constraining polyglot documents to require you to add a
> BOM in UTF-8 documents and remove any XML declarations.  I guess I don't
> yet understand why it's so hard for an HTML parser to recognise an XML
> declaration for what it is and treat it appropriately, rather than assume that it
> is a processing instruction. I know that an HTML parser has nothing to do
> with XML declarations, and so in terms of language purity it doesn't belong,
> but the HTML5 spec currently recognises xml:lang and xmlns attributes and
> handles them, why can't we write the spec to do a similar thing with the XML
> declaration?)
> 
> Cheers,
> RI
> 
> > -----Original Message-----
> > From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of Richard Ishida
> > Sent: 07 October 2010 19:27
> > To: 'Henri Sivonen'; 'Eliot Graff'
> > Cc: public-html@w3.org; public-i18n-core@w3.org
> > Subject: RE: i18n comments on Polyglot Markup [issue #4]
> >
> > [forwarding to public-i18n-core, so they are kept in the loop.  Please reply to
> > this email, rather than the previous one.]
> >
> >
> > > From: Henri Sivonen [mailto:hsivonen@iki.fi]
> > > Sent: 04 October 2010 12:53
> > > To: Eliot Graff
> > > Cc: Richard Ishida; public-html@w3.org
> > > Subject: Re: i18n comments on Polyglot Markup [issue #4]
> > > Importance: High
> > >
> > > > In addition, the meta tag may be used in the absence of a BOM as long as
> > > > it matches the already specified encoding. Note that the W3C
> > > > Internationalization (i18n) Group recommends to always include a
> > > > visible encoding declaration in a document, because it helps
> > > > developers, testers, or translation production managers to check the
> > > > encoding of a document visually.
> > >
> > > I object to the polyglot markup doc saying that things are permitted when
> > > HTML5 says they aren't permitted. HTML5 doesn't permit <meta
> > > charset="UTF-16">. If the i18n group wishes to change that, the procedurally
> > > proper way is to escalate
> > > http://www.w3.org/Bugs/Public/show_bug.cgi?id=10890 once it has been
> > > WONTFIXed (and I expect it to be WONTFIXed)--not to try to get the polyglot
> > > markup doc changed ahead of the spec.
> > >
> > > (Of course, I'd prefer http://www.w3.org/Bugs/Public/show_bug.cgi?id=10890
> > > to be WONTFIXed and the i18n group not escalating it.)
> > >
> > > --
> > > Henri Sivonen
> > > hsivonen@iki.fi
> > > http://hsivonen.iki.fi/
> >
> >
> >
> 
> 
> 
Received on Thursday, 7 October 2010 20:07:46 UTC
