[Bug 11423] Character sets not registered with IANA

From: <bugzilla@jessica.w3.org>
Date: Mon, 29 Nov 2010 01:00:08 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1PMs6C-00079K-Tl@jessica.w3.org>
http://www.w3.org/Bugs/Public/show_bug.cgi?id=11423

--- Comment #3 from Benjamin Hawkes-Lewis <bhawkeslewis@googlemail.com> 2010-11-29 01:00:08 UTC ---
(In reply to comment #2): 
> EUC-KR and KS_C_5601-1987 are mapped onto windows-949.  I think a "must"
> directive is definitely an encouragement, even if you don't.

Oh, I took "encouraging" to mean things a spec can realistically influence
(like future authoring), as opposed to the UA behaviour required to process
the existing web corpus. HTML5 can't retroactively change that corpus.

Anyhow, as I read the spec, a conforming UA is free not to process documents
labeled as EUC-KR or KS_C_5601-1987 at all: HTML5 maps those labels to
Windows-949 for backwards compatibility with the web corpus, and a UA is not
required to support Windows-949.

> > > It's not like registering a character set with IANA is a particularly difficult or drawn-out process
> > 
> > And yet Microsoft's attempt to do so (back in 2005) seems to have failed:
> > 
> > http://mail.apps.ietf.org/ietf/charsets/msg01510.html
> 
> Probably because, as the responses indicate, the specifications for those
> character sets were insufficient and contradictory.  It doesn't matter what
> exactly the reason is; it's not registered. HP, IBM, and Adobe have managed to
> do it, so I'm sure that it's not impossible or unreasonably difficult.

Big Blue managed to do it, so it's easy? Your standard of proof may be lower
than mine here. ;)

> I believe "if there is one" means "if there is a name or alias labeled as
> 'preferred MIME name'", not "if there is an entry in the IANA Character Sets
> registry".

Hmm. I think your reading is correct. :(

> Even if we were to use your suggested interpretation, there are
> other names for this character set, such as "CP949".  How are we to know what
> the preferred name is if it's not IANA-registered?
> 
> > "User agents must at a minimum support the UTF-8 and Windows-1252 encodings,
> > but may support more."
> 
> Right, but if they support EUC-KR or KS_C_5601-1987, they are effectively
> required to.  (Actually, the spec seems to prohibit the useful implementation
> of EUC-KR, since it's mandated that user agents use something else instead.)

The spec effectively:

   - prohibits implementing EUC-KR or KS_C_5601-1987;
   - allows implementing Windows-949;
   - requires mapping of EUC-KR or KS_C_5601-1987 to Windows-949, but does not
     require UAs to actually process such documents.
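
To make those three points concrete, here is a minimal Python sketch of how a
UA might behave under this reading. The table names, resolve_label and
decode_document are hypothetical and not taken from the spec, and Python's
"cp949" codec is assumed to be a reasonable stand-in for Windows-949.

```python
# Hypothetical override table: labels that, on this reading of the spec,
# must be treated as Windows-949 rather than implemented as themselves.
LABEL_OVERRIDES = {
    "euc-kr": "windows-949",
    "ks_c_5601-1987": "windows-949",
}

# Encodings this hypothetical UA supports, mapped to Python codec names.
# UTF-8 and Windows-1252 are the required minimum; Windows-949 is optional.
# Assumption: Python's "cp949" codec stands in for Windows-949.
SUPPORTED = {
    "utf-8": "utf-8",
    "windows-1252": "cp1252",
    "windows-949": "cp949",
}


def resolve_label(label: str) -> str:
    """Normalise a declared label and apply the mandatory overrides."""
    name = label.strip().lower()
    return LABEL_OVERRIDES.get(name, name)


def decode_document(data: bytes, label: str) -> str:
    """Decode data using the encoding its label resolves to.

    Raises LookupError when the resolved encoding is not supported,
    which is the "free not to process" case.
    """
    name = resolve_label(label)
    codec = SUPPORTED.get(name)
    if codec is None:
        raise LookupError(f"unsupported encoding: {name}")
    return data.decode(codec)
```

Removing the "windows-949" entry from SUPPORTED models a UA that performs the
required mapping but then declines to process the document: decode_document
raises LookupError for anything labeled EUC-KR.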

> > > I must therefore object to suggesting or encouraging the use of windows-949
> > > until it has been registered appropriately with IANA.
> > 
> > Maybe try registering it? Perhaps you'll have better luck than Microsoft.
> 
> I'm really not interested in registering what amount to platform-specific
> character sets. 

Assuming you're interested in user agents being able to process the existing
web corpus using only IANA-registered character sets, you perhaps should have
some level of interest in doing so. ;)

> Finally, there are numerous character sets in existence that handle Korean just fine,
> including UTF-8, and I don't see the need to add more.

Which is why the spec recommends authors use UTF-8. :)

http://msdn.microsoft.com/en-gb/goglobal/cc305154.aspx (which the spec
references) defines an authoritative mapping of windows-949 to Unicode.

If the spec simply defined the preferred name of Windows-949 as
(case-insensitive) "Windows-949", could we close this bug?
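
For illustration only, a minimal Python sketch of what matching against such a
preferred name could look like. It assumes "case-insensitive" means ASCII
case-insensitive; PREFERRED_NAME, ascii_lower and matches_preferred_name are
hypothetical names, not spec text.

```python
# Hypothetical: the preferred name proposed in this comment.
PREFERRED_NAME = "Windows-949"


def ascii_lower(s: str) -> str:
    """Lowercase only A-Z, leaving every other character untouched."""
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in s)


def matches_preferred_name(label: str) -> bool:
    """ASCII case-insensitive comparison against the preferred name."""
    return ascii_lower(label) == ascii_lower(PREFERRED_NAME)
```

Folding only A-Z keeps non-ASCII look-alikes from matching, and an alias such
as "CP949" does not match unless it is listed separately.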

Received on Monday, 29 November 2010 01:00:10 GMT
