[Bug 15142] Define "UNICODE" as a defacto alias for "UTF-16"

From: <bugzilla@jessica.w3.org>
Date: Sun, 11 Dec 2011 16:53:18 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RZmeM-0008FW-Iv@jessica.w3.org>

--- Comment #7 from Julian Reschke <julian.reschke@gmx.de> 2011-12-11 16:53:18 UTC ---
(In reply to comment #5)

> > Point taken, but not convinced. For all practical purposes, UTF-8 is defined by
> > RFC 3629. That's where people look. Also, RFC 3629 doesn't even link to another
> > definition. So where is the definition by the Unicode consortium, and why isn't
> > it referenced?
> Did you read the first paragraph in RFC 3629 Section 3 [1] (which I quoted
> above)?

Yes, "Unicode" is mentioned, but there's no reference that takes me to the
actual definition.

In the meantime I noticed that UTF-8 is indeed defined in
<http://unicode.org/versions/Unicode5.2.0/ch03.pdf>, and I believe it would be
good to add an erratum to RFC 3629 pointing out that a revision should actually
*reference* the Unicode definition.

> > Also, a more general point: I would hope that all future definitions of
> > character encoding schemes in the IANA registry are based on the Unicode code
> > points, even those which can not represent all code points. The procedure for
> > IANA charset registrations is in IETF BCP 19, which doesn't even mention
> > Unicode, as far as I can tell.
> Different national administrations have different priorities. There will always
> remain character encodings not based on the Unicode Character Set, for legacy
> reasons if no others. 
> The Unicode Consortium does not maintain a character encoding scheme registry.
> IANA does. However, the Unicode Consortium does own the term "UNICODE", so if
> someone wishes to register this term as a charset value, they need to take it
> up with the Unicode Consortium, and not with the HTML WG. But I would suggest
> they would be wasting their time, since it is extremely unlikely the Unicode
> Consortium would choose to enter such registration (for some of the reasons I
> have cited as well as others).

I agree that HTML is the wrong place to start. The registry is maintained by
IANA, and how to get values into the registries is defined by an IETF BCP. I
don't see a requirement to go through the Unicode Consortium.

That being said, I do agree that using the string "Unicode" as a character
encoding scheme name is a bad idea. I'm not sure about "ownership" of names,
though: if IANA had to reject every registration of a "charset" name that
somebody claims to "own", the whole process could get very complicated :-).

Received on Sunday, 11 December 2011 16:53:21 UTC