[Bug 15142] Define "UNICODE" as a defacto alias for "UTF-16" from bugzilla@jessica.w3.org on 2011-12-11 (public-html-bugzilla@w3.org from December 2011)

From: <bugzilla@jessica.w3.org>
Date: Sun, 11 Dec 2011 16:41:17 +0000
To: public-html-bugzilla@w3.org
Message-Id: <E1RZmSj-00081y-VQ@jessica.w3.org>

https://www.w3.org/Bugs/Public/show_bug.cgi?id=15142

--- Comment #5 from Glenn Adams <glenn@skynav.com> 2011-12-11 16:41:17 UTC ---
(In reply to comment #4)
> (In reply to comment #3)
> > a historical circumstance... in the 1992-93 time frame, ISO SC2/WG2 first
> > proposed UTF-1 as a transformation encoding of ISO/IEC 10646 UCS-4;  although
> > UTF-1 never caught on, the more efficient alternative, UTF-8, came out of work
> > started at X/Open and concluded at Bell Labs in Plan 9;
> > 
> > later, the Unicode Consortium incorporated the normative definition of UTF-8 into
> > The Unicode Standard;
> > the current IETF RFC 3629 (STD 63) [1] refers to the Unicode Standard for the
> > formal definition of UTF-8:
> > 
> > 3.  UTF-8 definition
> > 
> >    UTF-8 is defined by the Unicode Standard [UNICODE].  Descriptions and
> >    formulae can also be found in Annex D of ISO/IEC 10646-1 [ISO.10646]
> > 
> > [1] http://tools.ietf.org/html/rfc3629#section-3
> > 
> > glenn
> 
> Point taken, but not convinced. For all practical purposes, UTF-8 is defined by
> RFC 3629. That's where people look. Also, RFC 3629 doesn't even link to another
> definition. So where is the definition by the Unicode consortium, and why isn't
> it referenced?

Did you read the first paragraph in RFC 3629 Section 3 [1] (which I quoted
above)?

> Also, a more general point: I would hope that all future definitions of
> character encoding schemes in the IANA registry are based on the Unicode code
> points, even those which cannot represent all code points. The procedure for
> IANA charset registrations is in IETF BCP 19, which doesn't even mention
> Unicode, as far as I can tell.

Different national administrations have different priorities. There will always
remain character encodings not based on the Unicode Character Set, for legacy
reasons if for no others.

The Unicode Consortium does not maintain a character encoding scheme registry.
IANA does. However, the Unicode Consortium does own the term "UNICODE", so if
someone wishes to register this term as a charset value, they need to take it
up with the Unicode Consortium, and not with the HTML WG. But I would suggest
they would be wasting their time, since it is extremely unlikely the Unicode
Consortium would choose to enter such a registration (for some of the reasons I
have cited as well as others).

Received on Sunday, 11 December 2011 16:41:20 UTC