Re: proprietary charset identifiers from Martin Duerst on 2002-06-05 (www-i18n-comments@w3.org from June 2002)

From: Martin Duerst <duerst@w3.org>
Date: Wed, 05 Jun 2002 14:08:47 +0900
To: jim.melton@acm.org (Jim Melton), www-i18n-comments@w3.org
Cc: w3c-i18n-ig@w3.org, w3c-xml-query-wg@w3.org
Message-Id: <4.2.0.58.J.20020605134655.04283788@localhost>

Hello Jim, dear XML Query WG,

We discussed this comment of yours at our teleconference
yesterday, and I was actioned to convey our decision to you.

At 18:39 02/05/31 +0900, Jim Melton wrote:

>This is a last call comment from Jim Melton (jim.melton@acm.org) on
>the Character Model for the World Wide Web 1.0
>(http://www.w3.org/TR/2002/WD-charmod-20020430/).
>
>Semi-structured version of the comment:
>
>Submitted by: Jim Melton (jim.melton@acm.org)
>Submitted on behalf of (maybe empty): W3C XML Query Working Group
>Comment type: editorial
>Chapter/section the comment applies to: 3.2 Digital Encoding of Characters
>The comment will be visible to: public
>Comment title: proprietary charset identifiers
>Comment:
>Section 3.2, "Digital Encoding of Characters", list element 4, contains 
>the phrase "... is identified by an IANA charset identifier."
>
>In fact, there are a great many CESes that are identified by charset 
>identifiers that are not assigned by IANA at all, but that are "created" 
>by proprietary means (e.g., corporations).  The Character Model 
>specification must not prohibit the use of CESes identified by charset 
>identifiers assigned through other means.
>
>To correct this, simply change "...is identified by an IANA charset 
>identifier." to "...is identified by a unique identifier, such as an IANA 
>charset identifier."

However, working on the details today, I discovered that
it may be better to request a clarification from you first.

You request that Section 3.2 mention identifiers for
character encodings other than those registered by IANA. But
Section 3.2 just mentions the labels as part of the overall
model. Details of what encodings to use or not to use,
and what labels to use for them, are given in Section 3.6.2
(http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-EncodingIdent).

Section 3.6.2 also has a very strong emphasis on IANA labels,
because using labels from a single registry is the only way
to avoid conflicts, and the IANA registry is the registry
used on the Internet (and the Web is part of the Internet).

Given this, can you please clarify whether the Query WG meant that:

a) Changing "...is identified by an IANA charset identifier." to
    "...is identified by a unique identifier, such as an IANA
    charset identifier." is appropriate in Section 3.2 because
    this is a general discussion, and any set of unique identifiers
    could do, and specifics are discussed in 3.6.2.

b) The change was intended to make sure that encoding identifiers
    other than those registered with IANA would conform to the
    character model; Section 3.6.2 would have to be changed, too.

Yesterday, we forgot about 3.6.2, but assumed the intent of b).
If b) is your intent, please find our answer below. If your intent
was a), or something else, we will have to reconsider your comment.


<assumption value='b)'>
First, please note that your classification of this comment was
'editorial', but we have decided to reclassify it as 'substantial'.

Second, we have decided to reject this comment, based on the
following reasons:

- IANA charset identifiers (except for those starting with x-) are
   guaranteed to be unique. Adding any other set(s) of identifiers
   to the IANA identifiers very quickly removes this guarantee.
   Because of that, your proposed change can either be seen as an
   unnecessary addition, putting in more words but, under careful
   analysis, not saying anything different, or it can be misunderstood
   by readers to guarantee some uniqueness when indeed such a guarantee
   is not possible.
   [If you know about some trick to guarantee uniqueness among different
    sets of identifiers, then we sure would like to know.]

- IANA does not 'assign' identifiers, it just registers them.
   Anybody can apply for registration. A few years ago, there was
   a tendency to restrict registration to widely used/usable encodings,
   but this led to the de facto use of many unregistered encodings
   with an x- prefix. Registration practice has changed to be very
   liberal now, while making sure that each registration duly notes
   whether the encoding is in practice suitable for use on the
   Internet at large. If any corporation represented in the XML
   Query WG or elsewhere uses encodings that are not registered
   with IANA, we strongly recommend registering them.

- The IANA registry already contains registrations for many (some
   even say too many) proprietary encodings. Indeed, the majority
   of encodings registered are proprietary encodings rather than
   encodings defined by standards organizations. There is a good
   chance that your encoding is already registered. Please check.

- The IANA registry already contains many (some even say too many)
   aliases for most encodings. There is a good chance that the
   identifier used inside your corporation is already an alias.
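
To illustrate the uniqueness point in the first item above, here is a
minimal sketch in Python of what can happen when identifiers from more
than one source are combined. The function name, the two registries, and
all labels and descriptions in them are hypothetical and invented for
this example; none of them are actual IANA registrations. Labels are
compared case-insensitively, as charset names are.

```python
def find_label_conflicts(*registries):
    """Return labels that name different encodings in different registries.

    Each registry is a dict mapping a charset label to a description of
    the encoding it identifies. Labels are compared case-insensitively.
    """
    seen = {}       # lowercased label -> first encoding seen for it
    conflicts = {}  # lowercased label -> set of all encodings claimed
    for registry in registries:
        for label, encoding in registry.items():
            key = label.lower()
            if key in seen and seen[key] != encoding:
                conflicts.setdefault(key, {seen[key]}).add(encoding)
            else:
                seen.setdefault(key, encoding)
    return conflicts


# Hypothetical data: two vendors independently pick the same private label.
registry_a = {"UTF-8": "utf-8", "x-corp-latin": "vendor A's Latin variant"}
registry_b = {"X-Corp-Latin": "vendor B's Latin variant"}

conflicts = find_label_conflicts(registry_a, registry_b)
for label, encodings in conflicts.items():
    print(label, "is ambiguous:", sorted(encodings))
```

Within a single registry each label maps to one encoding, so the function
reports nothing; the ambiguity only appears once a second, independently
maintained set of labels is added.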

Please tell us, at your earliest convenience, whether you are
satisfied with our decision or not. If not, please provide
additional rationale.
</assumption>


If there are any questions or comments, please don't hesitate
to contact us again.

Regards,    Martin.
Received on Wednesday, 5 June 2002 01:09:10 UTC