- From: Martin Duerst <duerst@w3.org>
- Date: Wed, 05 Jun 2002 14:08:47 +0900
- To: jim.melton@acm.org (Jim Melton), www-i18n-comments@w3.org
- Cc: w3c-i18n-ig@w3.org, w3c-xml-query-wg@w3.org
Hello Jim, dear XML Query WG,
We discussed this comment of yours at our teleconference
yesterday, and I was actioned to convey our decision to you.
At 18:39 02/05/31 +0900, Jim Melton wrote:
>This is a last call comment from Jim Melton (jim.melton@acm.org) on
>the Character Model for the World Wide Web 1.0
>(http://www.w3.org/TR/2002/WD-charmod-20020430/).
>
>Semi-structured version of the comment:
>
>Submitted by: Jim Melton (jim.melton@acm.org)
>Submitted on behalf of (maybe empty): W3C XML Query Working Group
>Comment type: editorial
>Chapter/section the comment applies to: 3.2 Digital Encoding of Characters
>The comment will be visible to: public
>Comment title: proprietary charset identifiers
>Comment:
>Section 3.2, "Digital Encoding of Characters", list element 4, contains
>the phrase "... is identified by an IANA charset identifier."
>
>In fact, there are a great many CESes that are identified by charset
>identifiers that are not assigned by IANA at all, but that are "created"
>by proprietary means (e.g., corporations). The Character Model
>specification must not prohibit the use of CESes identified by charset
>identifiers assigned through other means.
>
>To correct this, simply change "...is identified by an IANA charset
>identifier." to "...is identified by a unique identifier, such as an IANA
>charset identifier."
However, working on the details today, I discovered that
it may be better to request a clarification from you first.
You request that Section 3.2 mention identifiers for character
encodings other than those registered by IANA. But
Section 3.2 just mentions the labels as part of the overall
model. Details of what encodings to use or not to use,
and what labels to use for them, are given in Section 3.6.2
(http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-EncodingIdent).
Section 3.6.2 also has a very strong emphasis on IANA labels,
because using labels from a single registry is the only way
to avoid conflicts, and the IANA registry is the registry
used on the Internet (and the Web is part of the Internet).
Given this, can you please clarify whether the Query WG meant that:
a) changing "...is identified by an IANA charset identifier." to
"...is identified by a unique identifier, such as an IANA
charset identifier." is appropriate in Section 3.2 because
this is a general discussion, and any set of unique identifiers
could do, and specifics are discussed in 3.6.2.
b) The change was intended to make sure that encoding identifiers
other than those registered with IANA would conform to the
character model; Section 3.6.2 would have to be changed, too.
Yesterday, we forgot about 3.6.2, but assumed the intent of b).
If b) is your intent, please find our answer below. If your intent
was a), or something else, we will have to reconsider your comment.
<assumption value='b)'>
First, please note that your classification of this comment was
'editorial', but we have decided to reclassify it as 'substantial'.
Second, we have decided to reject this comment, based on the
following reasons:
- IANA charset identifiers (except for those starting with x-) are
guaranteed to be unique. Adding any other set(s) of identifiers
to the IANA identifiers very quickly removes this guarantee.
Because of that, your proposed change can either be seen as an
unnecessary addition, putting in more words but, under careful
analysis, not saying anything different, or it can be misunderstood
by readers to guarantee some uniqueness when indeed such a guarantee
is not possible.
[If you know about some trick to guarantee uniqueness among different
sets of identifiers, then we sure would like to know.]
- IANA does not 'assign' identifiers, it just registers them.
Anybody can apply for registration. A few years ago, there was
a tendency to restrict registration to widely used/usable encodings,
but this led to the de facto use of many unregistered encodings
with an x- prefix. Registration practice has changed to be very
liberal now, while making sure that each registration duly notes
whether the encoding is in practice suitable for use on the
Internet at large. If any corporation represented in the XML
Query WG or elsewhere uses encodings that are not registered
with IANA, we strongly recommend registering them.
- The IANA registry already contains registrations for many (some
even say too many) proprietary encodings. Indeed, the majority
of encodings registered are proprietary encodings rather than
encodings defined by standards organizations. There is a good
chance that your encoding is already registered. Please check.
- The IANA registry already contains many (some even say too many)
aliases for most encodings. There is a good chance that the
identifier used inside your corporation is already an alias.
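To illustrate the first point above, here is a minimal sketch of how
combining independently maintained sets of identifiers can lose the
uniqueness guarantee. Everything in it is an assumption made for
illustration: the function merge_registries, the three sample tables,
and the label "corp-text" are hypothetical and are not taken from the
IANA registry or from any real vendor. The only fact relied on is that
charset labels are compared case-insensitively.

```python
def merge_registries(*registries):
    """Merge label -> encoding tables and collect conflicting labels.

    Each registry is a dict mapping a charset label to a description of
    the encoding it identifies. Labels are compared case-insensitively.
    Returns (merged, conflicts), where conflicts maps each contested
    label to the set of different encodings that claim it.
    """
    merged = {}
    conflicts = {}
    for registry in registries:
        for label, encoding in registry.items():
            key = label.lower()
            if key in merged and merged[key] != encoding:
                # Same label, different encoding: uniqueness is lost.
                conflicts.setdefault(key, {merged[key]}).add(encoding)
            else:
                merged.setdefault(key, encoding)
    return merged, conflicts


# Hypothetical sample data, for illustration only.
single_registry = {"UTF-8": "utf-8", "ISO-8859-1": "latin-1"}
vendor_b = {"corp-text": "vendor B encoding", "utf-8": "utf-8"}
vendor_c = {"CORP-TEXT": "vendor C encoding"}

merged, conflicts = merge_registries(single_registry, vendor_b, vendor_c)
for label, encodings in conflicts.items():
    print(label, "is claimed by:", sorted(encodings))
```

Within a single table each label identifies one encoding by
construction; the conflict only appears once tables maintained by
different parties are combined, which is the situation the proposed
wording would permit.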
Please tell us, at your earliest convenience, whether you are
satisfied with our decision or not. If not, please provide
additional rationale.
</assumption>
If there are any questions or comments, please don't hesitate
to contact us again.
Regards, Martin.
Received on Wednesday, 5 June 2002 01:09:10 UTC