Comments to draft-freed-charset-reg-04.txt from Martin J. Dürst on 1997-10-13 (ietf-charsets@w3.org from October to December 1997)

From: Martin J. Dürst <mduerst@ifi.unizh.ch>
Date: Mon, 13 Oct 1997 18:43:20 +0100 (MET)
To: ietf-charsets@INNOSOFT.COM
Message-id: <Pine.SUN.3.96.971013174322.12334E-100000@enoshima.ifi.unizh.ch>
Hello everybody,

Here are some comments on draft-freed-charset-reg-04.txt.
In general, extremely good work. Terminology is much
better than in previous editions.

A few nits follow:



> Network Working Group                      Ned Freed, Innosoft
> Internet Draft                                 Jon Postel, ISI
>                               <draft-freed-charset-reg-04.txt>
> 
>                          IANA Charset
>                    Registration Procedures
> 
>                         September 1997

> 1.  Abstract
> 
> MIME [RFC-2045, RFC-2046, RFC-2047, RFC-2184] and various
> other modern Internet protocols are capable of using many
> different charsets. This in turn means that the ability to
> label different charsets is essential. This registration
> procedure exists solely to associate a specific name or names
> with a given charset and to give an indication of whether or
> not a given charset can be used in MIME text objects. In
> particular, the general applicability and appropriateness of a
> given registered charset is a protocol issue, not a
> registration issue, and is not dealt with by this registration
> procedure.

MIME is cited in full; maybe it would be nice to have some
other references for other Internet protocols. The most
important ones I know (HTTP,...) do so by reference to
MIME, but there may be others.


> 2.2.  Character
> 
> A member of a set of elements used for the organisation,
> control, or representation of data.

Verbatim from ISO standards. A reference might help
people to understand that this definition is the same
as the ISO definition.


> 2.3.  Charset
> 
> The term "charset" (referred to as a "character set" in
> previous versions of this document) is used here to refer to a
> method of converting a sequence of octets into a sequence of
> characters. This conversion may also optionally produce
> additional control information such as directionality
> indicators.

Very good to use "Charset" throughout the document!
Many people, esp. those with a math background, had
problems with "character set".


> This definition is intended to allow charsets to be defined in
> a variety of different ways, from simple single-table mappings
> such as US-ASCII to complex table switching methods such as
> those that use ISO 2022's techniques, to be used as charsets.

The last clause seems superfluous. We can speak about definition
or use; it's the same here, because it would be strange if we
could define things we couldn't use or vice versa.


> 2.5.  Character Encoding Scheme
> 
> A Character Encoding Scheme (CES) is a mapping from a Coded
> Character Set or several coded character sets to a set of
> octets. A given CES is typically associated with a single CCS;
> for example, UTF-8 applies only to ISO 10646.

I would change "is typically" to "may be". For the ISO-8859
series, the "8bit" mapping is associated with many CCSs.

If we take the CCS as part of the CES, then "typically" is
also not adequate, because a given CES is then always
associated with a single CCS (or a fixed set of CCSs).
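
To illustrate the ISO-8859 point, here is a minimal Python sketch. It
assumes that Python's built-in codecs for the ISO-8859 parts follow the
respective standards. The same octet, under the same one-octet-per-character
encoding scheme, yields a different character depending on which CCS it is
paired with.

```python
# One octet, one 8-bit encoding scheme, three different coded character sets.
octet = bytes([0xE4])

# Codec labels are those of Python's standard library (an assumption of
# this sketch, not something the draft specifies).
labels = ("iso-8859-1", "iso-8859-5", "iso-8859-7")

# Map each label to the character the octet stands for under that CCS.
results = {label: octet.decode(label) for label in labels}

for label, char in results.items():
    print(f"{label}: U+{ord(char):04X}")
```

So the "8bit" CES alone does not determine the character; only the pairing
of CES and CCS does.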


> 3.1.  Required Characteristics
> 
> Registered charsets MUST conform to the definition of a
> "charset" given above.  In addition, charsets intended for use
> in MIME content types under the "text" top-level type must
> conform to the restrictions on that type described in RFC
> 2045. All registered charsets MUST note whether or not they
> are suitable for use in MIME.

This is MIME as defined in RFC 2045. Other protocols, for
better or for worse, in particular HTTP, have made wide
use of MIME while relaxing certain restrictions for the
"text" top-level type. A note may be in order; it might
read, e.g.:

Note: MIME is used with slightly changed requirements in
some protocols (e.g. [HTTP]). The note "suitable for use
in MIME" reflects MIME exactly as defined in RFC 2045,
without any such changes.


> All charsets which are constructed as a composition of a CCS
> and a CES MUST either include the CCS and CES they are based
> on in their registration or else cite a definition of their
> CCS and CES that appears elsewhere.

Change "a CCS" to "one or more CCSs". And because a CCS is a
relation from characters to integers, this means that ALL
charsets are compositions of CCS and CES.


> All registered charsets MUST be specified in a stable, openly
> available specification. Registration of charsets whose
> specifications aren't stable and openly available is
> forbidden.

What does "stable" mean?????


> 3.3.  Naming Requirements
> 
> One or more names MUST be assigned to all registered charsets.
> Multiple names for the same charset are permitted, but if
> multiple names are assigned a single primary name for the
> charset MUST be identified. All other names are considered to
> be aliases for the primary name and use of the primary name is
> preferred over use of any of the aliases.

The current IANA registry contains 
	name:
entries,
	alias:
entries, and also remarks in case of some aliases about
what is preferred by MIME. It would be good to mention what
"MIME preferred" means; it means that although this is
an alias, it is preferred over the "name".
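
A small Python sketch of how a reader of the registry might resolve a
label under that interpretation. The data model and the field names
(`name`, `aliases`, `mime_preferred`) are hypothetical, the two entries
are abridged, and the case-insensitive matching is an assumption of the
sketch.

```python
# Hypothetical in-memory model of two abridged registry entries.
# "mime_preferred" marks the alias that is preferred over the "name".
REGISTRY = [
    {
        "name": "ANSI_X3.4-1968",
        "aliases": ["US-ASCII", "ISO646-US"],
        "mime_preferred": "US-ASCII",
    },
    {
        "name": "ISO_8859-1:1987",
        "aliases": ["ISO-8859-1", "latin1"],
        "mime_preferred": "ISO-8859-1",
    },
]


def preferred_label(label):
    """Return the label to emit for any known name or alias, else None."""
    wanted = label.lower()
    for entry in REGISTRY:
        known = [entry["name"], *entry["aliases"]]
        if wanted in (n.lower() for n in known):
            # Fall back to the primary name when no alias is marked.
            return entry.get("mime_preferred") or entry["name"]
    return None


print(preferred_label("latin1"))
print(preferred_label("ANSI_X3.4-1968"))
```

Without a statement in the document of what "MIME preferred" means, the
fallback order in `preferred_label` is only a guess.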


> Finally, charsets being registered for use with the "text"
> media type MUST have a primary name that conforms to the more
> restrictive syntax of the charset field in MIME encoded-words
> [RFC-2047, RFC-2184] and MIME extended parameter values [RFC-
> 2184]. A combined ABNF definition for such names is as
> follows:

>     mime-charset = 1*<Any CHAR except SPACE, CTLs, and cspecials>
> 
>     cspecials    = "(" / ")" / "<" / ">" / "@" / "," / ";" / ":" / "
>                    <"> / "/" / "[" / "]" / "?" / "." / "=" / "*"
> 
>     CHAR         =  <any ASCII character>        ; (  0-177,  0.-127.)
>     SPACE        =  <ASCII SP, space>            ; (     40,      32.)
>     CTL          =  <any ASCII control           ; (  0- 37,  0.- 31.)
>                      character and DEL>          ; (    177,     127.)

This is a very exact definition, but it's extra work for each
registrant (and IANA!) to figure out what the remaining set is.
Defining it as an addition, or giving the list of allowed
characters in text (e.g.: A-Z, a-z, 0-9, ....) would help
a lot.
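
To show how much work that is, here is a Python sketch that derives the
remaining set from the ABNF as quoted. One entry of cspecials is broken
across the line wrap in the quoted text; the sketch ASSUMES it is the
backslash, which the quoted text does not confirm. The function names are
made up for this illustration.

```python
# cspecials as readable in the quoted ABNF. The entry broken across the
# line wrap is ASSUMED to be a backslash (not confirmed by the quote).
CSPECIALS = set('()<>@,;:"/[]?.=*') | {"\\"}


def allowed_mime_charset_chars():
    """Return every CHAR that is not SPACE, a CTL, or a cspecial."""
    allowed = []
    for code in range(128):              # CHAR: any ASCII character
        if code <= 31 or code == 127:    # CTLs, including DEL
            continue
        ch = chr(code)
        if ch == " " or ch in CSPECIALS:
            continue
        allowed.append(ch)
    return "".join(allowed)


def is_mime_charset(name):
    """Check a candidate primary name against the derived set (1* = non-empty)."""
    allowed = set(allowed_mime_charset_chars())
    return len(name) >= 1 and all(ch in allowed for ch in name)


print(allowed_mime_charset_chars())
print(is_mime_charset("ISO-8859-1"), is_mime_charset("ISO_8859-1:1987"))
```

Under these assumptions the result is the letters, the digits, and a
handful of punctuation characters, which could simply be listed in the text.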


> 3.4.  Functionality Requirement
> 
> Charsets must function as actual charsets: Registration of
> things that are better thought of as a transfer encoding, as a
> media type, or as a collection of separate entities of another
> type, is not allowed.  For example, although HTML could
> theoretically be thought of as a charset, it is really better
> thought of as a media type and as such it cannot be registered
> as a charset.

This makes a lot of sense. However, I wonder whether this
refers more to things such as entity and attribute names,
or whether it refers to character entities (&uuml;) and
numeric character references (&#1234;)?
Things like the latter of course could be defined as charsets,
in my opinion, but in case of HTML, this doesn't make much
sense because there would have to be a special registration
for each currently used charset.
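
To make the layering concrete, here is a small Python sketch. It uses the
standard library's `html.unescape` as a stand-in for an HTML parser's
reference handling; that substitution and the function name
`decode_then_resolve` are assumptions of the sketch. The charset turns
octets into characters first, and character references are resolved
afterwards on the character sequence, independently of the charset that
was used.

```python
import html


def decode_then_resolve(octets, charset):
    """Apply the charset layer, then the character-reference layer."""
    text = octets.decode(charset)    # charset: octets -> characters
    return html.unescape(text)       # markup: references -> characters


# The same reference works on top of any underlying charset.
via_reference = decode_then_resolve(b"M&uuml;nchen", "us-ascii")
via_octet = decode_then_resolve("M\u00fcnchen".encode("iso-8859-1"), "iso-8859-1")
numeric = decode_then_resolve(b"&#1234;", "us-ascii")

print(via_reference == via_octet, f"U+{ord(numeric):04X}")
```

Folding the second step into a charset would mean one combined
registration per value of the `charset` argument.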


> 4.1.  Present the Charset to the Community
> 
> Send the proposed charset registration to the "ietf-
> charsets@iana.org" mailing list.  This mailing list has been
> established for the sole purpose of reviewing proposed charset
> registrations.

"and discussing related topics"? Or will there be a separate
list for questions re. what exactly a specific registration/
name is supposed to mean, for updates to this and related
documents if they become necessary, and so on?


> 4.2.  Charset Reviewer
> 
> When the two week period has passed and the registration
> proposer is convinced that consensus has been achieved, the
> registration application should be submitted to IANA and the
> charset reviewer. The charset reviewer, who is appointed by
> the IETF Applications Area Director(s), either approves the
> request for registration or rejects it.  Rejection may occur
> because of significant objections raised on the list or
> objections raised externally.  If the charset reviewer
> considers the registration sufficiently important and
> controversial, a last call for comments may be issued to the
> full IETF. The charset reviewer may also recommend standards
> track processing (before or after registration) when that
> appears appropriate and the level of specification of the
> charset is adequate.

Do I understand this correctly: The "charset reviewer" is
kind of like the WG chair of charsets? Good idea. But the
title "reviewer" seems to also contain a considerable
direct responsibility. However, this is not mentioned
explicitly.


> Decisions made by the reviewer must be posted to the ietf-
> charsets mailing list within 14 days. Decisions made by the
> reviewer may be appealed to the IESG.

Does this mean that once the reviewer has decided, he/she has
14 days to post the result? This doesn't make much sense;
posting a decision is not that big a deal, I hope. What I guess
was intended (and probably is needed) here is a specification
that the decision, both before and after last call, has to be
made *and* posted within X days (where two weeks is probably
too short, because a reviewer may be on holiday). How much
the reviewer spends of that time for deciding and how much
for posting is hopefully his/her own business.


> 4.3.  IANA Registration
> 
> Provided that the charset registration has either passed
> review or has been successfully appealed to the IESG, the IANA
> will register the charset, assign a MIBenum value, and make
> its registration available to the community.

Please add a short note that the registration should be announced
on the ietf-charsets list.


> 5.  Location of Registered Charset List
> 
> Charset registrations will be posted in the anonymous FTP file
> "ftp://ftp.isi.edu/in-notes/iana/assignments/character-sets"

IANA has its own web site (only that when you access it, you
get the ISI home page, and first have to find out you have to
look for the IANA under services). As we specifically create
the mailing list alias ietf-charsets@iana.org, it may also
make sense to create ftp.iana.org, or to just (or also) reference
www.iana.org.


> and all registered charsets will be listed in the periodically
> issued "Assigned Numbers" RFC [currently RFC-1700].  The
> description of the charset may also be published as an
> Informational RFC by sending it to "rfc-editor@isi.edu"
> (please follow the instructions to RFC authors [RFC-1543]).

It is not very clear here (nor from another paragraph above)
whether, if a charset is to be published as an RFC, that RFC
has to exist first or is created only after registration. What
probably makes most sense is that charsets intended for RFC
publication have to be submitted with an appropriate I-D, which
then goes to RFC when the registration is done.



> 6.  Registration Template
> 
>   To: ietf-charsets@iana.org
>   Subject: Registration of new charset XXX

May be good to have the name of the charset in the Subject line.

> 
>   Charset name(s):
> 
>   (All names must be suitable for use as the value of a
>   MIME content-type parameter.)
> 
>   Published specification(s):
> 
>   (A specification for the charset must be
>   openly available that accurately describes what
>   is being registered. If a charset is defined as
>   a composition of a CCS and a CES then these defintions
>   must either be included or referenced.)

Same comments as above with respect to "a CCS".
Maybe this is the best place to mention: If a publication of
the definition of the charset as an RFC is intended, the
"published specification" must be a suitable Internet-Draft.

> 7.  Security Considerations
> 
> This registration procedure is not known to raise any sort of
> security considerations that are appreciably different from
> those already existing in the protocols that employ registered
> charsets.

Is there a need for security considerations in the registrations
themselves? There was in UTF-8, but that may be an exception.


> [ISO-8859]
>      International Standard -- Information Processing -- 8-bit
>      Single-Byte Coded Graphic Character Sets
>      - Part 1: Latin Alphabet No. 1, ISO 8859-1:1987, 1st ed.
>      - Part 2: Latin Alphabet No. 2, ISO 8859-2:1987, 1st ed.
>      - Part 3: Latin Alphabet No. 3, ISO 8859-3:1988, 1st ed.
>      - Part 4: Latin Alphabet No. 4, ISO 8859-4:1988, 1st ed.
>      - Part 5: Latin/Cyrillic Alphabet, ISO 8859-5:1988, 1st
>      ed.
>      - Part 6: Latin/Arabic Alphabet, ISO 8859-6:1987, 1st ed.
>      - Part 7: Latin/Greek Alphabet, ISO 8859-7:1987, 1st ed.
>      - Part 8: Latin/Hebrew Alphabet, ISO 8859-8:1988, 1st ed.
>      - Part 9: Latin Alphabet No. 5, ISO/IEC 8859-9:1989, 1st
>      ed.
>      International Standard -- Information Technology -- 8-bit
>      Single-Byte Coded Graphic Character Sets
>      - Part 10: Latin Alphabet No. 6, ISO/IEC 8859-10:1992,
>      1st ed.

Revisions of some of these may be available.


>          Appendix A -- IANA and RFC Editor To-Do List
> 
> 
> 
> VERY IMPORTANT NOTE:  This appendix is intended to communicate
> various editorial and procedural tasks the IANA and the RFC

There is obviously only one task left at the moment.




Regards,	Martin.


Received on Monday, 13 October 1997 09:48:20 UTC