Re: Charset policy - Post Munich

Martin J. Dürst (mduerst@ifi.unizh.ch)
Tue, 02 Sep 1997 00:47:44 +0200 (MET DST)


Date: Tue, 02 Sep 1997 00:47:44 +0200 (MET DST)
From: Martin J. Dürst <mduerst@ifi.unizh.ch>
Subject: Re: Charset policy - Post Munich
In-reply-to: <3.0.1.32.19970831150228.00a8a2f0@genstar.alis.ca>
To: Francois Yergeau <yergeau@alis.com>
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <Pine.SUN.3.96.970902002702.12451K-100000@enoshima>

On Sun, 31 Aug 1997, Francois Yergeau wrote:

> At 13:15 29/08/97 +0200, Harald.T.Alvestrand@uninett.no wrote:
> >Please check this for consistency with previous comments and comments
> >made in Munich.

> >    3.  Definition of Terms
> >
> >    This document uses the term "charset" to mean a set of rules for
> >    mapping from a sequence of octets to a sequence of characters,
> 
> Same as MIME, then.  Why not simply refer to MIME?

Because MIME still uses old terminology, even if it's the same
concept :-(.
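
To make the definition concrete, here is a minimal Python sketch (my
illustration, not text from the draft). It shows the same octets mapped
to different characters under two different charsets. The sample octets
and the choice of KOI8-R as the second charset are assumptions picked
only for the example.

```python
# A charset is a rule for turning octets into characters.
# The same octets give different characters under different rules.
octets = b"D\xfcrst"  # sample octets, chosen for illustration

as_latin1 = octets.decode("iso-8859-1")  # 0xFC maps to "ü" in ISO 8859-1
as_koi8r = octets.decode("koi8-r")       # 0xFC maps to a Cyrillic letter

print(as_latin1)
print(as_koi8r)

# Under UTF-8 the same octets are not a valid sequence at all.
try:
    octets.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False
print(utf8_ok)
```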

> >    A "name" is an identifier such as a person's name, a hostname, a
> >    domainname, a filename or an E-mail address...
> 
> "...as used with some significance in a protocol".  My name and email
> address in the sig below are not names as discussed here, I guess, just
> part of a mail message body.

Yes and no. Software is pretty good at extracting these nowadays;
I think "take" in pine, for example, is great. And URLs on paper
are names too, because they may become significant in some protocol.


> >    3.2.  How to decide a charset
> 
> I think some language on default charsets is needed here.  Having seen the
> mess created by defaulting to Latin-1 in HTTP, I think a mandated default
> of UTF-8 everywhere, both in protocol items and contents, is warranted at
> the strong SHOULD level.

Good point. How about:

	Designation of a certain charset [other than UTF-8???] as a
	default that does not need to be labeled has turned out
	to be counter-productive and SHOULD be avoided if at all possible.
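
A small Python sketch of why an unlabeled default causes trouble (an
illustration under my own assumptions, not part of the proposed text).
The helper name decode_labeled is hypothetical. The receiver silently
assumes Latin-1 for UTF-8 octets and gets garbage without any error,
whereas insisting on a label makes the problem visible.

```python
def decode_labeled(octets, charset=None):
    """Decode octets only when the sender supplied a charset label.

    Hypothetical helper: refuses to guess instead of applying a default.
    """
    if charset is None:
        raise ValueError("unlabeled text: refusing to guess a charset")
    return octets.decode(charset)


utf8_octets = "D\u00fcrst".encode("utf-8")  # sender used UTF-8, no label

# Receiver applies an unlabeled Latin-1 default: no error, wrong text.
guessed = utf8_octets.decode("iso-8859-1")
print(guessed)

# With a label the text comes out as intended.
labeled = decode_labeled(utf8_octets, "utf-8")
print(labeled)
```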


> >    4.5.  Default Language
> >
> >    When human-readable text must be presented in a context where the
> >    sender has no knowledge of the recipient's language preferences
> >    (such as login failures or E-mailed warnings, or prior to language
> >    negotiation), text SHOULD be presented in Default Language.
> >
> >    The Default Language is English, since this is the language which
> >    most people will be able to get adequate help in interpreting when
> >    working with computers.
> 
> I disagree with this for the following reasons:
> 
> 1) The justification is very weak.  There is no trace of a requirement for
> mandating a single Internet-wide Default Language.
> 
> 2) The spec as written prevents me (for instance) from using some other
> language X as default in an Intranet application, if I am bound by contract
> to obey Internet protocols on that Intranet; this holds even though I may
> know that all users of that Intranet do not understand English but speak X
> and/or can get adequate help in X.

This is a good point. I am sure it can be included in the text.


> 3) It asks every Joe User in the world to provide his Web home page in
> English, in case some client comes with no language preference settings.
> Same for all gopher pages, ftp archives etc., where negotiation is not even
> possible.

No, not at all. Even as written, and in particular if the text is
clarified in the sense that I have proposed in an earlier mail,
the only thing it says is that the default language must be used
in those cases where language negotiation was not yet possible.
So for example if an HTTP GET request doesn't contain an
Accept-Language header, the client has forfeited its chance for
language negotiation, and you can serve whatever you please.

The only case in HTTP where Default English would come into play
is the case where the server detects a language header, but the
header is malformed and can't be parsed. It would then be
required to respond in English and say something like
"Malformed Accept-Language header, can't detect desired language".
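
The reading above can be sketched in Python as follows. This is only an
illustration of my interpretation, not something the draft or HTTP
specifies: the function names, the simplified header grammar, and the
fallback to the server's own choice when nothing matches are all
assumptions.

```python
import re

DEFAULT_LANGUAGE = "en"  # the draft's Default Language
MALFORMED = "Malformed Accept-Language header, can't detect desired language"

# Simplified language-range and q-value patterns (assumed, not normative).
_RANGE = re.compile(r"\*|[A-Za-z]{1,8}(?:-[A-Za-z0-9]{1,8})*")
_QVALUE = re.compile(r"q=(0(?:\.\d{0,3})?|1(?:\.0{0,3})?)")


def parse_accept_language(value):
    """Return [(language, q)] sorted by descending q, or raise ValueError."""
    prefs = []
    for item in value.split(","):
        tag, sep, param = item.strip().partition(";")
        tag, param = tag.strip(), param.strip()
        if not _RANGE.fullmatch(tag):
            raise ValueError("bad language range: %r" % tag)
        q = 1.0
        if sep:
            match = _QVALUE.fullmatch(param)
            if match is None:
                raise ValueError("bad parameter: %r" % param)
            q = float(match.group(1))
        prefs.append((tag.lower(), q))
    prefs.sort(key=lambda pref: -pref[1])  # stable: ties keep header order
    return prefs


def choose_response_language(accept_language, available, server_choice):
    """Return (language, error_text) for a request.

    accept_language is the raw header value, or None if it was absent.
    """
    if accept_language is None:
        # No negotiation attempted: the server may serve whatever it likes.
        return server_choice, None
    try:
        prefs = parse_accept_language(accept_language)
    except ValueError:
        # Negotiation was attempted but failed: answer in Default Language.
        return DEFAULT_LANGUAGE, MALFORMED
    for language, q in prefs:
        if q > 0 and language == "*":
            return server_choice, None
        if q > 0 and language in available:
            return language, None
    return server_choice, None  # assumed fallback when nothing matches


print(choose_response_language(None, ["fr", "de"], "fr"))
print(choose_response_language("de;q=0.8, fr", ["fr", "de"], "de"))
print(choose_response_language("fr;;q=x", ["fr", "de"], "fr"))
```

Only the third call, with the unparsable header, ends up in English; the
first two never touch the Default Language.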

Also of course in the case of ftp, file contents are not part of
the protocol, and can be whatever you want. For gopher, as long
as negotiation is not possible, this draft doesn't apply anyway.

Together with the restriction of your point 2) above, I think
this is rather reasonable.


> 4) History shows us that the dominant language changes over time; English
> is bound to go the way of Greek and Latin some day.

True. I have nothing against adding a paragraph to that effect,
with an equivalent sentence about "decades" added :-).


> I'd rather see this whole section go away.

I understand you. But I think it is much better to clearly
say: "English as a default goes that little step, but not an
inch further" than to leave everything open and let protocols
come up with much stronger uses of English as a default,
with no easy way to reject them.

In this sense, I can also see that we explicitly add a paragraph
expressing that any defaults above and beyond those in that
section MUST be avoided.


Regards,	Martin.

