RE: Charset policy - Post Munich from Martin J. Dürst on 1997-10-10 (ietf-charsets@w3.org from October to December 1997)

From: Martin J. Dürst <mduerst@ifi.unizh.ch>
Date: Fri, 10 Oct 1997 21:41:59 +0100 (MET)
To: Ned Freed <Ned.Freed@INNOSOFT.COM>
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <Pine.SUN.3.96.971010204151.7026b-100000@enoshima>
On Fri, 10 Oct 1997, Ned Freed wrote:

> Two additional points need to be made here:
> 
> (1) MLSF is no longer on the table as a proposal at the present time. This
>     may change, however, if the UTC and ISO do not deliver on embedded language
>     tags.

I seriously hope that those involved in the UTC and ISO can "deliver",
so that I don't have to read statements like these again. And I
seriously hope that they deliver in a way that makes it clear that
these embedded language tags are meant for special occasions, and
are not meant, e.g., to pollute text outside the restricted areas
where they are officially allowed.

But I continue to wonder: If you think that a character set has to
have embedded language tags, why for example don't you first go to
your national standards body (ANSI) and help them to get language tags
into ASCII, to be able to distinguish English and Hawaiian, at least?
Or why don't you propose to include language tags in the ISO-8859-X series,
where each part of the series is used for a great multitude of languages?
What's so special about UCS and UTF-8 that the international standards
bodies suddenly have to "deliver"? What makes you, as an individual,
so special that you think you are entitled to ask them to "deliver"?
Or can you imagine the IESG, as the representative of the IETF,
formally sending a note to ISO and the UTC in the style above, or in
the style in which some MLSF proponents have expressed themselves to
UTC members?


> (2) If MLSF needs to be revived I for one will support it.

You are of course free to support whatever you think you want to
support. But I would really like at that point to hear better
technical arguments from you and the other MLSF supporters than
what I have heard up to now. The only technical argument in
favor of MLSF that I have heard is that it is good for internal
processing in some cases. I agree with that, but we all know that
this is largely irrelevant for the IETF.

And the technical problems with MLSF, as they have been discussed,
won't go away just because there is no "delivery". Just to make
sure, these technical problems have nothing to do with whether
inline language tags are desirable or not. Even if I were the
strongest supporter of inline language tags (which I admit I'm
not), I wouldn't want to ruin UTF-8 with MLSF.


>     As such, I have
>     no intention of putting something in a document that would prevent this
>     from happening unless and until I am specifically directed to do so by
>     either an AD or a WG chair with jurisdiction over the charset registration
>     specification.

We are currently not discussing the charset registration specification.
We are discussing Harald's policy document. And that's for Harald to
decide.


> > MLSF, in particular the way it was presented and defended by its
> > proponent(s) on the unicore list, is (or hopefully *was*) completely
> > different. First, it is pure individual character tagging. A tag
> > can be put in at any place whatsoever.
> 
> Actually it is nothing of the sort. It is a tagging mechanism that can be used
> on arbitrary character sequences, including but not limited to those of length 1.
> It is not limited to tagging individual characters, it doesn't make sense
> to deliberately use it this way,

True. I only said that a tag can be put in any place whatsoever; I didn't
say it would have to come before every character.


> and I cannot recall any discussion where
> someone advocated this sort of usage. Allowing for it as a natural consequence
> of combining strings in different languages, perhaps, but not intentional
> use.

In the context of names written with ideographs, which was mentioned
in the discussion, and to which people referred by pointing to
corresponding sections in RFC 2130, that usage was very clearly implied.


> > Second, it ruined the clean
> > properties of UTF-8, risking to blow up a lot of converters in
> > unpredictable ways.
> 
> Even supposing I agree, which I don't, please explain why this has any
> relevance whatsoever to the matter at hand. This is an issue with the overall
> design of MLSF, not with its ability to insert language tags at a given level
> of granularity. You could use MLSF for other sorts of tagging and this
> would not change.

Well, you could for example use MLSF in ACAP to code the date of an entry,
or the access privileges, of course. But curiously enough, that wasn't
done. As far as I remember, ACAP uses metadata for the date
of an entry, and special field names for access privileges. But please
correct me if I'm wrong. If things like the above were possible, why
was it supposedly impossible to do one of these for language information?
The mechanisms for metadata need to be there anyway.


> > Third, the design constraints of ACAP didn't
> > necessitate it at all.
> 
> Again, even supposing I agree, which I don't, please explain why this has any
> relevance whatsoever to the matter at hand.

ACAP supposedly needed MLSF, or inline language tags. And some
people in that debate specifically mentioned that they thought
that ACAP needed to be able to tag each word, at least, because
they thought that RFC 2130 said so.

You asked why I think that just saying "all protocols have to
use language tags" (or some such) may be insufficient, and ACAP
and MLSF are the specific examples that make me think so. So
that's why ACAP and MLSF are relevant here.


> > It was a clear strawman. I remember naively
> > proposing alternatives such as using metainformation for language,
> > which was rejected on I don't remember what supposedly technical
> > reasons.
> 
> All you are doing here is demonstrating that ... you ...
> don't understand the design constraints ACAP has to deal with.
[for the ..., see a separate mail]

Then, if you think you understand them better, please explain to
all the readers of this group why ACAP has metainformation and so
on, but can't use that mechanism for language information.


> > When I subscribed to the ACAP WG mailing list and had a
> > look at the archives, I easily found mails discussing language as
> > metainformation. But to the outside, this was withheld and denied,
> > because some people felt that they *just needed* individual character
> > tagging, but were not ready to discuss this technically in true
> > IETF manner, but rather preferred to claim that they represented
> > the IETF because they correctly assumed that most of their counterparts
> > didn't have much of a clue about IETF process.
> 
> This may be your assessment of what happened. However, I was involved in all
> this, and my assessment of your assessment is that it is entirely specious
> and without merit.

The "felt" and the "because" parts are my own interpretation, but the
mails, the withholding, the refusal of technical discussions, the
claims to represent the IETF, and the ignorance on the side of the UTC
in terms of IETF process are all well documented.
It may well be that in the face-to-face meeting after that discussion,
more technical arguments were presented, and this was done in a more
open manner, but as you know, I was too far away to be able to
participate in that meeting.


> > Fred - I don't mind having individual character tagging where it makes
> > sense. It is possible with <SPAN> and LANG in HTML. It may make a lot
> > of sense in other places. But I have been seriously burnt by claims,
> > wrongly based on the IAB report [RFC 2130], that all text in internet
> > protocols needs individual character tagging, and by upfront pseudo-
> > arguments claiming technical necessities where there existed alternatives.
> 
> How exactly have you "been burned" by such claims?

Well, that was obviously not meant physically.


> > That has happened, and if I worry about it happening again, that's
> > not a strawman at all.
> 
> Since you have yet to offer a counterexample that meets my criteria (which I
> believe were entirely fair and even generous), I continue to label this as a
> strawman. In fact any and all discussion of MLSF is by definition a strawman at
> this time since the proposal is no longer even on the table!

You cannot on the one hand claim that MLSF is no longer on the table, and
on the other hand admit that you are just keeping it hidden below the table,
to put it back on the table at the first occasion that you see fit.


> The entire purpose of the charset registration document is to specify rules. 

Again, wrong document. If MLSF had a reason for existence (which I strongly
dispute), I would definitely prefer to have it labeled as a "charset" or
otherwise than to have it float around claiming to be UTF-8.


Regards,	Martin.


Received on Sunday, 12 October 1997 18:04:58 UTC