Re: Charset policy - Post Munich

Martin J. Dürst
Fri, 10 Oct 1997 16:55:57 +0100 (MET)

Date: Fri, 10 Oct 1997 16:55:57 +0100 (MET)
From: Martin J. Dürst <>
Subject: RE: Charset policy - Post Munich
To: Ned Freed <Ned.Freed@INNOSOFT.COM>
Cc: ietf-charsets@INNOSOFT.COM
Message-id: <Pine.SUN.3.96.971010154636.7026P-100000@enoshima>

On Sun, 5 Oct 1997, Ned Freed wrote:

> > > > Specifically, I also agree that language tags are a big help
> > > > to current stupid machines. But if we put an absolute requirement
> > > > for language tags into our policy, a requirement that in the
> > > > extreme might say: "Every protocol has to be able to language
> > > > tag all the characters it sends around, with potentially
> > > > different tags for each character.",
> > >
> > > Martin, this is nothing but a strawman and you know it.
> > NO! If it were just a strawman, I wouldn't have any reason
> > to mention it. The above sentence is my quintessential
> > summary of a position that has quite recently been used
> > in discussions you have participated in. The above is far
> > from being a strawman, and you know it.
> Fair enough: Prove it. Please cite the protocol design being undertaken at this
> time where there's a proposal on the table with the specific goal of allowing
> and encouraging individual character tagging.
> Please note that designs like MIME encoded-words and MLSF which do allow
> individual character tagging, do NOT qualify. The fact that these designs allow
> individual character tagging (albeit in a very painful and artificial way) is an
> artefact of the design constraints these things operate under.

You guessed in the right direction, obviously. But I see a big difference
between MIME encoded words (as extended in the PVCSC spec) and MLSF.

On the technical side, PVCSC does not apply to characters, it applies
to encoded words. Encoded words have to be separated by linear white
space (which is not removed when decoding the encoded words, as far
as I understand), and can only have one language. The same holds for
the language specification for parameters defined in PVCSC: it is
one language per parameter, which is not individual character tagging.
Also, the header syntax is indeed clumsy, but encoded words without
language tags are already clumsy enough, and the added clumsiness
is not really much. So it's word-based tagging, somewhat painful
but not really a big deal, and designed to the constraints of MIME
and email.
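
To make the word-level granularity concrete, here is a minimal Python
sketch. It assumes the language tag is attached to the charset as
=?charset*language?encoding?text?=, which is how I read the PVCSC
proposal; the function names and the sample header are made up for
illustration and are not taken from any spec or library. Each encoded
word yields exactly one language, so nothing finer than a word can be
tagged.

```python
import base64
import quopri
import re

# Assumed encoded-word shape: =?charset*language?encoding?text?=
# The "*language" part is optional.
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<lang>[^?]+))?"
    r"\?(?P<enc>[QqBb])\?(?P<text>[^?]*)\?="
)


def parse_encoded_word(word):
    """Return (decoded_text, language) for a single encoded word.

    language is None when the word carries no language tag.
    """
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        raise ValueError("not an encoded word: %r" % word)
    payload = m.group("text").encode("ascii")
    if m.group("enc").upper() == "B":
        raw = base64.b64decode(payload)
    else:
        # header=True makes "_" decode to a space, as the Q encoding requires.
        raw = quopri.decodestring(payload, header=True)
    return raw.decode(m.group("charset")), m.group("lang")


def parse_header_words(value):
    """Parse a header value made only of whitespace-separated encoded words."""
    return [parse_encoded_word(w) for w in value.split()]
```

For example, parse_encoded_word("=?iso-8859-1*de?Q?Martin_J=2E_D=FCrst?=")
gives the decoded name together with the single tag "de". The sketch
deliberately returns the words as separate pairs and takes no position
on how the white space between them should be treated.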

MLSF, in particular the way it was presented and defended by its
proponent(s) on the unicore list, is (or hopefully *was*) completely
different. First, it is pure individual character tagging. A tag
can be put in at any place whatsoever. Second, it ruined the clean
properties of UTF-8, at the risk of breaking a lot of converters in
unpredictable ways. Third, the design constraints of ACAP didn't
necessitate it at all. It was a clear strawman. I remember naively
proposing alternatives such as using metainformation for language,
which were rejected for supposedly technical reasons that I no longer
remember. When I subscribed to the ACAP WG mailing list and had a
look at the archives, I easily found mails discussing language as
metainformation. But to the outside, this was withheld and denied,
because some people felt that they *just needed* individual character
tagging, but were not ready to discuss this technically in true
IETF manner, but rather preferred to claim that they represented
the IETF because they correctly assumed that most of their counterparts
didn't have much of a clue about IETF process.

Ned - I don't mind having individual character tagging where it makes
sense. It is possible with <SPAN> and LANG in HTML. It may make a lot
of sense in other places. But I have been seriously burnt by claims,
wrongly based on the IAB report [RFC 2130], that all text in Internet
protocols needs individual character tagging, and by upfront pseudo-
arguments claiming technical necessities where there existed alternatives.
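
As a small illustration of tagging at arbitrary granularity in HTML,
the following Python sketch walks a fragment and reports which
language is in effect for each run of text. It is only a sketch: the
class name LangRuns is made up, and it assumes well-nested markup
with explicit end tags (no unclosed elements).

```python
from html.parser import HTMLParser


class LangRuns(HTMLParser):
    """Collect (language, text) runs; LANG is inherited from enclosing elements."""

    def __init__(self, default_lang=None):
        super().__init__()
        self.stack = [default_lang]  # language in effect, innermost last
        self.runs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        lang = dict(attrs).get("lang", self.stack[-1])
        self.stack.append(lang)

    def handle_endtag(self, tag):
        # Assumes every start tag has a matching end tag.
        if len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        if data.strip():
            self.runs.append((self.stack[-1], data))


def lang_runs(fragment, default_lang=None):
    parser = LangRuns(default_lang)
    parser.feed(fragment)
    parser.close()
    return parser.runs
```

Feeding it '<P LANG="en">She said <SPAN LANG="fr">bonjour</SPAN> and left.</P>'
yields three runs tagged "en", "fr" and "en". A SPAN can enclose a
single character just as well, which is the sense in which individual
character tagging is possible there.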

That has happened, and if I worry about it happening again, that's
not a strawman at all.

> It is NOT an
> explicit goal of the design in either case, and as such these designs would not
> be invalidated in any way by any rule we make here. The most that would happen
> is that they would have to adopt some very complex and confusing profiling
> rules that will almost certainly cause more problems than they solve, which is
> one of the reasons why I don't want this door opened to begin with.

True for PVCSC, obviously not true for MLSF and ACAP.

> > > This also presupposes a level of cluelessness on the part of WG chairs, area
> > > directors and directorates, the IESG, and the IAB that I almost find offensive.
> > > We do have checks and balances in this process, you know.
> > I know. And I had absolutely no intent to offend any of the people or
> > functions you have mentioned above. But making explicit that language
> > tagging is something one should think about, and try to find a solution
> > that is well adapted to the protocol, will be a great help in avoiding
> > discussions as they have taken place, in which experienced protocol
> > designers have refused to do serious technical discussions because
> > they thought they had some document to back up their claims (which
> > they didn't), and the truth on their side anyway, and claimed to
> > do this in the name of the IETF.
> You are confusing the issue of whether or not language tags need to be present
> at all with the issue of what level of granularity at which they need to be
> present. These are very different things. When people refuse to discuss these
> matters with you it is because you persist in dragging in the "should they be
> there" consideration and they regard that issue as settled. You may not like it
> and frankly most of the time they may not either, but the requirement is a done
> deal informally and is about to become a done deal formally. And thus people
> aren't interested in listening to you go on and on about how language can be
> deduced from content and thus eliminate the need for tagging entirely. They are
> instead interested in building protocols that will be able to make it onto
> the standards track.

I understand that. But what you and others are missing is that,
first of all, I'm not against language tagging. Otherwise, I wouldn't
have coauthored RFC 2070, and I wouldn't have spent quite a bit of my
time this year helping to bring all of RFC 2070 into HTML 4.0.

When I ask people in discussions about why language tagging is needed,
the main aim is to try to help them make sure they understand what
language means in their protocol (e.g. whether and where they need
negotiation or alternatives,...) and how they can best handle it.

When I sometimes argue against language tagging in a particular variety
or for a particular application, it's because I want to help people
understand the benefits and limits of language tagging.

I'm not confusing the question of whether we need language tagging at
all with the question of granularity. But I know there is a danger that
other people make this confusion, on purpose or inadvertently, and I
would like to make sure it doesn't happen.

Also, I think that the primary interest of the IETF and all its
participants is in making good protocols, and in designing the standards
track process so that we can be pretty sure good protocols make it,
and protocols with problems don't. There is no use in a WG just
plugging an arbitrary security mechanism into a protocol without
thinking about why they need security, what kind of security
they may need, and how they make sure it works together with the
rest of the protocol. A WG that just reads the IETF security
requirements, follows them to the word, but in fact thinks
"what the hell with security, but gee, we have to give them
something so that we go standards track" is not really what
we want. And the same applies to internationalization and
multilingual issues.

> And this is why I don't want to make rules about the level of granularity that
> has to be provided: It presupposes not only that protocol designers and the
> IESG are incompetent to decide these things themselves, it also presupposes
> that we can at this time know all the constraints designers will be operating
> under.

I never proposed to make *rules* about granularity. The only thing
I proposed was to mention granularity as such, as one (important)
aspect of language tagging, to help designers become aware of the
fact that this is an issue, and to avoid claims, by people who
would like to see it that way, that "language tagging means
you have to be able to tag each single character".

> > Anyway, if you and others assure me that my concerns are not
> > (or not anymore) justified, I am ready to accept that.
> > I sincerely hope I don't have to come back to it later.
> I see no evidence that supports a view that you will have to.

Many thanks for your optimism.

Regards,	Martin.
