Re: Charset policy - Post Munich

Ned Freed (Ned.Freed@INNOSOFT.COM)
Sun, 12 Oct 1997 15:59:40 -0700 (PDT)


Date: Sun, 12 Oct 1997 15:59:40 -0700 (PDT)
From: Ned Freed <Ned.Freed@INNOSOFT.COM>
Subject: RE: Charset policy - Post Munich
In-reply-to: "Your message dated Fri, 10 Oct 1997 21:41:59 +0100 (MET)"
To: =?iso-8859-1?Q?Martin_J=2E_D=FCrst?= <mduerst@ifi.unizh.ch>
Cc: Ned Freed <Ned.Freed@INNOSOFT.COM>, ietf-charsets@INNOSOFT.COM
Message-id: <01IOQ71FRDWC9JD4JC@INNOSOFT.COM>

> On Fri, 10 Oct 1997, Ned Freed wrote:

> > Two additional points need to be made here:

> > (1) MLSF is no longer on the table as a proposal at the present time. This
> >     may change, however, if the UTC and ISO do not deliver on embedded language
> >     tags.

> I seriously hope that those involved in the UTC and ISO can "deliver",
> so that I don't have to read statements like these again. And I
> seriously hope that they deliver in a way that will make it
> clear that these embedded language tags are meant for special
> occasions, and are not meant, e.g., to leak out of those
> restricted areas where they are officially allowed.

> But I continue to wonder: If you think that a character set has to
> have embedded language tags, why for example don't you first go to
> your national standards body (ANSI) and help them to get language tags
> into ASCII, to be able to distinguish English and Hawaiian, at least?

Simple: Because we're not proposing that ASCII be available for use in
situations where no other language tagging facility is available. We are
proposing precisely this sort of use for UTF-8, however -- we're proposing to
make UTF-8 and/or UCS our One True Charset, and that carries with it some
additional responsibilities.

> Or why don't you propose to include language tags in the ISO-8859-X series,
> where each part of the series is used for a great multitude of languages?

Same answer. In situations where these charsets are allowed for use, we've
already taken steps to add language tagging facilities.

In other words, I do think that language tagging is an issue for these charsets
as well; it's just that situations where multiple charsets are allowed
aren't the same as those where UTF-8 or UCS is being used without any tagging
facilities being present.

> What's so special about UCS and UTF-8 that the international standard
> bodies suddenly have to "deliver"?

What is special is the notion that these are truly universal charsets that can
be used in places without any other charsets or tagging being necessary or even
allowed.

This makes them completely unique in my book (and very welcome), and it means
that they end up with three obligations that no other charset has:

(1) They have to be willing to accommodate the addition of whatever characters
    are needed to support additional languages or scripts.

(2) They cannot change extant character assignments.

(3) They have to allow for whatever sorts of inline tagging are necessary
    for rendition.

> What makes you, as an individual,
> so special that you think you are entitled to ask them to "deliver".

There is nothing special about me as an individual. However, as a protocol
designer, I feel that I have a right to operate under rules that let me get my
work done. And for that to be possible we have three choices:

(1) Rescind the language tagging requirement the IAB has specified.
(2) Devise a facility of our own for inline tagging for the situations where
    out-of-band tagging is either unavailable or insufficient. This is MLSF.
(3) Ask the UTC and ISO to deliver a facility.

My assessment is that there is no chance of (1) happening. The consensus to use
UTF-8/UCS is predicated on this being done, and the minute you move away from
it you lose the consensus as well. So (2) was proposed for situations where
inline tagging is the obvious answer, and most of the people in the protocol
community pretty much liked it. (And some even loved it.) The UTC/ISO people
hated it, however, but they aren't the ones who have to deliver workable
protocols, so their arguments as to its functionality mismatch at the charset
level were not and are not very effective. So they instead proposed (3), and
most of us said, "That's fine as long as you actually deliver it, if not, we'll
go back to (2) or something with similar capabilities". 
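(To make the inline-tagging idea concrete, here is a toy sketch. The
tag-introducer and tag-cancel codepoints, the function names, and the scheme
itself are illustrative assumptions of mine -- this is not MLSF's actual wire
format, nor any specific UTC/ISO proposal.)

```python
# Toy sketch of inline language tagging. The codepoints below are
# placeholders drawn from the private-use area for illustration only;
# they are NOT the codepoints of any real proposal.

TAG_INTRO = "\uE000"   # placeholder: announces "language tag follows"
TAG_CANCEL = "\uE001"  # placeholder: ends the language tag itself

def tag_text(text: str, lang: str) -> str:
    """Prefix a run of text with an inline language tag."""
    return f"{TAG_INTRO}{lang}{TAG_CANCEL}{text}"

def strip_tags(tagged: str) -> str:
    """A tag-unaware consumer recovers plain text by dropping tags."""
    out = []
    i = 0
    while i < len(tagged):
        if tagged[i] == TAG_INTRO:
            # Skip everything up to and including the tag-cancel mark.
            i = tagged.index(TAG_CANCEL, i) + 1
        else:
            out.append(tagged[i])
            i += 1
    return "".join(out)
```

The point the sketch illustrates is the one at issue: with reserved codepoints,
tagging rides inside the text stream itself, so no protocol-level (out-of-band)
tagging slot has to exist.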

> Or do you think you could imagine the IESG, as the representative
> of the IETF, to formally send a note, in the style as above, or as
> some MLSF proponents have expressed themselves to UTC members, to
> ISO and UTC?

This is pretty much what has already been done.

> > (2) If MLSF needs to be revived I for one will support it.

> You are of course free to support whatever you think you want to
> support. But I would really like at that point to hear better
> technical arguments from you and the other MLSF supporters than
> what I have heard up to now. The only technical argument in
> favor of MLSF that I have heard is that it is good for internal
> processing in some cases. I agree with that, but we all know that
> this is largely irrelevant for the IETF.

First of all, I am not an MLSF supporter. I am a supporter of tagging codepoints
being added to ISO 10646. If that fails then I will review my options and will
probably become an MLSF supporter, unless of course someone proposes something
better than MLSF.

So what you have here is yet another strawman, since nobody here is arguing in
favor of MLSF, least of all me. All I'm saying is that I strongly object to
trying to close the door on schemes like MLSF at this time.

> And the technical problems with MLSF, as they have been discussed,
> won't go away just because there is no "delivery". Just to make
> sure, these technical problems have nothing to do with whether
> inline language tags are desirable or not. Even if I were the
> strongest supporter of inline language tags (which I admit I'm
> not), I wouldn't want to ruin UTF-8 with MLSF.

Martin, the issue here is precisely whether or not inline tagging is ever
necessary. As far as MLSF goes, I don't have the time, need, or interest in
further discussion of a proposal that isn't even on the table right now and,
if all goes well, never will be in the future. Should the UTC/ISO fail to
get tagging codepoints into ISO 10646 I'm sure MLSF will be revived. But I'm
also sure that other alternatives will be proposed, and then they will
all be evaluated on their technical merits.

> >     As such, I have
> >     no intention of putting something in a document that would prevent this
> >     from happening unless and until I am specifically directed to do so by
> >     either an AD or a WG chair with jurisdiction over the charset registration
> >     specification.

> We are currently not discussing the charset registration specification.
> We are discussing Harald's policy document. And that's for Harald to
> decide.

True enough. However, I think the points I made in regard to the registry
document also apply to Harald's document.

> > Even supposing I agree, which I don't, please explain why this has any
> > relevance whatsoever to the matter at hand. This is an issue with the overall
> > design of MLSF, not with its ability to insert language tags at a given level
> > of granularity. You could use MLSF for other sorts of tagging and this
> > would not change.

> Well, you could for example use MLSF in ACAP to code the date of an entry,
> or the access privileges, of course. But curiously enough, that wasn't
> done. As far as I seem to remember, ACAP uses metadata for the date
> of an entry, and special field names for access privileges. But please
> correct me if I'm wrong. If things like the above were possible, why
> was it supposedly impossible to do one of these for language information?

Martin, this is not relevant and you know it. Even supposing that you're
correct in saying that ACAP metadata could be used for language tags, all that
would mean is that in ACAP there is no need for fine granularity language tags.
But ACAP is just one protocol, and the guidelines we're trying to come up with
here are intended to be completely general. Unless and until you're prepared to
say that tags with fine granularity are never necessary (and we already have a
counter-example in encoded-words) we should not include the sort of language
you're talking about.
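(As an aside, the encoded-words counter-example is concrete: RFC 2231 extends
the RFC 2047 encoded-word syntax so that a language tag can accompany the
charset, as in =?US-ASCII*EN?Q?Keith_Moore?=. The little parser below is my own
illustrative sketch of that form, not code taken from either spec.)

```python
import re

# RFC 2231 form of an encoded-word: =?charset*language?encoding?text?=
# The *language part is optional; when present it gives per-word
# (i.e. fine-granularity) language tagging inside a header field.
ENCODED_WORD = re.compile(
    r"=\?(?P<charset>[^?*]+)(?:\*(?P<lang>[^?]+))?\?"
    r"(?P<enc>[QqBb])\?(?P<text>[^?]*)\?="
)

def parse_encoded_word(word):
    """Split an encoded-word into (charset, language, encoding, text);
    language is None when the RFC 2231 extension isn't used."""
    m = ENCODED_WORD.fullmatch(word)
    if m is None:
        return None
    return (m.group("charset"), m.group("lang"),
            m.group("enc"), m.group("text"))
```

The =?US-ASCII*EN?Q?Keith_Moore?= example is the one RFC 2231 itself gives;
it tags a single word of a single header field, which is about as fine as
granularity gets.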

Now, even though your ACAP issue isn't relevant, I will answer it: The basic
problem in ACAP is that metadata isn't of fine enough granularity to be used
for language tagging. There are problems with multi-valued attributes, for
example. So, given the present protocol specification, it really isn't possible
to add metadata for this and have a workable result. And while the protocol
could of course be changed to "fix" this, if you sit down and work out all the
consequences (I haven't done so, but I've watched it being done) you find that
it screws up the protocol tremendously and makes things vastly more complex.
When compared with the simplicity of inline tagging it just doesn't measure up.

In other words, this is a protocol design issue where it seems to the protocol
designers that the right answer is a finer level of tagging granularity than
what you'd like to see them pick. But even if the ACAP designers are completely
wrong about this it would not mean you're correct, since for you to be right
you'd have to be right about every protocol.

> > > Fred - I don't mind having individual character tagging where it makes
> > > sense. It is possible with <SPAN> and LANG in HTML. It may make a lot
> > > of sense in other places. But I have been seriously burnt by claims,
> > > wrongly based on the IAB report [RFC 2070], that all text in internet
> > > protocols needs individual character tagging, and by upfront pseudo-
> > > arguments claiming technical necessities where there existed alternatives.
> >
> > How exactly have you "been burned" by such claims?

> Well, that was obviously not meant physically.

Nor did I mean it in a physical sense. I was simply asking you to back up
your assertion that you have been harmed in some way, tangible or not, by
such claims.

> > Since you have yet to offer a counterexample that meets my criteria (which I
> > believe were entirely fair and even generous), I continue to label this as a
> > strawman. In fact any and all discussion of MLSF is by definition a strawman at
> > this time since the proposal is no longer even on the table!

> You cannot on one side claim that MLSF is no longer on the table, and
> on the other hand admit that you keep it just hidden below the table to
> put it on the table at the first occasion that you see fit.

Not on the first occasion where I see fit. I have stated publicly that I will
only support MLSF should the necessary facilities not appear in ISO 10646.

But the occasion for reintroduction aside, this is in fact _exactly_ what I'm
doing and I see nothing wrong with doing it. I'm not attempting to prevent MLSF
from receiving appropriate technical review should the occasion arise to revive
it. I'm not attempting to block other alternatives to MLSF. In fact, if this
comes to pass and an alternative to MLSF is presented that is superior, I will
support it in favor of MLSF.

The point is that we have a solution on the table -- embedded tagging 
codepoints in ISO 10646 -- that it seems all of us can live with. But it isn't
a sure thing yet. So we're keeping some alternatives in reserve should the
current solution that's on the table run afoul of some UTC or ISO glitch. It
is simply common sense for us to do so, nothing more and nothing less.

You, on the other hand, want us basically to either ban or make it difficult to
pursue other alternatives even before we know that we have a solution in place.
We're being asked for advance payment prior to possible receipt.

Basically this boils down to a matter of trust. Should the IETF trust the UTC
and ISO to deliver on their promise, and in the meantime give up part of
their ability to fix things should something go wrong? Should the UTC
and ISO trust the IETF to use what they've built and not build something else
that's unnecessary?

In my opinion the IETF has proved itself worthy of such trust, not once but
over and over and over again. This is why I continue to ask if you have a
counterexample in the form of something that's either been approved by
the IETF or is actually on the table now in the IETF that would demonstrate
otherwise. And it seems fairly clear now that you have no such example to
present.

And frankly, I think things like the Hangul mess amply demonstrate why, by
adding the sort of text you suggest at this time, the IETF would be abrogating
its obligation to build protocols that meet its own rules. Maybe in a year or
two, when the
tagging codepoints are set in stone and we have substantially more experience
with their use in various protocols. But not now.

				Ned
