Re: [Encoding] false statement [I18N-ACTION-328][I18N-ISSUE-374]

Hi John,

I understand your position, and in various parts agree with it.

With respect to Unicode: a lot of work has been done, and it takes time.
Within the last week I have been coming across unencoded Arabic characters
I haven't seen before.

It takes time, resources and funding to analyze and document things even
before you get to the stage of preparing proposals.

But I think that the way forward has always been Unicode; in some cases it's
the only game in town.

But I think the crux of the problem is web browser support. It is all well
and fine for the encoding spec to require UTF-8, but that in itself is a
pointless effort unless there is a requirement to have adequate Unicode
support in the browsers.
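To make that distinction concrete, here is a hedged sketch in Python (not tied to any particular browser): an encoding requirement only governs the mapping between bytes and code points, while correct display of the resulting text is a separate problem entirely.

```python
# Sketch: a UTF-8 mandate guarantees byte-level round-tripping of code
# points, nothing more. The Arabic string below encodes and decodes
# cleanly whether or not any given browser can shape and display it.
text = "\u0644\u0627"  # Arabic LAM + ALEF, normally rendered as a ligature
data = text.encode("utf-8")
assert data.decode("utf-8") == text  # the encoding layer is satisfied
# Rendering (font selection, shaping, bidi) happens above this layer,
# which is why a UTF-8 requirement alone cannot fix display problems.
```

The round trip succeeds unconditionally; nothing at the encoding layer can express whether the receiving system has the fonts and shaping support the text needs.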

There are lots of things browser developers need to do better, or get right
(in some cases). I would argue that NO web browser adequately supports
minority languages. Some of those things involve enhanced Unicode support,
some involve flawed approaches to font fallback, and some involve
implementing parts of CSS3 that are critically needed.

Support for minority and lesser-used languages on the web is a complex
problem.

The reality is that user communities want to use their languages. If they
develop hack solutions to achieve that, it may be because more work
needs to be done implementing a script in Unicode, but in many of the cases
I can think of, widespread hacks don't stem from a lack of support in the
Unicode standard; they stem from a lack of support in web browsers, for a
large range of reasons, some the responsibility of the browser developers
and some of the OS developers.

But just mandating UTF-8 isn't the answer to the encoding problems. It is a
multifaceted issue.
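The Myanmar situation raised later in this thread is a well-known instance of such a hack: the Zawgyi font encoding reuses code points in the standard Myanmar block with non-standard meanings. A hedged Python sketch of why the byte stream itself offers no clue about which convention was used:

```python
# Sketch: font-hack "pseudo-Unicode" text and standards-conformant text
# can use the very same code points in the Myanmar block. Both serialize
# to valid UTF-8, so a receiver gets no in-stream signal about which
# convention the author followed; only out-of-band heuristics can tell.
s = "\u1021\u102C"  # code points legal in standard Myanmar Unicode,
                    # but assigned different glyphs/meanings by Zawgyi
b = s.encode("utf-8")
assert b.decode("utf-8") == s  # valid UTF-8 under either convention
```

Because validation passes either way, the ambiguity has to be resolved above the encoding layer, which is exactly why mandating UTF-8 alone cannot untangle it.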


On 01/09/2014 9:31 AM, "John C Klensin" <> wrote:

> Andrew (and, by the way, John Cowan),
> I certainly did not intend to be either brutal or elitist.  I'm
> trying to separate what seem to me to be multiple problems in
> the hope that it will help us move forward.  Over the history of
> the Internet (and a few other technologies with which I've had
> to work), institutionalizing incompatibility has rarely turned
> out to be a good idea.  Sometimes it happens and we have to work
> around it, sometimes those workarounds are successful, but even
> that rarely changes the "bad idea" part.  If I had a script that
> wasn't supported by Unicode, I'd be unlikely to write a proposal
> to get it coded and then sit around for years waiting
> for them to do it.  However, I would write the proposal and,
> when I created an interim system, I'd try to make sure there was
> a migration plan and, ideally, that my interim system didn't
> conflict with anyone else's.
> I think we have some historically-established ways of doing that
> which we know how to handle.  I'd hate to see us go back to 2022
> and expand that registry but I can also imagine its being an
> interesting (and non-conflicting) solution while waiting for
> Unicode and, if ISO/IEC JTC1/SC2 isn't willing to maintain and
> update that registry, I can imagine several entities who could
> take over.  If the Unicode Consortium understands and is
> convinced that this has become a serious problem, perhaps they
> could start conditionally reserving some blocks for
> as-yet-uncoded scripts so at least there could be unambiguous
> migration paths, perhaps via a new subspecies of compatibility
> mappings or providing surrogate-like escapes to other code
> points that would parallel the 2022 system.    There may also be
> better ideas, but I wish you (and others) would propose them
> rather than --it seems to me-- merely complaining in louder and
> louder voices (or, in this case, name-calling).
> Any of those approaches (at least the ones I can think of) would
> be very ugly, but far preferable to disguising a lot of one-off
> font tricks or pseudo-Unicode, with potentially overlapping code
> points, as Standard UTF-8 and hoping that the end systems can
> sort out what is going on without any in-stream clues.  That
> just leads to a very fragmented environment in which people
> cannot communicate... or worse.
> If the official Unicode Consortium position were really "people
> should just wait to use their languages until we get around to
> assigning code points and we reserve the right to take as many
> years as we like" and the official WHATWG (much less W3C)
> position were really "if your language and script don't have
> officially assigned Unicode code points, you don't get to be on
> the web" then it is probably time for the broader community to
> do something about those groups.  Fortunately I haven't heard
> anyone who can reasonably claim to speak for any of those bodies
> say anything like that.  If you have, references would be
> welcome.
> More or less the same situation applies to the Encoding spec.
> It still seems to me that it should be targeting UTF-8 and
> Standard Unicode with other things viewed as transitional.  That
> doesn't solve your "pseudo-Unicode" problem, but, AFAICT, it
> doesn't make it any worse either.   As I have tried to say
> before, I (at least) would be interested in what you do propose
> but, so far, you just seem to be complaining about things that
> won't work in the contexts you are concerned about.  Certainly
> the web browsers and other software that are now supporting font
> or [other?] pseudo-Unicode tricks aren't going to stop doing so
> because Anne, WHATWG, or W3C say those tricks are bad --
> everyone knows they are bad already, even (or especially) those
> who think they are necessary and those who think they are
> necessary are (correctly, IMO) unlikely to change their minds
> until after someone offers real alternatives.
> Finally (at least for today) there is a choice in principle
> between saying "the browser vendors and page authors who are
> using IANA Registry Charset labels but doing something else are
> causing interoperability problems with the rest of the Internet
> and the rest of the world and should be designing ways to get
> out of that hole" and saying "many of the browser vendors are
> doing this and, while it differs from what the IANA Registry is
> usually believed to specify, it is the standard because they are
> doing it and therefore everyone else should get in line".  The
> first may be impractical (and probably is unless higher powers
> intervene).   The second (or variants on it) would be a whole
> lot more attractive if the community could feel some assurance
> that we wouldn't have to look forward to another round of the
> same thing in the future, e.g., an "Encoding 2018" spec that
> said "don't pay any attention to the labels and definitions
> established in Encoding 2014 because the browser vendors went
> off in another direction".  One possible implication of your
> comments is that the risk of that situation is pretty high; if
> it is, then we really ought to be discussing a better solution
> to it than either making proclamations that will be ignored or
> engaging in fervent prayer that the light coming toward us
> really isn't a train after all.
> best,
>    john
> --On Monday, 01 September, 2014 08:22 +1000 Andrew Cunningham
> <> wrote:
> > Anne and John
> >
> > Your comments read as brutal and elitist.
> >
> > Do you have any idea of how long it takes to prepare and
> > shepherd a Unicode proposal through? How much work and
> > resources it can take?
> >
> > The communities that need Unicode support don't necessarily
> > have the resources or expertise to prepare the proposals.
> >
> > Recently a proposal went to UTC to disunify some characters in
> > the Myanmar block. The proposal was rightly rejected.
> >
> > I had a chat to one of the authors of that proposal. What was
> > interesting was the reason for preparing the proposal in the
> > first place.
> >
> > Essentially, the problem was that web browsers were perceived to
> > have problems with displaying content in the languages in
> > question.
> >
> > Essentially they were trying to get changes in Unicode because
> > of deficiencies in web browsers.
> >
> > Most cases I know for use of what you refer to as hacks did
> > not occur specifically because of lack of support of language
> > in Unicode. It came as a specific consequence of lack of
> > support in web browsers.
> >
> > Let's be honest here. It is easier to get Unicode to add
> > support than it is to get web browsers to add support.

Received on Monday, 1 September 2014 00:49:34 UTC