Re: Unicode Normalization: request for TAG discussion and a finding

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Thu, 30 Jun 2011 22:54:46 +0200
To: "Phillips, Addison" <addison@lab126.com>
Cc: "www-tag@w3.org" <www-tag@w3.org>, "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <rlip075sfqut14ip35nk7u6sle88658pql@hive.bjoern.hoehrmann.de>

* Phillips, Addison wrote:
>I agree that user stories are important, although the languages most
>affected are also the ones that it is hardest for us to gather evidence
>from. It is also true that people have "gotten around" things not
>working by using ASCII-only identifiers, for example. The advent of more
>active or actively-generated content and better Unicode support, though,
>exposes us to more problems in this area.

More Unicode means some old problems go away and some new problems come
up; I am not sure whether that makes for "more" problems in total. Here,
for instance, we have some legacy character encodings that interfere with
normalization, but they are also being phased out, and with them likely
the normalization problems as well.

Anyway, I care less about the stories and more about the users. Look at
it this way: if we don't have users who can actually judge whether some
proposal would make matters better for them, how would we know if the
proposal actually makes matters better for anyone? For all we'd know it
may make matters worse because the affected users have found a better
solution that they didn't tell us about, and our solution breaks theirs.

As for workarounds, well, I've been using E-Mail since the late 1990s
and even today I can't put my name in the From header because people are
still using software that would mangle my name. On the web this has not
improved much either: I still get snail mail with my name mangled, and
just this month I signed up for a web forum only to find out they, for
whatever reason, "asciify" my name in various places, but "ö" is turned
into "o" everywhere, and the only way to get my name properly asciified
is, just as I do in E-Mail, by using "oe" myself.
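
The difference between the two kinds of "asciifying" can be sketched in
a few lines of Python. This is only an illustration, not what the forum
software in question does: the GERMAN_TRANSLIT table and both function
names are assumptions made up for this example, and the table covers
only German umlauts and "ß".

```python
import unicodedata

# Hypothetical, deliberately tiny table: German conventions only.
GERMAN_TRANSLIT = {
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
}

def strip_diacritics(text):
    """Naive approach: decompose (NFD), then drop combining marks."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))

def transliterate_german(text):
    """Language-aware approach: compose (NFC), then apply the table."""
    composed = unicodedata.normalize("NFC", text)
    return "".join(GERMAN_TRANSLIT.get(ch, ch) for ch in composed)

print(strip_diacritics("Björn Höhrmann"))      # Bjorn Hohrmann
print(transliterate_german("Björn Höhrmann"))  # Bjoern Hoehrmann
```

The first function is language-neutral but produces a spelling the name's
owner would not use; the second produces the conventional spelling but
only for languages whose rules have been put into the table.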

And there is the input problem: if I use "ö" in place of "oe", some
people can't actually type in my name on their keyboard, and they too
are unlikely to know the proper transliteration, so by not using the
properly asciified spelling, I am making it hard to communicate with me.
So at this point I am unsure we'll ever get to a point where I can
comfortably spell my name properly, even if just for that reason.
Actually it's even worse: consider that some write my name as "Bjoem",
for instance.

Point being, how would I know that whatever normalization problems the
people in absentia are having, aren't much like the far simpler problem
I am having (in my case the spelling differences are at least obvious)?
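
For contrast, a normalization difference is typically not obvious at
all. A minimal Python sketch, assuming nothing beyond the standard
unicodedata module (the helper name same_after_nfc is made up for this
example): two strings that render identically still compare unequal
until both are brought to the same normalization form.

```python
import unicodedata

composed = "Bj\u00f6rn"     # "ö" as a single code point, U+00F6
decomposed = "Bjo\u0308rn"  # "o" followed by U+0308 COMBINING DIAERESIS

def same_after_nfc(a, b):
    """Compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

print(composed == decomposed)                # False
print(len(composed), len(decomposed))        # 5 6
print(same_after_nfc(composed, decomposed))  # True
```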

What I would like to avoid is people raising Normalization issues on
protocols without being able to convince whoever is working on them that
there is a problem, that it needs fixing, and that there is a fix that
is known to work. All this requires evidence which seems to be lacking
at the moment, and I don't think the TAG can help there. Even as little
as a list of the most affected people would help a lot to change this.
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/ 
Received on Thursday, 30 June 2011 20:55:12 GMT