ACTION-590: TAG work on Unicode normalization (was: Re: Any update on TAG request?)

From: Noah Mendelsohn <nrm@arcanedomain.com>
Date: Tue, 23 Aug 2011 14:09:44 -0400
Message-ID: <4E53ECE8.3000702@arcanedomain.com>
To: "Phillips, Addison" <addison@lab126.com>
CC: Larry Masinter <masinter@adobe.com>, "www-tag@w3.org" <www-tag@w3.org>, "member-i18n-core@w3.org" <member-i18n-core@w3.org>

Addison,

Please accept my apologies for not having responded to your request [1]
earlier.  I have read the email trail from the past few weeks. Please do
note in considering this response that I am not personally an expert in the
nuances of Unicode, normalization, or the detailed concerns with
respect to common practice and/or W3C or IETF specifications. That said...

Addison Phillips wrote:

> A TAG finding or statement would communicate this issue widely and
> establish a precedent for immediate-future decisions about
> normalization.

Earlier, Larry Masinter wrote:

> If there were some disagreement or opposing viewpoints or an issue where
> the TAG could act as a tie-breaker, that might be a different situation,
> but so far all I can see is some work that the Internationalization
> working group (of which you are chair) hasn't finished.

[...and from another email of Larry's...]

> I think the other input needed is a clear summary of what existing
> widely deployed implementations currently DO. I haven't found that,
> especially around normalization of input methods.

Stepping back from the technical details, of which there has also been
useful discussion, my question is: what role can or should the TAG play
here? Larry seems to have some familiarity with the technical details of
these issues, and I presume Peter Linss does as well, but I don't yet know
how many other TAG members do -- I suspect few if any.

With respect to schedules, the TAG will have one teleconference on 1 Sept,
devoted primarily to planning the agenda for our 13-15 September F2F. I
will put a brief agenda item on the 1 Sept. call to get a sense of the TAG
on how/whether to pursue this, and whether it's appropriate to schedule a
slot for it at the F2F.

In your original June email you wrote [1]:

> The current Internationalization Core WG has recently considered and
> discussed the problem of Unicode Normalization, in part because we feel
> the existing guidelines cannot be reasonably implemented by specs. We
> resolved to replace CharModNorm with a document that better describes
> the recommendations and best practices in this area and have started
> working on revised guidelines [2] based on our current WG consensus.

Speaking for myself, I don't see this as being at a level of refinement or
clarity that the TAG could formally endorse it, even if we had the
expertise to evaluate the detailed proposals being considered. I think
Larry made a similar point.

> Therefore, we would like to request that TAG schedule time in about four
> weeks to review I18N WG's proposed recommendations concerning Unicode
> Normalization. We would like to have some time so that we can prepare
> materials for TAG including the options available.

Given our schedule, I think what would be most helpful is if you could net
out in an email, preferably ahead of our 1 Sept. call:

* What exactly is the "I18N WG's proposed recommendation" that you would
like to review with us? Are you looking for a "blessing" to pursue a
general direction as signaled in the Wiki? Do you have something resembling
a draft Recommendation? As Larry points out, accompanying that with some 
analysis of what existing implementations do would be particularly helpful, 
or else perhaps you could describe your plans to gather that information in 
the future.

* What specific aspect(s) of the problem do you feel the TAG should be
helping with? Larry, quoted above, makes the case that the role for the TAG
at this point isn't clear, and I agree with him.

Having your response on those points in hand will likely lead to a more
efficient TAG discussion on 1 Sept., and to a good decision about whether to
spend F2F time. We are meeting in Edinburgh 13-15 September, and I'm not
sure exactly what the dial-in arrangements will be. We may be able to
schedule an hour with you dialing in while we're there; otherwise any phone
discussion would probably have to wait until our expected telcon on 22 Sept.
In any case, the first step is to see whether the TAG as a whole finds
something here that's appropriate for us to work on.

Thank you very much.

Noah Mendelsohn
Chair: W3C Technical Architecture Group (TAG)

[1] http://lists.w3.org/Archives/Public/www-tag/2011Jun/0188.html
[2] http://www.w3.org/International/wiki/CharmodNormSummary


On 8/19/2011 11:37 AM, Phillips, Addison wrote:
> Larry noted:
>
>> I guess this flew by ... Maybe I'm completely missing something?  I'm
>> not sure what the TAG is going to add to this, or why it should be a
>> TAG finding.
>
> In a nutshell, the I18N WG has historically had a position on Unicode
> Normalization in W3C specs and other Web technologies (embodied in
> Charmod-Norm). However, most specs and implementations have ignored
> this position. We have convinced various people that there is a problem,
> but not that they should take on the pain of being the first to address
> it.
>
> Recognizing that this historical position on normalization is untenable,
> and having examined the alternatives, the WG still feels that
> normalization-affected languages would benefit from selective
> normalization during selection and string identity matching in W3C
> specs. The specs that concern us most (such as CSS3 Selectors) have
> potential ripple effects into other W3C specs (such as HTML5, DOM) and
> beyond (such as JavaScript).
>
> If we're to do this, 'twere best done quickly and consistently. A TAG
> finding or statement would communicate this issue widely and establish
> a precedent for immediate-future decisions about normalization. The two
> options that we feel are currently reasonable to consider are:
>
> 1. Do nothing. Do not require normalization by implementations or in
> specs. Create educational materials to help content authors understand
> the problem and try to avoid it.
>
> 2. Adopt our proposal for identifier and token/string matching
> normalization. Revise Charmod-Norm to embody this. Ensure that specs
> address these requirements in the future.
>
>>
>> http://lists.w3.org/Archives/Public/www-tag/2011Jun/0188.html
>>
>> contains: # In the meantime, documents such as CSS3 Selectors and
>> HTML5 depend on or could be impacted by Unicode Normalization.  In a
>> discussion with Peter Linss, co-chair of CSS, and others [4],
>>
>> But there is no reference [4].
>
> Typo. Reference [3] in that document is what was meant.
> http://www.w3.org/2011/06/17-cssns-minutes.html
>
> This was a meeting facilitated by PLH between CSS Namespaces and I18N.
> Attendees were Richard and me from I18N, Chris Lilley, PLH, and Peter
> (co-chair of CSS).
>
>>
>> You are the internationalization experts, the TAG are not. Why isn't
>> the best course of action for the W3C I18N working group to finish
>> its document based on community consensus, including the CSS and HTML
>> working groups?
>
> A summary of both the history of this issue and the proposal for the
> position of the I18N WG given above is here:
>
> http://www.w3.org/International/wiki/NormalizationProposal
>
> I think this might help clarify a bit.
>
> We are seeking to present the choices related to normalization to TAG
> now, garner a clear direction, and move everyone (that includes I18N)
> in that direction by fixing Charmod-Norm (and/or any affected specs) to
> match what the Web does/will really do.
>
> HTH.
>
> Regards,
>
> Addison
>
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N WG)
>
> Internationalization is not a feature. It is an architecture.
>
>
>
>
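
For readers unfamiliar with the matching problem discussed in the quoted
message, the following is a minimal sketch of why string identity matching
can fail without normalization. It is an illustration only: the helper name
same_identifier and the choice of NFC are assumptions made for this example,
not something specified by the I18N WG in this thread.

```python
import unicodedata


def same_identifier(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after normalizing both to the same Unicode form.

    Hypothetical helper for illustration; `form` defaults to NFC as an
    assumption, and any of NFC, NFD, NFKC, NFKD is accepted.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


# Two encodings of the same visible text "café":
precomposed = "caf\u00e9"   # é as a single code point (U+00E9)
decomposed = "cafe\u0301"   # e followed by a combining acute accent (U+0301)

# A plain code point comparison treats them as different strings.
print(precomposed == decomposed)

# Normalizing both sides first makes them compare equal.
print(same_identifier(precomposed, decomposed))
```

The first comparison reports the strings as different and the second as
equal, which is the gap that "normalization during selection and string
identity matching" is meant to close for identifiers such as selector names.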
Received on Tuesday, 23 August 2011 18:10:14 GMT