RE: Normalization recommendations wiki page...

From: Phillips, Addison <addison@lab126.com>
Date: Tue, 14 Jun 2011 18:33:24 -0700
To: Mark Davis ☕ <mark@macchiato.com>
CC: "public-i18n-core@w3.org" <public-i18n-core@w3.org>
Message-ID: <131F80DEA635F044946897AFDA9AC3476A93983ACA@EX-SEA31-D.ant.amazon.com>
Hi Mark,

Comments follow.

Addison

From: mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] On Behalf Of Mark Davis ☕
Sent: Tuesday, June 14, 2011 6:09 PM
To: Phillips, Addison
Cc: public-i18n-core@w3.org
Subject: Re: Normalization recommendations wiki page...

When I look at http://www.w3.org/International/wiki/CharmodNormSummary, all of the items in boxes are not linewrapped, forcing horizontal scrolling to read (a pain).

AP> Agreed. It’s a wiki “feature” that I just now cleaned up.

From a quick glance.

A "normalizing operation" is one whose results are normalization sensitive and which fully-normalizes the text on which it operates.

That doesn't make sense. If I fully normalize a string that I operate on, then I won't be normalization-sensitive.

AP> No, it doesn’t make complete sense. What I (thought I) needed was a placeholder to differentiate between “operations that are sensitive to normalization” and “specific cases of those operations on which we are actually requiring normalization”. CSS3 Selectors might make a good example here. The selectors are normalization-sensitive: the results differ depending on whether or not normalization is performed. If selectors were to require normalization, then they would become a “normalizing operation” (0041 0301 is equal to 00C1). If they require that normalization not be done, they remain normalization-sensitive and 0041 0301 is not equal to 00C1.
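
To make the distinction concrete, here is a minimal sketch in Python. It assumes NFC as the normalization form, and the function names are hypothetical, not taken from any spec. U+0041 U+0301 (A plus combining acute) composes to U+00C1 under NFC, so the two spellings differ at the code point level but are canonically equivalent.

```python
import unicodedata

# Two canonically equivalent spellings of "A with acute".
decomposed = "\u0041\u0301"   # LATIN CAPITAL LETTER A + COMBINING ACUTE ACCENT
precomposed = "\u00C1"        # LATIN CAPITAL LETTER A WITH ACUTE


def sensitive_equal(a: str, b: str) -> bool:
    """Normalization-sensitive: compares the raw code point sequences."""
    return a == b


def normalizing_equal(a: str, b: str) -> bool:
    """Normalizing: compares the NFC forms of both operands (NFC is an assumption)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)
```

With these definitions, `sensitive_equal(decomposed, precomposed)` is false while `normalizing_equal(decomposed, precomposed)` is true, which is the difference between a selector that forbids normalization and one that requires it.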

I actually put a better description in the following paragraph:

--
The results of any such operation are dependent upon the code points encoded and visually and semantically identical strings might compare as unequal.
--

Also, the following looks bogus:

A text-processing component that receives suspect text MUST NOT perform any normalizing operations unless it has first either confirmed through inspection that the text is in normalized form or it has re-normalized the text itself.

There are plenty of circumstances where normalizing operations are performed. For example, the internals of collation require normalizing the text. So if I take a document, and present the fields in sorted order to a user, I'm breaking this recommendation.
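
The sorting case can be sketched roughly as follows. This is only an illustration: real collation (for example, the Unicode Collation Algorithm) is far more involved, and a plain code point sort over NFC keys is used here as a stand-in. The function name and sample data are hypothetical.

```python
import unicodedata


def sort_fields(fields):
    """Sort fields by their NFC form; the returned strings are the untouched originals."""
    # Normalization happens only inside the sort key, never on the stored text.
    return sorted(fields, key=lambda s: unicodedata.normalize("NFC", s))


fields = ["e\u0301clair", "zebra", "\u00e9clair", "apple"]

naive = sorted(fields)             # raw code point order
normalized = sort_fields(fields)   # order computed on normalized keys
```

In the naive sort the two spellings of "éclair" end up on either side of "zebra"; with the normalizing key they sort together, and the decomposed string is still returned in its original form.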

AP> This paragraph looks bogus because it is bogus. The rewrite I applied makes it garbage. I did another recast, trying to get closer to dealing with the original requirement. Try instead:

--
Any text-processing component that is normalizing SHOULD normalize the text internally rather than modifying the original content. However, the results of each step in the operation MUST behave as if the original text had been normalized from the outset. Private agreements MAY be created within private systems which are not subject to these rules, but any externally observable results MUST be the same as if the rules had been obeyed.
--
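
One possible reading of the recast text above, sketched in Python: the component normalizes internally for matching, while the content it hands back is left exactly as received. NFC and all names here are assumptions made for illustration.

```python
import unicodedata


def _nfc(text: str) -> str:
    """Internal helper: NFC form of the text (the choice of NFC is an assumption)."""
    return unicodedata.normalize("NFC", text)


def find_matches(records, query):
    """Return the original records that match the query after internal normalization.

    The comparison behaves as if every record had been normalized from the
    outset, but the records themselves are never rewritten.
    """
    key = _nfc(query)
    return [record for record in records if _nfc(record) == key]


records = ["re\u0301sume\u0301", "r\u00e9sum\u00e9", "resume"]
matches = find_matches(records, "r\u00e9sum\u00e9")
```

Here `matches` contains both accented spellings, and the first one is still in its decomposed form, so the externally observable result is the same as if the input had been normalized while the original content is preserved.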


Received on Wednesday, 15 June 2011 01:33:52 GMT
