RE: New Working Group Note: Requirements for String Identity Matching and String Indexing from Phillips, Addison on 2009-10-07 (www-international@w3.org from October to December 2009)

From: Phillips, Addison <addison@amazon.com>
Date: Tue, 6 Oct 2009 23:14:27 -0400
To: CE Whitehead <cewcathar@hotmail.com>, "ishida@w3.org" <ishida@w3.org>, "www-international@w3.org" <www-international@w3.org>
Message-ID: <C7A5719F1E562149BA9171F58BEE2CA4129803A00B@EX-IAD6-B.ant.amazon.com>
Hi,

Thank you for your comments. However, two points:


1. This document is now published as a WG Note. We shan’t be making any changes to it.

2. This document was published as a WG Note strictly for historical reasons. It formed the basis for the CharMod work but had never been formally published as a Note: it remained a Working Draft lo these many years. Because this document is an important milestone, in its way, we felt that we should give it Note status rather than junking it.

Regards,

Addison

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG

Internationalization is not a feature.
It is an architecture.

From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of CE Whitehead
Sent: Tuesday, October 06, 2009 5:44 PM
To: ishida@w3.org; www-international@w3.org
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

My remaining comments on

"Requirements for String Identity Matching and String Indexing"
(W3C Working Group Note 15 September 2009).

are on the content (but since this document is being published 'for historical reasons' I don't know if these will be helpful).

* * *
2.4;  PAR 2
"These differences can be handled by the (mainly native) users of the characters in question, and can at least be identified by users not familiar with the characters in question. Such similarities are explicitly not considered for string identity matching, because they do not need a coordinated solution for the entirety of the WWW."
{COMMENT:  All three differences?? Lower case vs. upper case (or the connected-beginning, connected-end/middle, and unconnected forms in Arabic)
and diacritics??  I think these require a coordinated WWW solution, especially in the case of IRIs.
When I search and have no way to type diacritics, I prefer that letters with and without diacritics be treated as the same, and likewise for upper and lower case. This is great for searching, so solutions may vary; but policy on these differences with respect to the internationalization of URIs should be covered carefully by a universal WWW policy. Perhaps the "clear character" model mentioned in section 4.7 would solve this problem??  I'm not sure.}
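The looser search behaviour described in this comment (ignoring case and diacritics) can be sketched in Python. This is a minimal illustration under stated assumptions, not anything the Note specifies; the Note deliberately excludes these differences from identity matching. The hypothetical `loose_key` helper applies NFKD, drops characters with a nonzero canonical combining class, and case-folds. NFKD also folds Arabic presentation-form characters to their nominal letters; ordinary Arabic text encodes each letter once regardless of its connected form, so shaping differences do not reach the comparison at all.

```python
import unicodedata


def loose_key(s: str) -> str:
    # Hypothetical search key, not a W3C-defined algorithm.
    # NFKD separates base letters from combining diacritics and also
    # decomposes compatibility characters (e.g. Arabic presentation forms).
    decomposed = unicodedata.normalize("NFKD", s)
    # Drop combining marks (nonzero canonical combining class).
    stripped = "".join(ch for ch in decomposed if unicodedata.combining(ch) == 0)
    # Full Unicode case folding (e.g. "ß" -> "ss").
    return stripped.casefold()


def loose_match(a: str, b: str) -> bool:
    # Two strings "match" for search purposes if their loose keys agree.
    return loose_key(a) == loose_key(b)


print(loose_match("Café", "cafe"))        # True
print(loose_match("STRASSE", "straße"))   # True
print(loose_match("cafe", "cave"))        # False
```

A key like this suits search, where recall matters more than precision; it would be far too lossy for identity matching of IRIs.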

* * *


4.1; Par 2
"Note: In many cases, it is highly preferable to use non-numeric ways of identifying substrings. The specification of string indexing for the WWW should not be seen as a general recommendation for the use of string indexing for substring identification. As an example, in the case of translation of a document from one language to another, identification of substrings based on document structure can be expected to be much more stable than identification based on string indexing."
I suppose there is already a W3C Recommendation for document structure; I think a link to it would be helpful here???
* * *
Best,

C. E. Whitehead
cewcathar@hotmail.com

________________________________
From: cewcathar@hotmail.com
To: ishida@w3.org; www-international@w3.org
Date: Tue, 6 Oct 2009 20:26:08 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing


I have one more proofreading comment for:
"Requirements for String Identity Matching and String Indexing"
(W3C Working Group Note 15 September 2009).

3.3; Sentence 3
"It may also provide a bit more time, in that we are just defining what might happen naturally anyway instead of having to fight uphill from day one."
{ COMMENT:  wordy:
>= "By doing so we are defining what might happen naturally anyway . . ."
}
Best,
--C. E. Whitehead
cewcathar@hotmail.com
* * *

________________________________
From: cewcathar@hotmail.com
To: ishida@w3.org; www-international@w3.org
Date: Mon, 5 Oct 2009 15:37:22 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

Hi!

My initial comments on:
"Requirements for String Identity Matching and String Indexing"
http://www.w3.org/TR/charreq/

are on proofreading!

2.3 PAR 2, last sentence

"A process shall not assume that the interpretations of two canonical-equivalent character sequences are distinct. Additions may include some presentation forms."

{CORRECTION:
"canonical-equivalent"
>= "canonically-equivalent"
See text at: http://en.wikipedia.org/wiki/Unicode_equivalence for an example of the use of "canonically-equivalent"}
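The requirement quoted above can be illustrated with Python's standard `unicodedata` module. In this sketch the hypothetical `canonically_equal` helper compares two strings after NFC normalization, so a precomposed "é" (U+00E9) and "e" followed by U+0301 COMBINING ACUTE ACCENT, which differ code point by code point, are treated as the same. It is one straightforward way to honour the rule, not a procedure the Note prescribes.

```python
import unicodedata


def canonically_equal(a: str, b: str) -> bool:
    # Canonically equivalent sequences have identical NFC forms,
    # so comparing NFC forms tests canonical equivalence.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # e + COMBINING ACUTE ACCENT

print(precomposed == decomposed)                   # False: raw code points differ
print(canonically_equal(precomposed, decomposed))  # True
```

Comparing NFD forms would work equally well; what matters is that both sides are brought to the same canonical form before comparison.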

* * *

2.10, PAR 2, first bullet

"It is a prerequisite for be conservative in what you send "
{ CORRECTION
>= "It is prerequisite to being conservative in what you send."

Alternately,

>= "It is prerequisite to one's being in what is sent."
}

* * *

3.2, PAR 1, last sentence

"As an example, it could be required that text transmitted via certain protocols, or text exposed in certain APIs, is normalized."

{COMMENTS: ?? You used the indicative ("is normalized"), and not the subjunctive, which may be o.k. in the U.K. but in the U.S. the correct grammar is

"is normalized"
>= ?? "be normalized."
Also I would like some examples of the protocols here! }

* * *

3.2, last PAR, last sentence

"Such a transfer is indeed highly desirable in many cases, because to avoid generating unnormalized data is in many cases easier than to normalize such data later."
{CORRECTION/COMMENT:  broken verb predicate (I think it's better to keep these together when you can):
>="Such a transfer is indeed highly desirable in many cases, because it is in many cases easier to avoid generating unnormalized data than it is to normalize such data later."
}
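The point in 3.2 (that it is easier to avoid generating unnormalized data than to normalize it later, and that protocols could require normalized text) can be sketched as a sender-side and receiver-side pair. The helpers `prepare_for_send` and `check_received` are hypothetical, and NFC is assumed as the agreed form; `unicodedata.is_normalized` requires Python 3.8 or later.

```python
import unicodedata

NORMAL_FORM = "NFC"  # assumption: the protocol agrees on NFC


def prepare_for_send(text: str) -> str:
    # Normalize once, at the point where the text is generated or sent.
    return unicodedata.normalize(NORMAL_FORM, text)


def check_received(text: str) -> bool:
    # The receiver only verifies; it does not have to repair the data.
    return unicodedata.is_normalized(NORMAL_FORM, text)


raw = "Cafe\u0301"  # "Café" typed with a combining accent
sent = prepare_for_send(raw)
print(check_received(raw))   # False
print(check_received(sent))  # True
```

The design choice here is that the producer, which knows how the text was created, takes on the normalization cost, while every consumer gets a cheap check instead.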
* * *
4.4

{ COMMENT/CORRECTION?? :  I think I'd prefer
>= "sub-elements"
and
>=  "sub-element"
[that is, I think this word needs a hyphen--but some people don't hyphenate--IBM, for example;
see:  http://www.google.com/search?hl=fr&source=hp&q=sub-element&btnG=Recherche+Google&lr=&aq=f&oq=!] }
* * *

I'll follow with a few questions/comments on the contents shortly!

Best,
C. E. Whitehead
cewcathar@hotmail.com


> From: ishida@w3.org
> To: www-international@w3.org
> Date: Thu, 1 Oct 2009 15:38:40 +0100
> Subject: New Working Group Note: Requirements for String Identity Matching and String Indexing
>
> On 15th September, the Internationalization Core Working Group published Requirements for String Identity Matching and String Indexing as a Working Group Note.
>
> http://www.w3.org/TR/charreq/
>
>
> This document was published as a Working Group note in order to capture and preserve historical information. It contains requirements elaborated in 1998 for aspects of the character model for W3C specifications. It was developed and extensively reviewed by the Internationalization Working Group, but never progressed beyond Working Draft status. For this publication, the wording of the 1998 version remains unchanged (except for correction of a small number of typographic errors), but the links to references have been updated prior to this publication.
>
> The document describes requirements for some important aspects of the character model for W3C specifications. The two aspects discussed are string identity matching and string indexing.
>
> Editor: Martin Dürst.
>
>
Received on Wednesday, 7 October 2009 03:15:05 UTC