RE: New Working Group Note: Requirements for String Identity Matching and String Indexing from CE Whitehead on 2009-10-07 (www-international@w3.org from October to December 2009)

From: CE Whitehead <cewcathar@hotmail.com>
Date: Wed, 7 Oct 2009 18:37:20 -0400
To: <addison@amazon.com>, <ishida@w3.org>, <www-international@w3.org>
Message-ID: <BLU109-W568A417B80E2B5BECB802B3CD0@phx.gbl>
O.k., oh well, I'll wait for the next working draft.

--Best,

C. E. Whitehead
cewcathar@hotmail.com

From: addison@amazon.com
To: cewcathar@hotmail.com; ishida@w3.org; www-international@w3.org
Date: Tue, 6 Oct 2009 23:14:27 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing








Hi,
 
Thank you for your comments. However, two points:
 
1.       This document is now published as a WG Note. We shan’t be making any changes to it.
2.       This document was published as a WG Note strictly for historical reasons. It formed the basis for the CharMod work but was never formally published as a WG Note. It remained as a Working Draft lo these many years. Because this document is an important milestone, in its way, we felt that we should give it Note status rather than junking it.
 
Regards,
 
Addison
 

Addison Phillips
Globalization Architect -- Lab126
Chair -- W3C Internationalization WG
 
Internationalization is not a feature.
It is an architecture.
 



From: www-international-request@w3.org [mailto:www-international-request@w3.org] On Behalf Of CE Whitehead
Sent: Tuesday, October 06, 2009 5:44 PM
To: ishida@w3.org; www-international@w3.org
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing
 
My remaining comments on
 
"Requirements for String Identity Matching and String Indexing"
(W3C Working Group Note 15 September 2009).
 
are on the content (but since this document is being published 'for historical reasons' I don't know if these will be helpful).

* * *
2.4;  PAR 2
"These differences can be handled by the (mainly native) users of the characters in question, and can at least be identified by users not familiar with the characters in question. Such similarities are explicitly not considered for string identity matching, because they do not need a coordinated solution for the entirety of the WWW."
{COMMENT:  All three differences?? Lower case versus upper case (or, in Arabic, the connected-beginning, connected-end/middle, and unconnected forms), and diacritics??  I think these require a coordinated WWW solution, especially in the case of IRIs.
When I search and have no way to type in diacritics, I prefer that letters with or without diacritics be treated as the same; the same goes for upper and lower case. This is great for searching, so solutions may vary, but policy about these with respect to the internationalization of URIs should be covered carefully by a universal WWW policy--perhaps the "clear character" model mentioned in section 4.7 may solve this problem??  I'm not sure.}
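
To make the search preference above concrete, here is a minimal Python sketch of a loose "search key" that ignores case and diacritics. The helper name search_key is hypothetical, and this kind of folding is an assumption about what a search tool might do; it is deliberately lossy and is not the string identity matching that the Note defines.

```python
import unicodedata

def search_key(text: str) -> str:
    """Hypothetical helper: fold case and drop combining marks for loose search."""
    # Decompose so that accents become separate combining characters.
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks (characters with a non-zero combining class).
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower() intended for caseless matching.
    return stripped.casefold()

# A query typed without diacritics still finds the accented form.
print(search_key("Résumé") == search_key("resume"))
```

Whether such folding is acceptable depends on the language and the application, which is one reason a single policy for identifiers such as IRIs is harder to agree on than a search heuristic.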

* * * 
 

4.1; Par 2
"Note: In many cases, it is highly preferable to use non-numeric ways of identifying substrings. The specification of string indexing for the WWW should not be seen as a general recommendation for the use of string indexing for substring identification. As an example, in the case of translation of a document from one language to another, identification of substrings based on document structure can be expected to be much more stable than identification based on string indexing."
{COMMENT:  I suppose there is already a W3C recommendation for document structure; I think a link to it would be helpful here???}
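
The stability claim in the quoted note can be sketched in Python. The two small XML fragments, the id value "btn", and the helper names by_index and by_id are all made up for illustration; the sketch only shows that a numeric range chosen for one language version points at different text after translation, while a structural reference still finds the intended element.

```python
import xml.etree.ElementTree as ET

# Hypothetical source document and its translation; same structure, different text.
english = ET.fromstring('<p>Press the <b id="btn">red button</b> now.</p>')
french = ET.fromstring('<p>Appuyez maintenant sur le <b id="btn">bouton rouge</b>.</p>')

def by_index(root: ET.Element, start: int, end: int) -> str:
    """Identify a substring numerically, by character offsets into the text."""
    return "".join(root.itertext())[start:end]

def by_id(root: ET.Element, ident: str) -> str:
    """Identify a substring structurally, by the id of the enclosing element."""
    return root.find(f".//*[@id='{ident}']").text

print(by_index(english, 10, 20))  # the range was chosen for the English text
print(by_index(french, 10, 20))   # the same range no longer marks the phrase
print(by_id(english, "btn"), "/", by_id(french, "btn"))
```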
* * *
Best,
 
C. E. Whitehead
cewcathar@hotmail.com
 



From: cewcathar@hotmail.com
To: ishida@w3.org; www-international@w3.org
Date: Tue, 6 Oct 2009 20:26:08 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing


I have one more proofreading comment for:
"Requirements for String Identity Matching and String Indexing"
(W3C Working Group Note 15 September 2009).
 
3.3; Sentence 3
"It may also provide a bit more time, in that we are just defining what might happen naturally anyway instead of having to fight uphill from day one."
{ COMMENT:  wordy:
>= "By doing so we are defining what might happen naturally anyway . . ."
}
Best,
--C. E. Whitehead
cewcathar@hotmail.com
* * *
 



From: cewcathar@hotmail.com
To: ishida@w3.org; www-international@w3.org
Date: Mon, 5 Oct 2009 15:37:22 -0400
Subject: RE: New Working Group Note: Requirements for String Identity Matching and String Indexing

Hi!
 
My initial comments on:
"Requirements for String Identity Matching and String Indexing"
http://www.w3.org/TR/charreq/
are on proofreading!

2.3 PAR 2, last sentence

"A process shall not assume that the interpretations of two canonical-equivalent character sequences are distinct. Additions may include some presentation forms."

{CORRECTION:  
"canonical-equivalent" 
>= "canonically-equivalent"
See text at: http://en.wikipedia.org/wiki/Unicode_equivalence for an example of the use of "canonically-equivalent"}
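
For readers unfamiliar with the term, a short Python sketch of canonical equivalence follows. It uses the standard unicodedata module; the function name canonically_equal is hypothetical, and comparing after NFC normalization is one common way to test equivalence, not something the Note prescribes in these words.

```python
import unicodedata

# Two canonically equivalent spellings of the same letter.
precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # "e" followed by COMBINING ACUTE ACCENT

def canonically_equal(a: str, b: str) -> bool:
    """Hypothetical helper: compare two strings after normalizing both to NFC."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)

# A code-point-by-code-point comparison treats the two as distinct ...
print(precomposed == decomposed)
# ... while a comparison after normalization does not.
print(canonically_equal(precomposed, decomposed))
```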

* * *

2.10, PAR 2, first bullet

"It is a prerequisite for be conservative in what you send "
{ CORRECTION
>= "It is prerequisite to being conservative in what you send."

Alternately,

>= "It is prerequisite to one's being conservative in what is sent."
}

* * *

3.2, PAR 1, last sentence

"As an example, it could be required that text transmitted via certain protocols, or text exposed in certain APIs, is normalized."

{COMMENTS: ?? You used the indicative ("is normalized"), and not the subjunctive, which may be o.k. in the U.K. but in the U.S. the correct grammar is 

"is normalized"  
>= ?? "be normalized."
Also I would like some examples of the protocols here! }

* * *

3.2, last PAR, last sentence

"Such a transfer is indeed highly desirable in many cases, because to avoid generating unnormalized data is in many cases easier than to normalize such data later."
{CORRECTION/COMMENT:  broken verb predicate (I think it's better to keep these together when you can):  
>="Such a transfer is indeed highly desirable in many cases, because it is in many cases easier to avoid generating unnormalized data than it is to normalize such data later."
}
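
The quoted point, that avoiding unnormalized data is easier than normalizing it later, can be sketched as a producer-side step in Python. The function name prepare_for_send, the choice of NFC, and the UTF-8 encoding are assumptions for illustration; unicodedata.is_normalized requires Python 3.8 or later.

```python
import unicodedata

def prepare_for_send(text: str) -> bytes:
    """Hypothetical producer-side step: emit only NFC-normalized UTF-8."""
    # Cheap check first; text that is already normalized is passed through as-is.
    if not unicodedata.is_normalized("NFC", text):
        text = unicodedata.normalize("NFC", text)
    return text.encode("utf-8")

# Both spellings of "é" leave the producer as the same byte sequence,
# so receivers can compare the data without normalizing it again.
print(prepare_for_send("e\u0301") == prepare_for_send("\u00e9"))
```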
* * *
4.4

{ COMMENT/CORRECTION?? :  I think I'd prefer 
>= "sub-elements"
and 
>=  "sub-element"
[that is, I think this word needs a hyphen--but some people don't hyphenate--IBM, for example;
see:  http://www.google.com/search?hl=fr&source=hp&q=sub-element&btnG=Recherche+Google&lr=&aq=f&oq=!] }
* * *

I'll follow with a few questions/comments on the contents shortly!

Best,
C. E. Whitehead
cewcathar@hotmail.com

 
> From: ishida@w3.org
> To: www-international@w3.org
> Date: Thu, 1 Oct 2009 15:38:40 +0100
> Subject: New Working Group Note: Requirements for String Identity Matching and String Indexing
> 
> On 15th September, the Internationalization Core Working Group published Requirements for String Identity Matching and String Indexing as a Working Group Note.
> 
> http://www.w3.org/TR/charreq/
> 
> This document was published as a Working Group note in order to capture and preserve historical information. It contains requirements elaborated in 1998 for aspects of the character model for W3C specifications. It was developed and extensively reviewed by the Internationalization Working Group, but never progressed beyond Working Draft status. For this publication, the wording of the 1998 version remains unchanged (except for correction of a small number of typographic errors), but the links to references have been updated prior to this publication.
> 
> The document describes requirements for some important aspects of the character model for W3C specifications. The two aspects discussed are string identity matching and string indexing.
> 
> Editor: Martin Dürst.
> 
>  		 	   		  
Received on Wednesday, 7 October 2009 22:37:55 UTC