Comments on Character Model for the World Wide Web: String Matching and Searching

These are my comments on chapters 1 and 2 of the subject document.


1) In 1.3, we find "The policies adopted by the IETF for on the use of character sets on the Internet are documented in [[RFC 2277]]." It seems that the word "for" should be removed.

2) In 2, we find "such as those define". It probably should be "such as those which define".

3) In 2, we find "implementations and tools need to consider the difficulties experienced by users who expect that visually and logically equivalent strings that "ought to" match but are considered to be distinct values and provide a means for users to see these differences and/or normalize them as appropriate." The phrase whose subject is "visually and logically equivalent strings" has no verb. It would probably be better to split the long sentence into shorter ones.

4) In 2.1, "the hexadecimal entity &20ac;" should be "the hexadecimal entity &#x20ac;".

5) In 2.2, "in different ways that is" should be "in different ways that are".

6) In 2.2 table of Canonical Equivalence, what happens for Hangul is not clear to someone not familiar with Hangul; what the Singleton line demonstrates is not clear to me.

7) In 2.2 table of Compatibility Equivalence, the example for Breaking differences shows only a hyphen in the second column. Maybe I am missing proper fonts, but I am surely missing the message.


8) Ibidem, I think that each example line should show at least 2 symbols which are deemed equivalent. I see only one symbol in the lines for Circled, Squared Characters, Fractions, Others. This is puzzling.


9) In 2.3, "A different form of text normalization that can applied" => "that can be applied".

10) In 2.3, we find "Case-insensitive matching is sometimes useful in contexts where case may vary in a way that is not semantically meaningful or in which case distinctions cannot be controlled by the user." I think that it is not "sometimes useful" but "often useful", maybe "most often useful". Most of the time, my searches in Latin text are, and must be, case insensitive.
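
To make comment 10 concrete, here is a minimal Python sketch of the kind of case-insensitive search I have in mind. It is my own illustration and is not taken from the subject document; the function name contains_caseless is hypothetical, and I am assuming that Python's built-in str.casefold() is an acceptable stand-in for the case folding that a real implementation would use.

```python
def contains_caseless(haystack: str, needle: str) -> bool:
    """Return True if needle occurs in haystack, ignoring case distinctions.

    Hypothetical helper for illustration only.
    """
    # str.casefold() is more aggressive than str.lower(): for example,
    # the German sharp s is folded to "ss", so "straße" matches "STRASSE".
    return needle.casefold() in haystack.casefold()


if __name__ == "__main__":
    title = "Character Model for the World Wide Web"
    print(contains_caseless(title, "world wide web"))
    print(contains_caseless("STRASSE", "straße"))
```

A plain substring test without folding fails for both of these searches, which is why I consider case-insensitive matching the usual need for Latin text rather than an occasional one.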




Shalom (Regards),  Mati


Received on Thursday, 19 June 2014 12:23:48 UTC