RE: comments on Character Model for the World Wide Web: String Matching and Searching

Hi Mati,

Thanks for these as well. Responses follow.

Note that my copy is NOT posted over the one you’re currently reviewing.

From: Matitiahu Allouche [mailto:matitiahu.allouche@gmail.com]
Sent: Thursday, June 19, 2014 8:16 AM
To: www-international@w3.org
Subject: comments on Character Model for the World Wide Web: String Matching and Searching

These are my comments on chapters 3 and 4 of the subject document ( http://www.w3.org/International/docs/charmod-norm/ ).

12) In 3, some requirements (e.g. Req 1) are labelled with [S], [I], [C] and some are not (e.g. Req 2, 3). I think that they should all be labelled with at least one of those marks.

AP> I agree. I have attempted to supply them in my copy.

13) In Req 4, the first token "C3xx" seems to be a leftover from something else. In the same paragraph, item 2 mentions character "includes". This term has not been mentioned or explained before, and its meaning is not obvious to me.

AP> I removed the extraneous numbering. And then I went further. This requirement is actually an algorithm for “case sensitive matching”, so I moved the text into the block of that name and removed the requirement.

13) In 3.1.2, "languaguages" should be "languages".

AP> Fixed.

14) Ibidem, "occaisionally" should be "occasionally".

AP> Done.

15) In several places, the document postulates that if a protocol allows user-defined names or identifiers, those tokens must allow non-ASCII characters, and therefore that ASCII case-insensitive comparison is forbidden. It seems to me that this extends the requirements of this document beyond its scope.
I agree that we (meaning the i18n crowd) like to promote the use of non-ASCII characters everywhere, but it is not within the scope of this document to state whether a given protocol allows such characters in identifiers. If it does not, why should we ban the use of ASCII case-insensitive comparison? We can recommend against it, and we can explain that it restricts the option of a more liberal character set in the future, but should we really make it non-conformant?

AP> I don’t actually see this anywhere, and I have been at pains to remove cases like this. The document *DOES* spell out that you can’t use ASCII case-insensitive (ACI) matching for any non-ASCII namespace. But the document doesn’t draw any value judgments for or against this kind of namespace, at least in the requirements. As a WG, we generally frown on namespaces that allow users to name things but then restrict the character set in some arbitrary way.

AP> If you can call out specific locations in the text, I can address them.

16) When text between two requirements mentions "this requirement", it is not clear whether it refers to the requirement above it or the one below it. For instance, see the text between Req 14 and Req 15.

AP> I tried to clean these up.

17) In 4, "this section addressed" should be "This section addresses".

AP> Done.

18) In 4.1, "generate different user expectations" should be "generates…".

AP> Done.

19) In 4.1, instead of "they expect their more-specific input to match only what has been input", I suggest "they expect the search results to match closely their more-specific input".

AP> Good. Done.


--
Shalom (Regards),  Mati

Received on Thursday, 19 June 2014 22:00:24 UTC