- From: aphillips via GitHub <sysbot+gh@w3.org>
- Date: Tue, 05 Apr 2016 21:23:27 +0000
- To: public-i18n-archive@w3.org
Generally, for matching, a lot of these characters are probably ignorable, although it depends on the context. If the context is the "find" portion of Charmod, then I think the recommendation is to ignore them always. If you're searching for "alone" and run into some "bodies" along the way, that may not be so bad. Similarly, the various Variation Selectors (the Mongolian Free Variation Selectors or just plain old VSx) probably should be excluded.

If the context is namespaced matching operations, I'm not as clear on it. In fact, I suspect this is essentially the same problem as normalization. If normalization is neither specified nor externally supplied by the specification (that is, it isn't done when matching), then it is up to the document author(s) to ensure that they use the same character sequences. Then your "bodies" are never "alone" :-).

The advice, instead, is to avoid using invisibles in IDs, names, attributes, etc. unless that cannot be avoided. If the characters are required, the advice is to adopt a consistent encoding of the text to avoid matching issues.

--
GitHub Notification of comment by aphillips

Please view or discuss this issue at https://github.com/w3c/charmod-norm/issues/44#issuecomment-205988718 using your GitHub account
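The "ignore them always" idea for find operations, and the "avoid invisibles in IDs" advice, can be illustrated with the minimal Python sketch below.

The sketch rests on several assumptions. The `IGNORABLE` set is a hand-picked, illustrative subset: variation selectors, the Mongolian free variation selectors, and a few zero-width characters. It is not a list taken from Charmod or from the Unicode Default_Ignorable_Code_Point property. Whether ZWJ/ZWNJ should be ignored is itself context-dependent. The function names are hypothetical. Both strings are compared with the ignorable characters removed, and the match position is mapped back to an index in the original text.

```python
# Illustrative subset of invisible characters (an assumption for this sketch,
# NOT an authoritative list from Charmod or Unicode).
IGNORABLE = (
    {chr(cp) for cp in range(0xFE00, 0xFE10)}          # VS1-VS16
    | {chr(cp) for cp in range(0xE0100, 0xE01F0)}      # VS17-VS256
    | {"\u180B", "\u180C", "\u180D", "\u180F"}         # Mongolian FVS1-FVS4
    | {"\u200B", "\u200C", "\u200D", "\u2060", "\uFEFF"}  # ZWSP, ZWNJ, ZWJ, WJ, ZWNBSP
)


def strip_with_map(text: str) -> tuple[str, list[int]]:
    """Remove ignorable characters, remembering each kept character's original index."""
    kept: list[str] = []
    positions: list[int] = []
    for i, ch in enumerate(text):
        if ch not in IGNORABLE:
            kept.append(ch)
            positions.append(i)
    return "".join(kept), positions


def find_ignoring_invisibles(haystack: str, needle: str) -> int:
    """Return the index in `haystack` where `needle` starts, or -1 if absent.

    Invisible characters in either string do not block a match
    (the "find" behaviour discussed above).
    """
    stripped_needle, _ = strip_with_map(needle)
    if not stripped_needle:
        return 0  # like str.find, an empty needle matches at the start
    stripped_hay, positions = strip_with_map(haystack)
    idx = stripped_hay.find(stripped_needle)
    return -1 if idx == -1 else positions[idx]


def has_invisibles(identifier: str) -> bool:
    """Flag IDs/names that contain ignorable characters, so authors can avoid them."""
    return any(ch in IGNORABLE for ch in identifier)


if __name__ == "__main__":
    text = "stand al\u200Bone"  # ZWSP hidden inside "alone"
    print(text.find("alone"))                       # plain find misses it
    print(find_ignoring_invisibles(text, "alone"))  # ignorable-aware find
    print(has_invisibles("my\u200Did"))
```

In this design, the find path is deliberately lenient. The identifier path does not silently ignore invisibles. It only reports them, which leaves it to the author to use consistent character sequences, in the spirit of the namespaced-matching advice.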
Received on Tuesday, 5 April 2016 21:23:28 UTC