Re: [charmod-norm] Does ZWJ/ZWNJ affect meaning?

For matching, many of these characters are probably ignorable, 
although it depends on the context.

If the context is the "find" portion of Charmod, then I think the 
recommendation is to always ignore them. If you're searching for 
"alone" and run into some "bodies" along the way, that may not be so 
bad. Similarly, the various variation selectors (the Mongolian Free 
Variation Selectors or just plain old VSx) should probably be ignored 
as well.
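
As a rough illustration of the "find" case, here is a minimal Python 
sketch that drops ZWJ/ZWNJ and the variation selectors from both the 
search term and the text before comparing. The helper names 
(strip_ignorables, find_ignoring_invisibles) are hypothetical and are 
not taken from Charmod; the code point ranges are my reading of the 
Unicode character charts, and a real implementation would probably 
need a longer list.

```python
# Hypothetical helpers for illustration only; not from any spec.
# Code point ranges assumed from the Unicode character charts.
IGNORABLE_RANGES = (
    (0x200C, 0x200D),    # ZERO WIDTH NON-JOINER, ZERO WIDTH JOINER
    (0x180B, 0x180D),    # Mongolian Free Variation Selectors 1-3
    (0xFE00, 0xFE0F),    # Variation Selectors 1-16
    (0xE0100, 0xE01EF),  # Variation Selectors 17-256
)


def is_ignorable(ch):
    """Return True if ch falls in one of the ranges above."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in IGNORABLE_RANGES)


def strip_ignorables(text):
    """Return text with the ignorable characters removed."""
    return "".join(ch for ch in text if not is_ignorable(ch))


def find_ignoring_invisibles(needle, haystack):
    """Substring test that ignores the characters listed above."""
    return strip_ignorables(needle) in strip_ignorables(haystack)
```

With this, a search for "alone" would still succeed against text 
that has a ZWJ in the middle of the word, which is the behavior I'd 
expect from "find".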

If the context is namespaced matching operations, I'm less sure. In 
fact, I suspect this is much the same problem as normalization. If 
normalization is neither specified nor externally supplied by the 
specification (that is, it isn't done when matching), then it is up 
to the document author(s) to ensure that they use the same character 
sequences. Then your "bodies" are never "alone" :-). The advice, 
instead, is to avoid using invisible characters in IDs, names, 
attributes, etc. unless they cannot be avoided. And if the characters 
are required, adopt a consistent encoding of the text to avoid 
matching issues.
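
To make the identifier case concrete, here is a small Python sketch 
of why the burden falls on authors: when matching is a plain code 
point comparison, two IDs that look identical do not match if only 
one of them contains a ZWNJ. The function name invisible_positions 
is hypothetical, and the set of characters it checks is deliberately 
minimal (just ZWJ and ZWNJ), so treat it as an assumption rather 
than a complete lint rule.

```python
# Hypothetical lint-style check; the character set is an assumption
# and is limited to ZWNJ (U+200C) and ZWJ (U+200D).
INVISIBLES = {0x200C, 0x200D}


def invisible_positions(identifier):
    """List (index, 'U+XXXX') for each ZWJ/ZWNJ in identifier."""
    return [
        (i, "U+%04X" % ord(ch))
        for i, ch in enumerate(identifier)
        if ord(ch) in INVISIBLES
    ]


def ids_match(a, b):
    """Code point by code point comparison, with no normalization."""
    return a == b
```

An author could run something like invisible_positions over the IDs 
in a document to spot the invisibles before they cause a mismatch.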

-- 
GitHub Notification of comment by aphillips
Please view or discuss this issue at 
https://github.com/w3c/charmod-norm/issues/44#issuecomment-205988718 
using your GitHub account

Received on Tuesday, 5 April 2016 21:23:28 UTC