- From: Ivan Herman via GitHub <sysbot+gh@w3.org>
- Date: Wed, 01 Jun 2016 12:58:23 +0000
- To: public-annotation@w3.org
> On 1 Jun 2016, at 14:49, r12a <notifications@github.com> wrote:
>
> And if there were copy paste-s done when putting together the text, then the representation of the same text may be slightly different within the text… Hence the normalization.
>
> @iherman <https://github.com/iherman> too many 'text' words there for me to be sure what you're saying. The only way I can see to understand this is if the Text Position Selector values are manually created by users looking at the target text and typing what they think they see into the annotation body. Is that a valid use case?
>
> That is not what I meant. Imagine that File.html contains the word "Iván" twice. However, File.html was created by somebody copy-pasting text first from File1.html and then from File2.html. The first contained "Iván", the other contained "Iva´n" (I mean that the relevant Unicode encodings are different). The end result is that the word "Iván" is in File.html in two different internal formats. Then somebody wants to annotate File.html, and wants to annotate the various "Iván"-s. The system would put "Iván" into the Text Quote Selector for exact match. The only way the match would really work is to have the normalization...
>
> I don't see how normalization helps distinguish between possible matches when there are multiple alternative ranges of text in the target document that match the text position selector values. If anything, I'd have thought it would do the opposite, by removing idiosyncratic differences, which is what normalization is about. If you want to find all possible matches, then that's fine, but I think that here we want to find the unique match where possible, no?
>

--
GitHub Notification of comment by iherman
Please view or discuss this issue at https://github.com/w3c/web-annotation/issues/227#issuecomment-222984227 using your GitHub account
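
The "Iván" scenario discussed above can be sketched in a few lines of Python. This is only an illustration, not anything taken from the Web Annotation specification: the sample sentence, the `find_all` helper, and the choice of NFC are assumptions, and the second spelling is assumed to be "a" followed by the combining acute accent (U+0301).

```python
import unicodedata

# Two internal representations of the same visible word (assumed for illustration).
composed = "Iv\u00e1n"      # precomposed "á" (U+00E1)
decomposed = "Iva\u0301n"   # "a" followed by combining acute accent (U+0301)

def find_all(haystack: str, needle: str) -> list[int]:
    """Return the start offset of every occurrence of needle in haystack."""
    hits = []
    start = haystack.find(needle)
    while start != -1:
        hits.append(start)
        start = haystack.find(needle, start + 1)
    return hits

# Hypothetical document assembled by copy-paste from two differently encoded sources.
document = f"{composed} wrote to {decomposed}."

# Exact matching on the raw code points only finds the first spelling.
raw_hits = find_all(document, composed)

# Normalizing both the document and the quoted text to NFC finds both.
nfc_document = unicodedata.normalize("NFC", document)
nfc_quote = unicodedata.normalize("NFC", composed)
nfc_hits = find_all(nfc_document, nfc_quote)

print(raw_hits, nfc_hits)
```

Note that the offsets in `nfc_hits` refer to the normalized text, not to the original document. Normalization also makes both occurrences match, so picking out a single one still needs other information, such as the surrounding text.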
Received on Wednesday, 1 June 2016 12:58:28 UTC