Re: [web-annotation] Reference to text encoding in spec perhaps not appropriate

> On 1 Jun 2016, at 14:49, r12a <notifications@github.com> wrote:
> 
> And if there were copy-pastes done when putting together the text, 
> then the representation of the same text may be slightly different 
> within the text… Hence the normalization.
> 
> @iherman <https://github.com/iherman> too many 'text' words there 
> for me to be sure what you're saying. The only way i can see to 
> understand this is if the Text Position Selector values are manually 
> created by users looking at the target text and typing what they think 
> they see into the annotation body. Is that a valid use case?
> 
> 

That is not what I meant. Imagine that File.html contains the word 
"Iván" twice. However, File.html was created by somebody copy-pasting 
text from File1.html and then from File2.html. The first contained 
"Iván", the other contained "Iva´n" (I mean that the underlying 
Unicode code point sequences are different: a precomposed "á" versus 
an "a" followed by a combining accent). The end result is that the 
word "Iván" is present in File.html in two different internal forms.

Then somebody wants to annotate File.html, and wants to annotate each 
occurrence of "Iván". The system would put "Iván" into the Text Quote 
Selector as the exact match. The only way that match would reliably 
find both occurrences is to normalize both the selector text and the 
document text before comparing...
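
To make the scenario above concrete, here is a minimal sketch in Python. It is only an illustration: the sample sentence, the helper find_all, and the choice of NFC as the normalization form are my assumptions, not anything the spec prescribes. It also assumes that the second spelling is "a" followed by U+0301 COMBINING ACUTE ACCENT.

```python
import unicodedata

# Two spellings of the same word (assumed code points, for illustration only).
precomposed = "Iv\u00e1n"   # 'á' as the single code point U+00E1
decomposed = "Iva\u0301n"   # 'a' followed by U+0301 COMBINING ACUTE ACCENT

# Hypothetical content of File.html after the two copy-pastes.
document = f"{precomposed} met {decomposed}."

# What the system might store as the exact text of a Text Quote Selector.
exact = precomposed


def find_all(haystack: str, needle: str) -> list[int]:
    """Return the start offsets of every occurrence of needle in haystack."""
    positions = []
    start = haystack.find(needle)
    while start != -1:
        positions.append(start)
        start = haystack.find(needle, start + 1)
    return positions


def nfc(text: str) -> str:
    """Normalize text to NFC (the form chosen here is an assumption)."""
    return unicodedata.normalize("NFC", text)


# Code-point-by-code-point comparison only finds the precomposed spelling.
raw_hits = find_all(document, exact)

# Normalizing both sides first finds both occurrences.
normalized_hits = find_all(nfc(document), nfc(exact))

print(raw_hits)
print(normalized_hits)
```

Note that the offsets in normalized_hits refer to the normalized string, not to the original one, because normalization changes the length of the decomposed spelling.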


> I don't see how normalization helps distinguish between possible 
> matches when there are multiple alternative ranges of text in the 
> target document that match the text position selector values. If 
> anything, i'd have thought it would do the opposite, by removing 
> idiosyncratic differences, which is what normalization is about. If 
> you want to find all possible matches, then that's fine, but i think 
> that here we want to find the unique match where possible, no?
> 






-- 
GitHub Notification of comment by iherman
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/227#issuecomment-222984227
 using your GitHub account

Received on Wednesday, 1 June 2016 12:58:28 UTC