- From: r12a via GitHub <sysbot+gh@w3.org>
- Date: Tue, 17 May 2016 13:00:14 +0000
- To: www-international@w3.org
r12a has just labeled an issue for https://github.com/w3c/web-annotation as "i18n-review":

== Normalisation of Text Quote Selector ==
[raised by r12a, not yet discussed by the i18n WG]

4.2.4 Text Quote Selector
https://www.w3.org/TR/2016/WD-annotation-model-20160331/#text-quote-selector

> The text MUST be normalized before recording. Thus HTML/XML tags should be removed, character entities should be replaced with the character that they encode, unnecessary whitespace should be normalized, character encoding should be turned into UTF-8, and so forth. The normalization routine may be performed automatically by a browser, and other applications should implement the DOM String Comparisons method. This allows the Selector to be used with different encodings and user agents and still have the same semantics and utility.

I think we agreed on the teleconference that normalization is not appropriate before establishing a range using the Text Position Selector (counting characters), but it **is** appropriate for the Text Quote Selector (which selects a string with prefix and suffix), since the basis for identifying that location relies on matching strings.

I just want to be sure that that's correct, and if so that the reference to [DOM String Comparisons](https://www.w3.org/TR/2016/WD-annotation-model-20160331/#bib-DOM-Level-3-Core) serves the expected purpose.

See https://github.com/w3c/web-annotation/issues/222
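
For illustration only, here is a rough Python sketch of the kind of normalization the quoted spec text describes (drop tags, decode character entities, collapse whitespace), followed by a prefix/exact/suffix match. The names `normalize_text` and `find_quote` are hypothetical and are not taken from the spec or from any implementation. The sketch assumes that "unnecessary whitespace" means runs of whitespace collapsed to a single space, which the spec text does not spell out, and it does not attempt the DOM String Comparisons method.

```python
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes; tags are dropped and character references decoded."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as &eacute;
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def normalize_text(markup: str) -> str:
    """Hypothetical helper: strip tags, decode entities, collapse whitespace.

    Assumption: whitespace runs become one space and the ends are trimmed.
    Note that the contents of <script> and <style> are not filtered here.
    The result is a Python str (Unicode); encode it as UTF-8 when serializing.
    """
    extractor = _TextExtractor()
    extractor.feed(markup)
    extractor.close()
    text = "".join(extractor.parts)
    return re.sub(r"\s+", " ", text).strip()


def find_quote(text: str, exact: str, prefix: str = "", suffix: str = ""):
    """Hypothetical helper: locate `exact` when surrounded by prefix and suffix.

    Returns (start, end) offsets of `exact` within the normalized text,
    or None if the combined string is not found.
    """
    index = text.find(prefix + exact + suffix)
    if index == -1:
        return None
    start = index + len(prefix)
    return start, start + len(exact)


if __name__ == "__main__":
    normalized = normalize_text("<p>Caf&eacute;  au\n<em>lait</em></p>")
    print(normalized)
    print(find_quote(normalized, "au", prefix="Café ", suffix=" lait"))
```

The point of the sketch is that the match runs against the normalized string, so the same prefix/exact/suffix triple can locate the quote regardless of how the source markup was written. Offsets returned this way refer to the normalized text, not to the original document, which is why the same treatment would not suit a selector that counts characters.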
Received on Tuesday, 17 May 2016 13:00:22 UTC