[web-annotation] Reference to text encoding in spec perhaps not appropriate from Nick Stenning via GitHub on 2016-05-18 (public-annotation@w3.org from May 2016)

From: Nick Stenning via GitHub <sysbot+gh@w3.org>
Date: Wed, 18 May 2016 09:07:11 +0000
To: public-annotation@w3.org
Message-ID: <issues.opened-155451230-1463562430-sysbot+gh@w3.org>

nickstenning has just created a new issue for 
https://github.com/w3c/web-annotation:

== Reference to text encoding in spec perhaps not appropriate ==
#222 made me aware of the following text in the model spec:

4.2.4 Text Quote Selector
https://www.w3.org/TR/2016/WD-annotation-model-20160331/#text-quote-selector
> The text must be normalized before recording. Thus HTML/XML tags
> should be removed, character entities should be replaced with the
> character that they encode, unnecessary whitespace should be
> normalized, **character encoding should be turned into UTF-8**, and
> so forth. The normalization routine may be performed automatically by
> a browser, and other applications should implement the DOM String
> Comparisons method. This allows the Selector to be used with
> different encodings and user agents and still have the same semantics
> and utility.
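
For readers who want something concrete, the normalization steps in the quoted passage (strip tags, resolve character entities, collapse whitespace) could be sketched roughly as below. This is only an illustration under assumptions: the helper name `normalize_for_quote` and the sample markup are hypothetical, and the code does not implement the DOM String Comparisons method that the draft refers to.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects character data and ignores tags (hypothetical helper)."""

    def __init__(self) -> None:
        # convert_charrefs=True makes the parser replace entities such as
        # &eacute; with the characters they encode before handle_data runs.
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def normalize_for_quote(markup: str) -> str:
    """Rough sketch: drop tags, resolve entities, collapse whitespace."""
    collector = _TextCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered text
    text = "".join(collector.parts)
    # str.split() with no argument splits on any run of whitespace.
    return " ".join(text.split())


sample = "<p>Caf&eacute;  &amp;\n <b>bar</b></p>"
normalized = normalize_for_quote(sample)
```

Note that nothing in this sketch mentions a character encoding: by the time the markup is a Python `str`, it is already a sequence of Unicode code points.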

If all selector references are to be w.r.t. code point sequences (cf.
#206), then I'm not sure the spec should be referring to text encoding.
(Because we're assuming that you're annotating Unicode text, not some
byte sequence.)
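
To make the distinction concrete, here is a minimal sketch of why the serialized encoding stops mattering once text is treated as a code point sequence. The sample string, variable names, and the `code_points` helper are made up for illustration; they are not taken from the spec.

```python
# Hypothetical sample text, written with escapes so the code points are
# unambiguous (precomposed U+00EF and U+00E9).
text = "na\u00efve caf\u00e9"


def code_points(s: str) -> list[int]:
    """Return the Unicode code points of an already-decoded string."""
    return [ord(ch) for ch in s]


# The same text serialized in two different character encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

# Decoding each byte sequence with its own encoding recovers the text.
from_utf8 = utf8_bytes.decode("utf-8")
from_latin1 = latin1_bytes.decode("latin-1")

# A quote matched against code points lands at the same offset in both,
# even though the underlying byte sequences differ.
exact = "caf\u00e9"
offset_utf8 = from_utf8.find(exact)
offset_latin1 = from_latin1.find(exact)
```

Here the byte sequences differ (12 bytes versus 10), yet both decoded strings have identical code points and the quote is found at offset 6 in each. An instruction to convert to UTF-8 would only be meaningful if selectors were defined over bytes.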

Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/227 using your GitHub 
account

Received on Wednesday, 18 May 2016 09:07:13 UTC