Re: [web-annotation] Reference to text encoding in spec perhaps not appropriate from Randall Leeds via GitHub on 2016-05-20 (public-annotation@w3.org from May 2016)

From: Randall Leeds via GitHub <sysbot+gh@w3.org>
Date: Fri, 20 May 2016 18:41:41 +0000
To: public-annotation@w3.org
Message-ID: <issue_comment.created-220686625-1463769701-sysbot+gh@w3.org>

I think these normalization rules need some change.

Code points don't refer to a particular byte encoding. A single code 
point may be encoded as one or more bytes, depending on the encoding. 
I hope I'm using these terms correctly.

If positions are counted in code points, it shouldn't matter whether 
the quote is encoded as UTF-8 and the text as UTF-16.
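
A quick sketch of what I mean, in Python. The sample string is made 
up for illustration and is not taken from the spec. The byte lengths 
differ between the two encodings, but once decoded, both give the same 
sequence of code points and therefore the same count.

```python
# Hypothetical sample: precomposed "ï" and "é" plus one non-BMP emoji.
text = "na\u00efve caf\u00e9 \U0001F600"

# The same text serialized in two different byte encodings.
utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")

# Byte lengths depend on the encoding.
print("bytes:", len(utf8_bytes), len(utf16_bytes))

# Decoding either one yields the same code points, so counting
# code points is independent of how the text was stored.
from_utf8 = utf8_bytes.decode("utf-8")
from_utf16 = utf16_bytes.decode("utf-16-le")
print("code points:", len(from_utf8), len(from_utf16))
print("equal:", from_utf8 == from_utf16)
```

Note that Python's `len()` on a `str` counts code points. Languages 
whose strings are indexed by UTF-16 code units would count the emoji 
as two, which is a separate pitfall.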

Why do we specify normalization of white space, though? Why strip the 
selector of precision? Maybe my annotation is intended to target some 
whitespace.
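
To illustrate the concern, here is a small sketch. I'm assuming that 
"normalization" means collapsing each run of whitespace into a single 
space. That is only my reading for the sake of the example, not a 
quote of the spec text, and the sample string is hypothetical.

```python
import re

# Hypothetical document text containing a blank line.
original = "a  b\n\nc"

# Suppose the annotation is meant to target exactly the blank line.
start = original.index("\n\n")
target = original[start:start + 2]

# Assumed normalization: collapse every whitespace run to one space.
collapsed = re.sub(r"\s+", " ", original)

print(repr(collapsed))
# After collapsing, the targeted whitespace can no longer be found.
print(target in collapsed)
```

Under that assumption the selector loses the ability to say which 
whitespace it meant.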

We should probably be clearer about Unicode normalization. I'd say 
let's explicitly recommend against transforming the code points 
(removing combining characters and such), but I'm a little worried 
tools might do this anyway, and we might be best served by suggesting 
that normalization is always done. That's another performance-impacting 
demand to make, though.
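
For a concrete picture of why this matters, here is a minimal sketch 
using Python's standard `unicodedata` module. The sample string is 
hypothetical. It shows that composing a base letter with its combining 
mark (NFC) changes the number of code points, so any code point offset 
computed on one form is off by one on the other.

```python
import unicodedata

# Hypothetical text with a decomposed "é": "e" followed by
# U+0301 COMBINING ACUTE ACCENT.
decomposed = "cafe\u0301 au lait"

# NFC composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# The two strings render the same but differ in length...
print(len(decomposed), len(composed))

# ...so the offset of the same visible word shifts.
print(decomposed.index("au"), composed.index("au"))

# Normalizing both sides to the same form makes them comparable again.
print(unicodedata.normalize("NFC", decomposed) == composed)
```

If a tool normalizes on one side only, position-based selectors drift. 
Either nobody transforms the code points or everybody normalizes to 
the same form.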

-- 
GitHub Notification of comment by tilgovi
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/227#issuecomment-220686625 
using your GitHub account

Received on Friday, 20 May 2016 18:41:43 UTC