Re: [web-annotation] Reference to text encoding in spec perhaps not appropriate from r12a via GitHub on 2016-05-20 (public-annotation@w3.org from May 2016)

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Fri, 20 May 2016 19:28:58 +0000
To: public-annotation@w3.org
Message-ID: <issue_comment.created-220697597-1463772537-sysbot+gh@w3.org>

I'd like to widen the ambit of this issue slightly. (Actually, I just
noticed that @tilgovi is doing so also.)

For white space, see also 
https://github.com/w3c/web-annotation/issues/221

The other thing I'm concerned about is the phrase "and so forth". The
operations described here as forming part of text normalisation are
aimed at achieving more interoperable comparisons of the selector text
across different encodings and user agents. An open-ended "so forth"
seems to leave the door open for implementations to apply all sorts
of independent, and therefore non-interoperable, changes to the text.
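
A rough sketch of the concern, in Python. The two functions stand for
hypothetical implementations that each read "and so forth"
differently; the function names, the steps chosen, and the sample
string are assumptions made up for illustration, not anything taken
from the spec.

```python
import re
import unicodedata


def normalize_a(text):
    # Hypothetical implementation A: collapse white space, then apply NFC.
    collapsed = re.sub(r"\s+", " ", text).strip()
    return unicodedata.normalize("NFC", collapsed)


def normalize_b(text):
    # Hypothetical implementation B: also treats compatibility mapping (NFKC)
    # and case folding as covered by "and so forth".
    collapsed = re.sub(r"\s+", " ", text).strip()
    return unicodedata.normalize("NFKC", collapsed).casefold()


# Sample selector text: double space, U+FB01 LATIN SMALL LIGATURE FI, U+00E9.
quote = "The  \ufb01rst Caf\u00e9"

# The two implementations no longer agree on the "normalised" selector text.
print(normalize_a(quote) == normalize_b(quote))
```

Both functions are defensible readings of an open-ended list, yet a
selector written by one would not match text normalised by the other.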

Also, character normalisation is a part of DOM String Comparisons, but
it doesn't appear to be required for browsers, as I read the text of
the spec. The reference to DOM string comparisons points to text that,
to my mind (correct me if I'm wrong), is more oriented to checking the
well-formedness of the text according to XML criteria than to
achieving an interoperable normalisation form. For example, full
normalisation requires that content should not start with a composing
character (which I think we no longer agree on in some cases), but it
doesn't tell us what to do if text read from the target does start
with one. Perhaps the link should be to the definition of
Unicode-normalized instead
(https://www.w3.org/TR/2004/REC-xml11-20040204/#dt-uninorm)?
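
To make the distinction concrete, here is a minimal Python sketch. It
assumes NFC as the normalisation form used for comparison, and it
approximates "composing character" by a non-zero canonical combining
class, which is narrower than the XML 1.1 definition. The helper names
and sample strings are hypothetical.

```python
import unicodedata


def nfc_equal(a, b):
    # Compare two strings after converting both to Unicode NFC.
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def starts_with_combining_mark(text):
    # Approximation: treat a non-zero canonical combining class on the first
    # code point as "starts with a composing character".
    return bool(text) and unicodedata.combining(text[0]) != 0


precomposed = "caf\u00e9"   # U+00E9 LATIN SMALL LETTER E WITH ACUTE
decomposed = "cafe\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(precomposed == decomposed)           # False: code points differ
print(nfc_equal(precomposed, decomposed))  # True: same text once normalised

# Text read from a target may begin mid-sequence, e.g. with a combining mark.
fragment = "\u0301tude"
print(starts_with_combining_mark(fragment))  # True
```

The first helper is the interoperable comparison; the second only
detects the leading-mark condition and says nothing about how a
selector should handle it.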



-- 
GitHub Notification of comment by r12a
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/227#issuecomment-220697597
using your GitHub account

Received on Friday, 20 May 2016 19:28:59 UTC