Re: [web-annotation] Reference to text encoding in spec perhaps not appropriate from r12a via GitHub on 2016-06-01 (public-annotation@w3.org from June 2016)

From: r12a via GitHub <sysbot+gh@w3.org>
Date: Wed, 01 Jun 2016 12:11:11 +0000
To: public-annotation@w3.org
Message-ID: <issue_comment.created-222973597-1464783071-sysbot+gh@w3.org>

the impression i got from talking with folks in meetings is as 
follows: Boundaries for user selection are handled by the 
implementation with which the selection is made, and are not 
described in this spec. We would hope that implementations do 
something sensible, i.e. not allow selections that split grapheme 
clusters. Any text normalization specified by the model spec, as i 
understood it, is intended to make it easier to match the text 
against alternative forms of the same document.
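
To make the two ideas above concrete, here is a rough sketch in Python. 
It is an illustration only, not anything the spec defines. The function 
names are hypothetical, the boundary check only covers combining marks 
(full grapheme cluster rules live in UAX #29 and cover more cases), and 
the choice of NFC is an assumption made for the example.

```python
import unicodedata


def splits_combining_sequence(text: str, offset: int) -> bool:
    """Return True if a selection boundary at `offset` would separate a
    base character from a combining mark that follows it.

    Simplified stand-in for a grapheme cluster boundary check: it only
    looks at the canonical combining class of the next code point.
    """
    if offset <= 0 or offset >= len(text):
        return False  # start and end of the text are always safe boundaries
    return unicodedata.combining(text[offset]) != 0


def same_text(a: str, b: str, form: str = "NFC") -> bool:
    """Compare two strings after Unicode normalization.

    NFC is an assumed normalization form, chosen only for this sketch.
    """
    return unicodedata.normalize(form, a) == unicodedata.normalize(form, b)


decomposed = "e\u0301a"  # 'e' + COMBINING ACUTE ACCENT + 'a'
print(splits_combining_sequence(decomposed, 1))  # boundary between 'e' and the accent
print(splits_combining_sequence(decomposed, 2))  # boundary before 'a'
print(same_text("e\u0301", "\u00e9"))            # decomposed vs precomposed
```

An implementation could use a check of this kind to nudge a selection 
boundary to the nearest safe offset, and a normalized comparison to 
match a quote against a differently encoded copy of the same text.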

I'm still not entirely sure what those alternative forms would be 
exactly, since the target is closely defined as a specific resource.

I also find myself thinking more about why we want to normalize away 
the markup. I'm assuming it's because we expect the text quote 
selector text to match text extracted from the DOM using such things 
as `document.body.textContent`. If that's the case, i wonder whether 
it's appropriate to express this as a separate normalization step in 
the paragraph we have been talking about, or whether to just assume 
that it falls out of the recommendation in the following paragraph 
anyway (about generating the Text Position Selector values from DOM 
Level 3 APIs).
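
As a rough sketch of what i mean by the markup falling away on its own, 
here is a Python approximation of extracting text the way 
`textContent` would and then locating a quote in it. The class and 
function names are hypothetical, the standard library `html.parser` 
module is only a stand-in for a real DOM, and Python offsets count code 
points whereas DOM strings count UTF-16 code units, so the numbers can 
differ for characters outside the BMP.

```python
from html.parser import HTMLParser
from typing import Optional, Tuple


class TextCollector(HTMLParser):
    """Collect character data only, dropping tags and comments."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &amp; etc. into text
        self.parts = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def text_content(markup: str) -> str:
    """Approximate `textContent`: concatenate all text in document order."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()  # flush any buffered trailing text
    return "".join(collector.parts)


def position_of(markup: str, exact: str) -> Optional[Tuple[int, int]]:
    """Find `exact` in the extracted text and return (start, end) offsets,
    or None if the quote is not present. Only the first match is returned."""
    text = text_content(markup)
    start = text.find(exact)
    if start == -1:
        return None
    return (start, start + len(exact))


markup = "<p>Hello <em>wor</em>ld &amp; friends</p>"
print(text_content(markup))          # markup removed, entity decoded
print(position_of(markup, "world"))  # quote spans an element boundary
```

The point of the sketch is that once offsets are computed against the 
extracted text, the quote "world" matches even though it straddles an 
`<em>` boundary in the source, without a separate step for removing 
markup.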

So i guess i'm asking the person who added the phrase about 
normalizing away the markup to the spec why they did so, so that we 
can better assess the appropriateness.

-- 
GitHub Notification of comment by r12a
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/227#issuecomment-222973597
using your GitHub account

Received on Wednesday, 1 June 2016 12:11:14 UTC