Re: [web-annotation] Reference to text encoding in spec perhaps not appropriate from Felix Sasaki via GitHub on 2016-05-30 (public-annotation@w3.org from May 2016)

From: Felix Sasaki via GitHub <sysbot+gh@w3.org>
Date: Mon, 30 May 2016 09:31:20 +0000
To: public-annotation@w3.org
Message-ID: <issue_comment.created-222452770-1464600680-sysbot+gh@w3.org>

Hi all, just to emphasize one point that Ivan made: selectors are not 
only for HTML/XML markup. Hence, in the algorithm proposed at
https://github.com/w3c/web-annotation/issues/227#issuecomment-222330988
the step
"Remove all markup, such as HTML or XML tags."
does not apply to other content formats on the Web. PDF is just 
one example of such a format.
On the step
"Normalization of whitespace by collapsing all whitespace tokens to a 
single ASCII space character (U+0020)."
In certain markup vocabularies (and in non-markup content types as 
well), some elements need their whitespace preserved. E.g. for the 
HTML pre element you would not want to collapse whitespace. 
Emphasizing again: web annotation is for any type of web content. E.g. 
if I put DocBook content on the web and want to annotate 
programlisting elements, their whitespace should be preserved.
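
To illustrate the 'if applicable' point, here is a minimal sketch in 
Python using only the standard library. It is not the algorithm 
proposed in the issue. The names PRESERVE_WHITESPACE, TextExtractor, 
extract_text and normalize are hypothetical, and the set of 
whitespace-preserving elements is an assumption made for the example.

```python
import re
from html.parser import HTMLParser

# Assumption: elements whose whitespace is significant. A real
# implementation would take this from the vocabulary in use
# (e.g. programlisting for DocBook).
PRESERVE_WHITESPACE = {"pre", "textarea"}


class TextExtractor(HTMLParser):
    """Drops tags and collapses whitespace, except inside preserved elements."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.preserve_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in PRESERVE_WHITESPACE:
            self.preserve_depth += 1

    def handle_endtag(self, tag):
        if tag in PRESERVE_WHITESPACE and self.preserve_depth > 0:
            self.preserve_depth -= 1

    def handle_data(self, data):
        if self.preserve_depth > 0:
            # Inside e.g. <pre>: keep the text exactly as written.
            self.parts.append(data)
        else:
            # Elsewhere: collapse each whitespace run to a single U+0020.
            self.parts.append(re.sub(r"\s+", " ", data))


def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


def normalize(content, media_type):
    # Markup removal and whitespace collapsing are applied only where
    # they make sense; other formats are passed through unchanged.
    if media_type == "text/html":
        return extract_text(content)
    return content
```

With this sketch, normalize("<p>a  \n b</p><pre>x\n  y</pre>", 
"text/html") collapses the whitespace inside the p element but leaves 
the content of the pre element untouched, and content of any other 
media type is returned as is.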

IMO, for the above reasons, the qualifier 'if applicable' is very 
important. I assume that many implementers will leave whitespace 
handling to the underlying library that handles low-level content 
parsing. For example, during the ITS 2.0 development, I developed an 
implementation that parsed HTML content using validator.nu. The 
whitespace handling was left to that library. I assume others will 
do the same.

-- 
GitHub Notification of comment by fsasaki
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/227#issuecomment-222452770
using your GitHub account

Received on Monday, 30 May 2016 09:31:22 UTC