Re: [web-annotation] Reference to text encoding in spec perhaps not appropriate

> Normalization is for ease of comparison plus robustness across
> formats.

I'm suggesting that some normalization actually hinders comparison 
unless that normalization can be very precisely specified.

> I don't see a solution that will get us to a CR text by the end of
> this week?

I actually find the current editor's draft text totally acceptable.
What I oppose is only the lingering possibility, raised in this
thread, that we normalize whitespace or Unicode.

For whitespace, I oppose it because it can be very hard to determine 
what white space is meaningful. I may have a text/plain document that 
uses manual line breaks, as is common with code documentation, where 
only two successive line breaks are really "logical" line breaks. In 
HTML, it's necessary to resolve the CSS styling to know whether white 
space is preserved or not. And so forth.
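
To make the plain-text case concrete, here is a minimal Python sketch. The `collapse_whitespace` helper is hypothetical (nothing in the draft specifies it); it only stands in for the kind of naive whitespace normalization I'm worried about.

```python
import re


def collapse_whitespace(text: str) -> str:
    """Hypothetical normalizer: collapse every whitespace run to one space."""
    return re.sub(r"\s+", " ", text).strip()


# Hard-wrapped plain text: the single newline is only manual wrapping,
# while the blank line is the one "logical" break.
wrapped = "First paragraph, wrapped\nby hand.\n\nSecond paragraph."

# A different document that has no paragraph break at all.
flat = "First paragraph, wrapped by hand. Second paragraph."

# The documents differ as written...
same_before = wrapped == flat

# ...but are indistinguishable once whitespace is collapsed, so the
# normalizer has thrown away the only meaningful break.
same_after = collapse_whitespace(wrapped) == collapse_whitespace(flat)
```

A smarter rule (say, "keep double line breaks") would fix this one input, but it would then be wrong for documents where single line breaks are meaningful, which is why I think it cannot be specified precisely enough.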

For Unicode, I oppose it because the W3C already recommends NFC for
the Web [1]. We should assume documents already contain normalized
Unicode forms and not put the burden of normalizing on implementers
of annotation. Furthermore, we can easily imagine simple use cases
for annotation where normalizing would be undesirable. Perhaps I want
to make an HTML validator that warns about non-normalized characters.
I would need to annotate the specific, non-normalized text to mark it
as such.
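
As a rough illustration of why this matters for selectors, the Python sketch below uses the standard `unicodedata` module. The sample string and variable names are made up for the example; the point is only that NFC changes both the text and the character offsets.

```python
import unicodedata

# "cafe" followed by U+0301 COMBINING ACUTE ACCENT (a decomposed "é"),
# then some more text.
source = "cafe\u0301 au lait"
normalized = unicodedata.normalize("NFC", source)

# NFC composes "e" + U+0301 into the single code point U+00E9, so the
# normalized string is one code point shorter than the source.
length_difference = len(source) - len(normalized)

# The offset of the same word therefore differs between the two forms.
start_in_source = source.index("au")
start_in_normalized = normalized.index("au")

# A validator that wants to flag the non-normalized text has to quote
# the sequence exactly as it appears in the document; that sequence no
# longer exists after normalization.
decomposed = "e\u0301"
found_in_source = decomposed in source
found_in_normalized = decomposed in normalized
```

Here the word "au" starts at offset 6 in the source but at offset 5 after NFC, and the decomposed sequence the validator wants to point at is present only in the source.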

So, I think the text has improved as a result of discussion on this 
issue and I find it satisfactory now. My previous comments should be 
taken to mean that I believe we have arrived at a reasonable 
description of appropriate normalization, namely the normalization 
that is already done automatically by browsers when you use
`textContent` (strip tags, convert character entities, preserve white
space and Unicode forms).
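
For readers outside the browser, the behaviour I mean can be approximated as below. This is only a sketch using Python's standard `html.parser`, not the DOM algorithm itself; `TextCollector` and `text_content` are names invented for the example, and the parser is not a full HTML5 parser.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collects character data, roughly like reading textContent."""

    def __init__(self) -> None:
        # convert_charrefs=True turns entities such as &amp; into "&".
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data: str) -> None:
        # Keep the text exactly as given: no whitespace collapsing and
        # no Unicode normalization.
        self.parts.append(data)


def text_content(markup: str) -> str:
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.parts)


sample = "<p>Fish &amp;  <em>chips</em>\n cafe\u0301</p>"
result = text_content(sample)
```

With this input, `result` is `"Fish &  chips\n cafe\u0301"`: the tags are gone and the entity is decoded, while the double space, the line break, and the decomposed accent are all left untouched.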

[1] 
https://www.w3.org/International/questions/qa-html-css-normalization

-- 
GitHub Notification of comment by tilgovi
Please view or discuss this issue at 
https://github.com/w3c/web-annotation/issues/227#issuecomment-223071711
 using your GitHub account

Received on Wednesday, 1 June 2016 17:49:15 UTC