[web-annotation] Text Position Selector in the DOM from BigBlueHat via GitHub on 2015-07-30 (public-annotation@w3.org from July 2015)

From: BigBlueHat via GitHub <sysbot+gh@w3.org>
Date: Thu, 30 Jul 2015 21:19:49 +0000
To: public-annotation@w3.org
Message-ID: <issues.opened-98267428-1438291188-sysbot+gh@w3.org>

BigBlueHat has just created a new issue for 
https://github.com/w3c/web-annotation:

== Text Position Selector in the DOM ==
@tilgovi and I were discussing [Text Position 
Selectors](http://www.w3.org/TR/annotation-model/#h5_text-position-selector),
how they work within the DOM, and how to implement them in practice.

That section of the model currently links to [this section of the DOM Level 3 Core 
documentation](http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-5DFED1F0-h3):
> 1.3.1 String Comparisons in the DOM
>
> The DOM has many interfaces that imply string matching. For XML, 
string comparisons are case-sensitive and performed with a binary 
comparison of the 16-bit units of the DOMStrings. However, for 
case-insensitive markup languages, such as HTML 4.01 or earlier, these
 comparisons are case-insensitive where appropriate.
>
> Note that HTML processors often perform specific case normalizations
 (canonicalization) of the markup before the DOM structures are built.
 This is typically using uppercase for element names and lowercase for
 attribute names. For this reason, applications should also compare 
element and attribute names returned by the DOM implementation in a 
case-insensitive manner.
>
> The character normalization, i.e. transforming into their fully 
normalized form as defined in [XML 1.1], is assumed to happen at 
serialization time. The DOM Level 3 Load and Save module [DOM Level 3 
Load and Save] provides a serialization mechanism (see the 
DOMSerializer interface, section 2.3.1) and uses the DOMConfiguration 
parameters "normalize-characters" and "check-character-normalization" 
to assure that text is fully normalized [XML 1.1]. Other serialization
 mechanisms built on top of the DOM Level 3 Core also have to assure 
that text is fully normalized. 

...which is all terribly fascinating for anyone wanting to "normalize"
 or "canonicalize" other markup formats correctly...or implement a 
browser. :smile: 

However, I think it would be more helpful to point out that we mean 
(as far as I know) for developers to compute offsets against 
`document.body.textContent` when storing the positions in Text 
Position Selector statements.

Is that correct?
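
To make the question concrete, here is a minimal sketch of what I have 
in mind. It assumes positions are zero-based JavaScript string indices 
(UTF-16 code units) with an exclusive end, and that in a browser the 
`text` argument would be `document.body.textContent`. The helper names 
`describeTextPosition` and `resolveTextPosition` are made up for 
illustration and are not part of the model or any library.

```javascript
// Sketch only: `text` stands in for document.body.textContent in a browser.
// Offsets are JavaScript string indices (UTF-16 code units), zero-based,
// with `end` exclusive. That mapping is an assumption, not settled spec text.

// Hypothetical helper: positions of the first occurrence of `quote` in `text`.
function describeTextPosition(text, quote) {
  if (quote.length === 0) {
    return null; // an empty quote has nothing to anchor
  }
  const start = text.indexOf(quote);
  if (start === -1) {
    return null; // quote not found in the text
  }
  return { start: start, end: start + quote.length };
}

// Hypothetical helper: recover the selected text from stored positions.
function resolveTextPosition(text, selector) {
  const { start, end } = selector;
  if (!Number.isInteger(start) || !Number.isInteger(end) ||
      start < 0 || end < start || end > text.length) {
    throw new RangeError("selector does not fit the given text");
  }
  return text.slice(start, end);
}

// In a browser: const text = document.body.textContent;
const text = "Hello annotated world";
const selector = describeTextPosition(text, "annotated");
console.log(selector, resolveTextPosition(text, selector));
```

The same stored numbers only resolve correctly if producer and 
consumer agree on which string they index into, which is why the 
choice of `document.body.textContent` matters.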

The other document-wide option is 
`document.documentElement.textContent`, which also includes the 
contents of `<head>` (CSS, etc.) and all kinds of other "hidden" text 
that the user can't select...so I'm thinking we *don't* want that one.

However, I can see the potential for someone wanting to treat the 
markup "as source" and annotate that...and I'm guessing we'd encourage
 Data Position Selector for that use case.
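
As a rough illustration of why the "as source" case is different: 
byte positions in the serialized markup diverge from character 
positions as soon as the source contains non-ASCII text. The sketch 
below assumes the source is encoded as UTF-8 and that byte offsets are 
what a Data Position Selector would store; `offsetsFor` is a 
hypothetical helper.

```javascript
// Sketch only: contrasts character offsets with UTF-8 byte offsets
// for the same substring of a serialized document.
// `offsetsFor` is a hypothetical helper, not part of any specification.
function offsetsFor(source, needle) {
  const charStart = source.indexOf(needle);
  if (charStart === -1) {
    return null; // needle not present in the source
  }
  const encoder = new TextEncoder(); // TextEncoder always produces UTF-8
  const byteStart = encoder.encode(source.slice(0, charStart)).length;
  const byteEnd = byteStart + encoder.encode(needle).length;
  return {
    chars: { start: charStart, end: charStart + needle.length },
    bytes: { start: byteStart, end: byteEnd },
  };
}

// "\u00e9" is a precomposed e-acute, which takes two bytes in UTF-8.
const source = "<p>caf\u00e9</p>";
console.log(offsetsFor(source, "caf\u00e9"));
```

Here the character range is 3 to 7 while the byte range is 3 to 8, so 
the two kinds of position can't be used interchangeably.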

Should we add a note recommending `document.body.textContent` as the 
text that positions are computed against? It'd make the selector much 
clearer, simpler, and faster to understand and implement. :grinning: 

Thanks!

See https://github.com/w3c/web-annotation/issues/59

Received on Thursday, 30 July 2015 21:19:51 UTC