- From: BigBlueHat via GitHub <sysbot+gh@w3.org>
- Date: Thu, 30 Jul 2015 21:19:49 +0000
- To: public-annotation@w3.org
BigBlueHat has just created a new issue for https://github.com/w3c/web-annotation:

== Text Position Selector in the DOM ==

@tilgovi and I were discussing [Text Position Selectors](http://www.w3.org/TR/annotation-model/#h5_text-position-selector), how they work within the DOM, and how to implement them in practice. The spec currently links to [this section of the DOM Level 3 Core documentation](http://www.w3.org/TR/DOM-Level-3-Core/core.html#ID-5DFED1F0-h3):

> 1.3.1 String Comparisons in the DOM
>
> The DOM has many interfaces that imply string matching. For XML, string comparisons are case-sensitive and performed with a binary comparison of the 16-bit units of the DOMStrings. However, for case-insensitive markup languages, such as HTML 4.01 or earlier, these comparisons are case-insensitive where appropriate.
>
> Note that HTML processors often perform specific case normalizations (canonicalization) of the markup before the DOM structures are built. This is typically using uppercase for element names and lowercase for attribute names. For this reason, applications should also compare element and attribute names returned by the DOM implementation in a case-insensitive manner.
>
> The character normalization, i.e. transforming into their fully normalized form as as defined in [XML 1.1], is assumed to happen at serialization time. The DOM Level 3 Load and Save module [DOM Level 3 Load and Save] provides a serialization mechanism (see the DOMSerializer interface, section 2.3.1) and uses the DOMConfiguration parameters "normalize-characters" and "check-character-normalization" to assure that text is fully normalized [XML 1.1]. Other serialization mechanisms built on top of the DOM Level 3 Core also have to assure that text is fully normalized.

...which is all terribly fascinating for anyone wanting to "normalize" or "canonicalize" other markup formats correctly...or implement a browser. :smile:

However, what I think would be more helpful is pointing out that we mean (as far as I know) for developers to use `document.body.textContent` as the text against which the numbers in Text Position Selector statements are counted. Is that correct?

The other document-wide text content is `document.documentElement.textContent`, which includes the contents of `<head>` (such as CSS) and all kinds of other "hidden" text that the user can't select...so I'm thinking we *don't* want that one. However, I can see the potential for someone wanting to treat the markup "as source" and annotate that...and I'm guessing we'd encourage Data Position Selector for that use case.

Should we add a note about using `document.body.textContent` to get the text that positions refer to? It'd make things much clearer, simpler, and faster to understand and implement. :grinning:

Thanks!

See https://github.com/w3c/web-annotation/issues/59
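
To make the proposal concrete, here is a minimal sketch of how a boundary point inside a text node could be turned into an offset into `document.body.textContent`. It relies on the fact that an element's `textContent` is the concatenation of its descendant text nodes in document order. The function names `textOffsetWithin` and `toStartEnd` are hypothetical and not part of any spec. The sketch assumes both boundary points sit in text nodes (not elements) and reuses the `start`/`end` property names from the Text Position Selector.

```javascript
// Node.TEXT_NODE; hard-coded so the sketch also runs outside a browser.
const TEXT_NODE = 3;

// Hypothetical helper: count the characters of root.textContent that precede
// the boundary point (targetNode, offsetInNode).
// Assumes targetNode is a text node, so offsetInNode is a character offset.
// Returns -1 if targetNode is not found under root.
function textOffsetWithin(root, targetNode, offsetInNode) {
  let count = 0;
  let found = false;

  function walk(node) {
    if (found) return;
    if (node === targetNode) {
      count += offsetInNode;
      found = true;
      return;
    }
    if (node.nodeType === TEXT_NODE) {
      // Text before the target contributes its full length.
      count += node.nodeValue.length;
      return;
    }
    // Comments and other non-text leaves have no children and add nothing,
    // which matches how an element's textContent is assembled.
    for (const child of node.childNodes || []) {
      walk(child);
      if (found) return;
    }
  }

  walk(root);
  return found ? count : -1;
}

// Hypothetical helper: derive start/end offsets from a Range-like object
// whose containers are text nodes under root.
function toStartEnd(root, range) {
  return {
    start: textOffsetWithin(root, range.startContainer, range.startOffset),
    end: textOffsetWithin(root, range.endContainer, range.endOffset),
  };
}
```

In a browser this might be called as `toStartEnd(document.body, window.getSelection().getRangeAt(0))`. Passing `document.documentElement` as the root instead would also count the text inside `<head>`, which is the case the issue suggests avoiding.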
Received on Thursday, 30 July 2015 21:19:51 UTC