- From: Daniel Weck via GitHub <sysbot+gh@w3.org>
- Date: Mon, 22 Aug 2016 13:24:39 +0000
- To: public-annotation@w3.org
danielweck has just created a new issue for https://github.com/w3c/web-annotation:

== TextPositionSelector, thoughts about Unicode code *point* vs. UTF16 code *unit* ==

Hello all, CC @iherman @azaroth42

In the EPUB3 CFI (Canonical Fragment Identifier) specification, which has a possible use in "Open Annotation in EPUB" ( http://www.idpf.org/epub/oa/ ), character-level offsets are defined as UTF-16 code *units*, not Unicode code *points*. Current implementations of CFI (parsing / processing libraries, and text highlighting / rendering tools) that are written in JavaScript benefit from direct code *unit* support in the DOM Range API and in the ECMAScript string API, i.e. they do not need to handle or translate Unicode surrogate pairs. See my comment here: https://github.com/IDPF/epub-revision/issues/555#issuecomment-144962949

So, although this design approach seems to work pretty well in EPUB3 / XHTML5, I wonder whether it is also relevant in the broader Open Web Platform context. For example, would a JavaScript implementation of TextPositionSelector need to translate back and forth between Unicode code *points* and UTF-16 code *units* in order for the data to flow between the serialization format and the consuming web APIs?

Any other thoughts?

PS, I am "cross-posting" here too: https://github.com/IDPF/epub-revision/issues/555#issuecomment-241407747

Please view or discuss this issue at https://github.com/w3c/web-annotation/issues/350 using your GitHub account
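
To make the translation question above concrete, here is a minimal sketch of what such a conversion could look like in JavaScript, assuming a selector serialized its offsets as Unicode code points. The function names codePointToCodeUnitOffset and codeUnitToCodePointOffset are hypothetical and are not taken from any specification or library; the sketch relies only on the fact that iterating an ECMAScript string yields whole code points.

```javascript
// Hypothetical helpers, for illustration only.

// Convert an offset counted in Unicode code points into the equivalent
// offset counted in UTF-16 code units (what String and DOM Range use).
function codePointToCodeUnitOffset(text, codePointOffset) {
  let units = 0;
  let points = 0;
  for (const ch of text) {        // string iteration yields whole code points
    if (points === codePointOffset) break;
    units += ch.length;           // 1 for BMP characters, 2 for surrogate pairs
    points += 1;
  }
  return units;                   // clamps to text.length if the offset is too large
}

// Convert an offset counted in UTF-16 code units into code points.
// An offset that falls inside a surrogate pair is rounded up to the
// next code point boundary.
function codeUnitToCodePointOffset(text, codeUnitOffset) {
  let units = 0;
  let points = 0;
  for (const ch of text) {
    if (units >= codeUnitOffset) break;
    units += ch.length;
    points += 1;
  }
  return points;
}

// "a", U+1F600 (one code point, two code units), "b"
const sample = "a\u{1F600}b";
const unitOffset = codePointToCodeUnitOffset(sample, 2);
console.log(unitOffset, sample.slice(unitOffset));
console.log(codeUnitToCodePointOffset(sample, 3));
```

Both conversions scan the text from the start, so an implementation that resolved many selectors against the same document would probably want to cache the mapping. With code *unit* offsets, as in EPUB3 CFI, neither step would be needed.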
Received on Monday, 22 August 2016 13:24:45 UTC