- From: Rob Sanderson via GitHub <sysbot+gh@w3.org>
- Date: Fri, 30 Sep 2016 17:54:13 +0000
- To: www-international@w3.org
azaroth42 has just labeled an issue for https://github.com/w3c/web-annotation as "i18n-review":

== TextPositionSelector, thoughts about Unicode code *point* vs. UTF16 code *unit* ==

Hello all,

CC @iherman @azaroth42

In the EPUB3 CFI (Canonical Fragment Identifier) specification, which has a possible use in "Open Annotation in EPUB" ( http://www.idpf.org/epub/oa/ ), character-level offsets are defined as UTF-16 code *units*, not Unicode code *points*. Current implementations of CFI (parsing / processing libraries, and text highlighting / rendering tools) that are written in JavaScript benefit from direct code *unit* support in the DOM Range API and in the ECMAScript string API, i.e. they do not need to handle or translate Unicode surrogate pairs. See my comment here: https://github.com/IDPF/epub-revision/issues/555#issuecomment-144962949

So, although this design approach seems to work pretty well in EPUB3 / XHTML5, I wonder whether it is also relevant in the broader Open Web Platform context. For example, would a JavaScript implementation of TextPositionSelector need to translate back and forth between Unicode code *points* and UTF-16 code *units* in order for the data to flow between the serialization format and the consuming web APIs?

Any other thoughts?

PS, I am "cross-posting" here too: https://github.com/IDPF/epub-revision/issues/555#issuecomment-241407747

See https://github.com/w3c/web-annotation/issues/350
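
To make the translation question above concrete, here is a minimal sketch of what such a conversion might look like, assuming the serialized offsets were counted in Unicode code points while the DOM Range and ECMAScript string APIs count UTF-16 code units. The function names are hypothetical and do not come from any specification or library.

```typescript
// Hypothetical helpers, for illustration only.

// Number of UTF-16 code units (1 or 2) used by the character starting at `index`.
// A lone (unpaired) surrogate is counted as a single unit.
function unitsAt(text: string, index: number): number {
  const hi = text.charCodeAt(index);
  if (hi >= 0xd800 && hi <= 0xdbff && index + 1 < text.length) {
    const lo = text.charCodeAt(index + 1);
    if (lo >= 0xdc00 && lo <= 0xdfff) {
      return 2; // well-formed surrogate pair
    }
  }
  return 1;
}

// Code point offset -> UTF-16 code unit offset (usable with String and Range APIs).
// Offsets past the end of the text are clamped to text.length.
function codePointToCodeUnitOffset(text: string, codePointOffset: number): number {
  let units = 0;
  let points = 0;
  while (points < codePointOffset && units < text.length) {
    units += unitsAt(text, units);
    points++;
  }
  return units;
}

// UTF-16 code unit offset -> code point offset.
// An offset that falls inside a surrogate pair is counted as if it were past the pair.
function codeUnitToCodePointOffset(text: string, codeUnitOffset: number): number {
  let units = 0;
  let points = 0;
  while (units < codeUnitOffset && units < text.length) {
    units += unitsAt(text, units);
    points++;
  }
  return points;
}
```

For the string "a\uD83D\uDE00b" (the letter a, one character outside the Basic Multilingual Plane, then the letter b), the code point offset 2 corresponds to the code unit offset 3. Both conversions walk the string from the start, so each lookup costs time linear in the offset; an implementation that stores code unit offsets directly, as CFI does, avoids this step entirely.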
Received on Friday, 30 September 2016 17:54:22 UTC