From: Daniel Weck via GitHub <sysbot+gh@w3.org>
Date: Tue, 23 Aug 2016 10:33:34 +0000
To: public-annotation@w3.org
Thanks @tkanai. I agree that higher-level processing on a Unicode 'code points' basis has its benefits (notably, text selections / character ranges are functionally closer to how human-readable languages / scripts are structured), but I was wondering about implementation feasibility and costs (in particular: performance).

The use of UTF-16 'code units' in EPUB3 CFI is consistent with the overall "low-level" design (e.g. the canonical syntax for XML element paths based on numbered node references). So yes, CFI character ranges are totally unaware of Unicode "subtleties" such as grapheme clusters and surrogate pairs, which means that a CFI-authoring user interface must capture and constrain/adjust text selections so that they make logical sense from the user's perspective (whilst the underlying CFI processor itself does not need to be "Unicode-aware" to that degree).

Web browsers already implement high-level text selection pretty well, so the responsibility of a typical CFI processing library basically boils down to handling the low-level UTF-16 (UCS-2) output from DOM Ranges or the JavaScript string API; there is no need for sophisticated Punycode-like Unicode utilities (a small sketch at the end of this message illustrates the code-unit vs. code-point distinction).

So, I am by no means claiming that the CFI model is preferable / superior to TextPositionSelector; I am just wondering about the pros and cons :)

-- 
GitHub Notification of comment by danielweck
Please view or discuss this issue at https://github.com/w3c/web-annotation/issues/350#issuecomment-241691761 using your GitHub account
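To make the code-unit vs. code-point distinction concrete, here is a minimal JavaScript sketch; the string literal is an arbitrary example containing one non-BMP character, not anything CFI-specific:

    // "𝄞" (U+1D11E MUSICAL SYMBOL G CLEF) lies outside the Basic Multilingual
    // Plane, so JavaScript stores it as a surrogate pair (two UTF-16 code units).
    var text = "a\uD834\uDD1Eb"; // "a𝄞b"

    // String.length counts UTF-16 code units -- the same unit a CFI
    // character offset counts:
    console.log(text.length); // 4, not 3

    // charCodeAt() exposes the individual surrogate code units...
    console.log(text.charCodeAt(1).toString(16)); // "d834" (high surrogate)
    console.log(text.charCodeAt(2).toString(16)); // "dd1e" (low surrogate)

    // ...whereas codePointAt() reassembles the full code point:
    console.log(text.codePointAt(1).toString(16)); // "1d11e"

    // Counting code points instead (the higher-level basis discussed above)
    // requires iterating the string, e.g. with Array.from():
    console.log(Array.from(text).length); // 3

Note that DOM Range endpoints inside Text nodes (startOffset / endOffset) are expressed in the same UTF-16 code units, which is why a CFI processing library can consume them directly, with no surrogate-pair bookkeeping of its own.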
Received on Tuesday, 23 August 2016 10:33:42 UTC