Solution proposal for Issue #12

Hi Rob and all,

I'm working on Annotation Model Issue #12 (listed on the W3C tracker, not on GitHub) [1], and I would like to propose a solution.

First of all, I have to apologize for the unclear title. It should have been "Call for precise definition of Offset".

Please take a look at the table (result.md) in my Gist page [2]. It shows how the reported length of the same text differs between programming languages. How each Unicode code point is represented can be seen via the bl.ocks.org service [3].
My main concern is that an annotation created by a JavaScript application may not be handled correctly by PHP or Ruby applications because of this gap, and that without a precise definition of Offset the format might lose interoperability. Unfortunately, this is not a bug in JavaScript. According to section 4.3.16, "String value", of the ECMAScript spec [4], a string is a sequence of 16-bit unsigned integer values, so the behavior is intentional.
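
To make the gap concrete, here is a minimal sketch in Python 3. The sample string is my own assumption (it is not taken from the table in the Gist): the letter "a" followed by U+1F600, a character outside the Basic Multilingual Plane. The UTF-16 count is computed by encoding the string, which mirrors what JavaScript's String.length would report for the same text.

```python
# Hypothetical sample text: "a" followed by U+1F600 (outside the BMP).
text = "a\U0001F600"

# Python 3 strings are indexed by code point, so len() counts code points.
code_points = len(text)

# UTF-16 code units: each unit is 2 bytes in the UTF-16-LE encoding.
# This is the unit JavaScript's String.length counts in.
utf16_units = len(text.encode("utf-16-le")) // 2

# UTF-8 bytes: what a byte-oriented length function would report.
utf8_bytes = len(text.encode("utf-8"))

print(code_points, utf16_units, utf8_bytes)  # prints: 2 3 5
```

The same two characters yield three different lengths, so an offset that points past the second character is 2, 3, or 5 depending on the unit.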

From my experience, the results produced by Python 3, Ruby and PHP make sense. The length of a text should be counted in code points, rather than in bytes or UTF-16 code units.
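
As a rough illustration of what counting in code points would mean for a client that counts in UTF-16 code units, the sketch below converts one kind of offset into the other. The function name and the rounding rule for an offset that falls inside a surrogate pair are my own assumptions, not something defined in the spec.

```python
def utf16_offset_to_code_point_offset(text: str, utf16_offset: int) -> int:
    """Convert an offset in UTF-16 code units to an offset in code points.

    Assumption: an offset that falls in the middle of a surrogate pair is
    rounded up to the start of the next character.
    """
    units = 0  # UTF-16 code units consumed so far
    for index, ch in enumerate(text):
        if units >= utf16_offset:
            return index
        # Characters above U+FFFF take two UTF-16 code units, others take one.
        units += 2 if ord(ch) > 0xFFFF else 1
    return len(text)


# Hypothetical sample text: "a", U+1F600, "b".
sample = "a\U0001F600b"
# In UTF-16 code units "b" starts at offset 3; in code points it starts at 2.
print(utf16_offset_to_code_point_offset(sample, 3))  # prints: 2
```

A JavaScript client could apply such a conversion when writing an annotation, and the inverse when reading one, so that the stored offset stays in code points.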

Do you have any concerns about introducing a code-point-based definition of Offset into the spec?


Thanks,
Takeshi Kanai

[1] https://www.w3.org/annotation/track/issues/12
[2] https://gist.github.com/tkanai/e2984cfa14cf099baa94
[3] http://bl.ocks.org/tkanai/e2984cfa14cf099baa94
[4] http://www.ecma-international.org/ecma-262/5.1/

Received on Tuesday, 24 March 2015 06:42:16 UTC