Re: CR Feedback: String counting and offsets

Hello Martin,

thanks for the pointer. Good work, this was exactly what I was looking for.
Maybe a reference to the character model can be added to the current CR 
to keep the pointer.

All the best,

On 20.10.2015 13:20, Martin J. Dürst wrote:
> Hello Sebastian,
> There is already quite a bit about character counting/string length at 
> But it just gives 
> some guidelines.
> The Encoding CR (not a Recommendation yet) deals with encoding 
> conversions, not with what you do once you have a single internal 
> encoding.
> Regards,   Martin.
> On 2015/10/20 19:02, Sebastian Hellmann wrote:
>> Hi all,
>> I am new, so sorry if I re-raise a topic.
>> I was wondering whether the Encoding Recommendation would be the right
>> place to tackle a string counting issue. Lots of programming languages
>> and specifications have quite different implementations regarding string
>> counting. I am sure you are aware of this. A particular example is this
>> spec in Section 2.1.2:
>> which specifies to count two code points as one, or PHP with
>> strlen(utf8_decode("ä")) != strlen("ä")
>> Could we include some definitions in the standard on how strings are
>> counted and define a way to have offsets over these strings?
>> Suggestion 1 (easy change): In the terminology section:
>> A string is a sequence of code points.
>> The /length/ of a string equals the number of contained code points.
>> Suggestion 2 :
>> Define offsets similar to this image:
>> e.g. start with 0 and then count the gaps.
>> I would have high hopes that some implementers would pick it up
>> eventually. Such a definition would help immensely in the area of text
>> annotation and might also be an issue for the Web Annotation Group.
>> All the best,
>> Sebastian
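
To make the two suggestions quoted above concrete, here is a minimal Python sketch. It is only an illustration, not text from any spec. The helper names length_report and find_gap_offsets are made up for this mail. It relies on the fact that a Python 3 str is indexed by code point, so len() already matches Suggestion 1 and slice indices already behave like the gap offsets of Suggestion 2.

```python
import unicodedata


def length_report(s):
    """Return several common notions of 'length' for the same string.

    Hypothetical helper, for illustration only.
    """
    return {
        # Suggestion 1: number of code points (what len() gives in Python 3)
        "code_points": len(s),
        # What byte-oriented functions such as PHP's strlen() see for UTF-8 input
        "utf8_bytes": len(s.encode("utf-8")),
        # What UTF-16 based languages (Java, JavaScript) report as length
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # Code point count after NFC normalization (composes base + combining mark)
        "code_points_nfc": len(unicodedata.normalize("NFC", s)),
    }


def find_gap_offsets(text, needle):
    """Return (start, end) gap offsets of the first match, or None.

    Suggestion 2: offset 0 is the gap before the first code point,
    offset len(text) is the gap after the last one.
    """
    start = text.find(needle)
    if start == -1:
        return None
    return (start, start + len(needle))


if __name__ == "__main__":
    print(length_report("\u00e4"))    # precomposed "ä"
    print(length_report("a\u0308"))   # "a" followed by combining diaeresis
    print(length_report("\U0001F600"))  # a code point outside the BMP
    start, end = find_gap_offsets("Martin D\u00fcrst", "D\u00fcrst")
    print(start, end, "Martin D\u00fcrst"[start:end])
```

The precomposed and decomposed forms of "ä" give different code point counts, so a definition based on code points would probably also have to say something about normalization.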

Sebastian Hellmann
AKSW/KILT research group
Institute for Applied Informatics (InfAI) at University Leipzig
DBpedia Association
* *Oct 31st, 2015* Deadline for Quality Management of Semantic Web 
Assets (Data, Services and Systems) 
Come to Germany as a PhD:
Research Group:

Received on Tuesday, 20 October 2015 11:38:26 UTC