- From: Sebastian Hellmann <hellmann@informatik.uni-leipzig.de>
- Date: Tue, 20 Oct 2015 13:37:51 +0200
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, www-international@w3.org
- Message-ID: <5626278F.80209@informatik.uni-leipzig.de>
Hello Martin,

thanks for the pointer. Good work, this was exactly what I was looking for.
Maybe a reference to the character model can be added to the current CR to keep the pointer.

All the best,
Sebastian

On 20.10.2015 13:20, Martin J. Dürst wrote:
> Hello Sebastian,
>
> There is already quite a bit about character counting/string length at
> http://www.w3.org/TR/charmod/#sec-stringIndexing. But it just gives
> some guidelines.
>
> The Encoding CR (not a Recommendation yet) deals with encoding
> conversions, not with what you do once you have a single internal
> encoding.
>
> Regards, Martin.
>
> On 2015/10/20 19:02, Sebastian Hellmann wrote:
>> Hi all,
>> I am new, so sorry if I re-raise a topic.
>>
>> I was wondering whether the Encoding Recommendation would be the right
>> place to tackle a string counting issue. Lots of programming languages
>> and specifications have quite different implementations regarding string
>> counting. I am sure you are aware of this. A particular example is this
>> spec in Section 2.1.2: https://tools.ietf.org/html/rfc5147#section-2.1.2
>> which specifies to count two code points as one, or PHP with
>> strlen(utf8_decode("ä")) != strlen("ä")
>>
>> Could we include some definitions in the standard on how strings are
>> counted and define a way to have offsets over these strings?
>>
>> Suggestion 1 (easy change): In the terminology section:
>> A string is a sequence of code points.
>> The /length/ of a string equals the number of contained code points.
>>
>> Suggestion 2:
>> Define offsets similar to this image:
>> http://persistence.uni-leipzig.org/nlp2rdf/specification/image/iso+24612-2012.png
>>
>> e.g. start with 0 and then count the gaps.
>>
>> I would have high hopes that some implementers would pick it up
>> eventually. Such a definition would help immensely in the area of text
>> annotation and might also be an issue for the Web Annotation Group.
>>
>> All the best,
>> Sebastian
>>
>
--
Sebastian Hellmann
AKSW/KILT research group
Institute for Applied Informatics (InfAI) at University Leipzig
DBpedia Association
Events:
* Oct 31st, 2015: Deadline for Quality Management of Semantic Web Assets (Data, Services and Systems) <http://www.semantic-web-journal.net/blog/call-papers-special-issue-quality-management-semantic-web-assets-data-services-and-systems>
Come to Germany as a PhD: http://bis.informatik.uni-leipzig.de/csf
Projects: http://dbpedia.org, http://nlp2rdf.org, http://linguistics.okfn.org, https://www.w3.org/community/ld4lt
Homepage: http://aksw.org/SebastianHellmann
Research Group: http://aksw.org
Thesis: http://tinyurl.com/sh-thesis-summary http://tinyurl.com/sh-thesis
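
Illustration of the counting issue raised in the quoted message: the same text yields different lengths depending on whether bytes, UTF-16 code units, or code points are counted, and the "count the gaps" offsets of Suggestion 2 correspond to half-open ranges over code points. The following is only a minimal sketch in Python, not part of any specification; the function names (count_code_points, count_utf8_bytes, count_utf16_units, span) are made up for this example.

```python
import unicodedata


def count_code_points(text: str) -> int:
    """Length in the sense of Suggestion 1: the number of code points."""
    # A Python 3 str is a sequence of code points, so len() counts them.
    return len(text)


def count_utf8_bytes(text: str) -> int:
    """Length as a byte-oriented function would report it for UTF-8 data."""
    return len(text.encode("utf-8"))


def count_utf16_units(text: str) -> int:
    """Length in UTF-16 code units (two bytes each, no byte order mark)."""
    return len(text.encode("utf-16-le")) // 2


def span(text: str, begin: int, end: int) -> str:
    """Return the code points between gap offsets begin and end.

    Offsets number the gaps between code points: 0 is the position
    before the first code point, len(text) the position after the last.
    """
    if not 0 <= begin <= end <= len(text):
        raise ValueError("need 0 <= begin <= end <= length of text")
    return text[begin:end]


if __name__ == "__main__":
    precomposed = "\u00e4"  # "ä" as one code point
    decomposed = unicodedata.normalize("NFD", precomposed)  # "a" + U+0308
    for label, sample in (("precomposed", precomposed), ("decomposed", decomposed)):
        print(
            label,
            count_code_points(sample),
            count_utf8_bytes(sample),
            count_utf16_units(sample),
        )
    # Gap offsets 1 and 2 enclose the second code point of "Käse".
    print(span("K\u00e4se", 1, 2))
```

Counting code points removes the dependence on the encoding, but the precomposed and decomposed forms of "ä" still differ in length, so a definition along these lines would presumably also need to say something about normalization.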
Received on Tuesday, 20 October 2015 11:38:26 UTC