- From: Terje Bless <link@tss.no>
- Date: Mon, 18 Jun 2001 18:20:31 +0200
- To: Christian Smith <csmith@barebones.com>
- cc: Martin Duerst <duerst@w3.org>, Nick Kew <nick@webthing.com>, Esmond Walshe <esmond.walshe@eeng.dcu.ie>, www-validator@w3.org
On 18.06.01 at 11:56, Christian Smith <csmith@barebones.com> wrote:

>Character [offset] X of the document should be the same regardless of
>whether the file is in 7bit ascii or encoded as Unicode.

Unless I'm missing something, yes, but with the caveat below.

>>some combination symbols can be expressed either as their own unique code
>>point, or as a set of equivalent combination characters.
>
>And would this cause you to report a different character offset or would
>you report the same character offset regardless?

There may be something in the Unicode or ISO spec that specifies how to count characters when faced with this problem, but I'm not familiar enough with it to know either way (Martin?).

Our offset would be the sum of the lengths of all previous lines, plus the character offset on the current line that SP reports. Both the line lengths and the offset in the current line would be dependent on how our chosen implementation does the counts.

>In either case I'm not overly concerned at this point. The worst that
>happens is the insertion point may not be set quite right in some cases.
>C'est la vie.

Given that all the common accented characters from German ("ö"), Spanish, French (éèá etc.), and the Scandinavian languages (æøåöä etc.) can potentially be one or two characters, the probability of a significant error increases in proportion to the length of the text. You could end up several hundred characters off even in medium-sized documents with NFD. Given use of NFC, the error should probably be within a few characters and always less than the "real" offset.

Anyways, you may be right that the problem is academic at this point. We can probably figure out a solution once we get to a point where we actually report these offsets somewhere. :-)
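A minimal sketch of the offset calculation described above, written in Python purely for illustration. The helper name `absolute_offset` is hypothetical, and it assumes the parser reports a 1-based line number and a 0-based column; SP's actual conventions are not stated in this thread. It shows how the same text yields different offsets in NFC and NFD. Note that only characters with a canonical decomposition (é, ï, ö, å, ...) change length; æ and ø have no decomposition and stay one code point in both forms.

```python
import unicodedata


def absolute_offset(text: str, line_no: int, column: int) -> int:
    """Hypothetical helper: character offset of (line_no, column) in text.

    Assumes a 1-based line number and a 0-based column. The offset is the
    summed length of all previous lines (including their line terminators)
    plus the column on the current line.
    """
    lines = text.splitlines(keepends=True)
    return sum(len(line) for line in lines[:line_no - 1]) + column


doc = "caf\u00e9\nna\u00efve\nX"  # "café\nnaïve\nX" with precomposed letters

nfc = unicodedata.normalize("NFC", doc)  # unchanged: already precomposed
nfd = unicodedata.normalize("NFD", doc)  # é -> e + U+0301, ï -> i + U+0308

print(absolute_offset(nfc, 3, 0))  # 11
print(absolute_offset(nfd, 3, 0))  # 13: one extra code point per decomposed letter

# æ and ø do not decompose; only å becomes a + U+030A.
print(len(unicodedata.normalize("NFD", "\u00e6\u00f8\u00e5")))  # 4
```

The drift grows with every decomposable letter before the reported position, which is why an NFD document can end up far from the NFC offset while the per-line column stays correct within its own form.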
Received on Monday, 18 June 2001 12:21:28 UTC