On Monday, June 18, 2001 at 1:39 AM, link@tss.no (Terje Bless) wrote:

> Right. We can't guarantee a lossless roundtrip to ISO 10646[0] so every
> offset -- be it byte or character -- would need to be in terms of the UTF-8
> encoded ISO 10646 version of the file. How would that affect your Error
> Browser Chris?

Dealing with byte offsets would be nearly impossible. As long as I get back a character offset, everything should be fine.

> Could we perhaps convert to Normalization Form C[2] and report Unicode
> character offsets (or even bytes if it's easier) from beginning of file?

I don't see how the character offset is going to be any different between UTF-8 and UTF-16. Byte-based offsets would be different, but character offsets should be the same.

--
Christian Smith | csmith@barebones.com | http://web.barebones.com
He who dies with the most friends... Is still dead!

Received on Monday, 18 June 2001 10:38:55 GMT
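The point about offsets can be illustrated with a small sketch. It is not taken from the validator or from any error browser; the helper name `offsets` and the sample strings are made up for illustration. It assumes that "character offset" means a count of Unicode code points, which is how Python indexes `str`.

```python
def offsets(text, target):
    """Return (character offset, UTF-8 byte offset, UTF-16 byte offset)
    of the first occurrence of target in text.

    Hypothetical helper for illustration only.
    """
    char_offset = text.index(target)      # counted in code points
    prefix = text[:char_offset]
    utf8_bytes = len(prefix.encode("utf-8"))
    utf16_bytes = len(prefix.encode("utf-16-le"))  # little-endian, no BOM
    return char_offset, utf8_bytes, utf16_bytes


# The character offset is independent of the encoding; the byte offsets are not.
print(offsets("naïve <b>café</b>", "</b>"))  # (13, 15, 26)

# A character outside the Basic Multilingual Plane is still one code point,
# but it takes 4 bytes in both UTF-8 and UTF-16.
print(offsets("\U0001D11Ex", "x"))  # (1, 4, 4)
```

One caveat, under the same assumption: a tool that counts UTF-16 code units rather than code points would report 2 instead of 1 in the second case, because the character is stored as a surrogate pair. Character offsets therefore agree between UTF-8 and UTF-16 only if both sides count code points.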