Re: Bytes, character encodings and characters

Just to have a practical example:

Say we have a string of two characters that are both above U+007F (but let's still represent the string as "AB"). We want to store this string as a snippet and point to the second character ("B") as the start of an error. If we start counting characters at 0, it would have a charOffset of 1.

Now we serialize this string as ASCII (despite its bad internationalization support) and get something like "&U0080;&U0081;" (use your imagination). Would charOffset remain 1 (resolve character references first, then count) or change to 7 (count in actual ASCII characters)?

I'm assuming the first approach (resolve, then count), but we need to agree on this and document it for others. The same goes for whether we start counting characters at 0 (or at 1, if people prefer).
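
To make the two counts concrete, here is a minimal Python sketch. It assumes the made-up "&UXXXX;" reference syntax from the example above (not a real standard), and the helper names to_ascii and resolve are purely illustrative:

```python
import re

# Hypothetical reference syntax taken from the example ("&U0080;").
# This is an assumption for illustration, not a real serialization format.
REF = re.compile(r"&U([0-9A-Fa-f]{4,6});")


def to_ascii(text: str) -> str:
    """Replace every character above U+007F with a made-up &UXXXX; reference."""
    return "".join(c if ord(c) <= 0x7F else f"&U{ord(c):04X};" for c in text)


def resolve(serialized: str) -> str:
    """Turn the made-up references back into the characters they stand for."""
    return REF.sub(lambda m: chr(int(m.group(1), 16)), serialized)


original = "\u0080\u0081"        # the two-character string "AB" from the example
serialized = to_ascii(original)  # pure ASCII: "&U0080;&U0081;"

# Approach 1: resolve the references first, then count characters.
offset_resolved = resolve(serialized).index("\u0081")

# Approach 2: count in the serialized ASCII characters.
offset_raw = serialized.index("&U0081;")

print(offset_resolved, offset_raw)  # prints "1 7"
```

With the first approach the offset stays the same regardless of how the report is serialized, which is why it seems the safer one to document.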

Regards,
  Shadi


Johannes Koch wrote:
> 
> Shadi wrote:
>>> the reader/processor may need to know that the textContent was 
>>> originally UTF-16 in order to decode the "UTF-16 in ASCII" and 
>>> reassemble the original content (for example to display it to the 
>>> end-user). Or am I overseeing how you want to translate UTF-16 
>>> characters into ASCII ones without going into the byte-level?
> 
> Carlos Iglesias wrote:
>> Now I see your point Shadi, and I think you're right.
> 
> I don't. When you have transformed the resource bytes into characters 
> using UTF-16, you can create an EARL report and store it using US-ASCII 
> by writing character references for characters above U007F. A reader can 
> then again transform the bytes into characters using US-ASCII. No 
> problem. However US-ASCII is not the first choice when it comes to 
> internationalization :-)

-- 
Shadi Abou-Zahra     Web Accessibility Specialist for Europe | 
Chair & Staff Contact for the Evaluation and Repair Tools WG | 
World Wide Web Consortium (W3C)           http://www.w3.org/ | 
Web Accessibility Initiative (WAI),   http://www.w3.org/WAI/ | 
WAI-TIES Project,                http://www.w3.org/WAI/TIES/ | 
Evaluation and Repair Tools WG,    http://www.w3.org/WAI/ER/ | 
2004, Route des Lucioles - 06560,  Sophia-Antipolis - France | 
Voice: +33(0)4 92 38 50 64          Fax: +33(0)4 92 38 78 22 | 

Received on Tuesday, 9 May 2006 22:58:23 UTC