RE: Bytes, character encodings and characters

 

Hi,

> the reader/processor may need to know that the 
> textContent was originally UTF-16 in order to decode the 
> "UTF-16 in ASCII" and reassemble the original content (for 
> example to display it to the end-user). Or am I overlooking 
> how you want to translate UTF-16 characters into ASCII ones 
> without going into the byte-level?

Now I see your point, Shadi, and I think you're right.
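
To make the point concrete, here is a rough Python sketch. It assumes that "UTF-16 in ASCII" means the original bytes are carried in an ASCII-safe form; Base64 is used here purely as an illustration, and the sample text and variable names are made up. It only shows that a reader who gets bytes rather than characters cannot reassemble the content without knowing the original encoding.

```python
import base64

# Hypothetical sample content; any non-ASCII text would do.
original = "na\u00efve caf\u00e9"

# The resource bytes as they were originally encoded (assumed: UTF-16LE).
utf16_bytes = original.encode("utf-16-le")

# "UTF-16 in ASCII": the bytes wrapped in an ASCII-only representation.
# Base64 is an assumption made for this sketch only.
ascii_form = base64.b64encode(utf16_bytes).decode("ascii")

# A reader who knows the original encoding can reassemble the characters.
recovered = base64.b64decode(ascii_form).decode("utf-16-le")

# A reader who guesses a different encoding gets something else entirely.
wrong_guess = base64.b64decode(ascii_form).decode("iso-8859-1")

print(recovered == original)
print(wrong_guess == original)
```

If the snippet is stored as characters instead, this extra piece of information is not needed.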

Regards,

CI.

 
> Johannes Koch wrote:
> > 
> > Hi group,
> > 
> > I didn't want to be rude, but I really could not see a problem with 
> > the textContent property. So let me try to clarify my opinion.
> > 
> > 1. I make a request for a resource I want to check.
> > 2. I get a response containing a sequence of bytes and, if it is a
> > text resource, hopefully a character encoding (CE1) via some metadata
> > (Content-Type header in HTTP). Otherwise I use a default encoding as CE1.
> > 3. I use CE1 (specified or default) to transform the sequence of
> > bytes into a sequence of characters. From now on, I'm on the
> > character level, no bytes around anymore.
> > 4. I extract a text snippet from the resource characters.
> > 5. I create an EARL report containing the snippet. I'm still on the 
> > character level.
> > 6. I want to store the EARL report on the file system or send it
> > over the network. Therefore I have to transform the EARL report
> > characters into a sequence of bytes using a character encoding CE2.
> > CE2 is not required to be the same as CE1. However, CE2 should
> > contain mappings for all characters in the EARL report.
> > 
> > Of course, there is a step #0 prior to #1:
> > 0. An author creates a text document by writing characters, then
> > storing the document on the web server file system by transforming
> > the characters into a sequence of bytes using a character encoding CE0.
> > 
> > It may also happen that the document is created by merging different
> > sources on the byte level instead of the character level. So there's a
> > problem when source1 uses a different CE than source2. Transforming
> > the merged bytes into a sequence of characters will not give the
> > proper result. But this is a problem with resource document creation,
> > not with EARL report creation.
> 
> -- 
> Shadi Abou-Zahra     Web Accessibility Specialist for Europe | 
> Chair & Staff Contact for the Evaluation and Repair Tools WG | 
> World Wide Web Consortium (W3C)           http://www.w3.org/ | 
> Web Accessibility Initiative (WAI),   http://www.w3.org/WAI/ | 
> WAI-TIES Project,                http://www.w3.org/WAI/TIES/ | 
> Evaluation and Repair Tools WG,    http://www.w3.org/WAI/ER/ | 
> 2004, Route des Lucioles - 06560,  Sophia-Antipolis - France | 
> Voice: +33(0)4 92 38 50 64          Fax: +33(0)4 92 38 78 22 | 
> 
> 
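
The steps quoted above (0 to 6), and the byte-level merge problem, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample document, the `to_characters` helper, the ISO-8859-1 default and the `textContent: ...` line are all made up for the sketch and are not real EARL syntax or any tool's actual behaviour.

```python
from typing import Optional


def to_characters(body: bytes, declared_ce: Optional[str],
                  default_ce: str = "iso-8859-1") -> str:
    """Steps 2-3: decode response bytes with CE1 (declared, else a default).

    The default used here is an assumption made for this sketch.
    """
    ce1 = declared_ce or default_ce
    return body.decode(ce1)


# Step 0: the author's characters are stored as bytes using CE0.
ce0 = "iso-8859-1"
document = "<p>Gr\u00fc\u00dfe aus K\u00f6ln</p>"  # hypothetical resource
served_bytes = document.encode(ce0)

# Steps 1-3: the request is simulated; the bytes are decoded with CE1.
resource_chars = to_characters(served_bytes, declared_ce=ce0)

# Step 4: extract a snippet, purely on the character level.
snippet = resource_chars[len("<p>"):-len("</p>")]

# Step 5: build the report (a simplified stand-in, not real EARL).
report = "textContent: " + snippet

# Step 6: serialize with CE2, which need not be the same as CE1.
ce2 = "utf-8"
report_bytes = report.encode(ce2)
round_trip = report_bytes.decode(ce2)

# A CE2 lacking mappings for some characters cannot carry the report.
try:
    report.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# Byte-level merge of two sources that use different encodings.
source1 = "\u00fc".encode("iso-8859-1")  # one byte: 0xFC
source2 = "\u00fc".encode("utf-8")       # two bytes: 0xC3 0xBC
merged = source1 + source2

# Neither single encoding turns the merged bytes back into two u-umlauts.
as_latin1 = merged.decode("iso-8859-1")
as_utf8 = merged.decode("utf-8", errors="replace")

print(round_trip == report, ascii_ok)
print(as_latin1 == "\u00fc\u00fc", as_utf8 == "\u00fc\u00fc")
```

Note that the snippet survives the change from CE1 to CE2 unchanged, because the report only ever handles characters. The merge problem, by contrast, already exists in the resource bytes before any report is created.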

Received on Tuesday, 9 May 2006 10:56:44 UTC