- From: Shadi Abou-Zahra <shadi@w3.org>
- Date: Wed, 03 May 2006 19:02:43 +0200
- To: Johannes Koch <johannes.koch@fit.fraunhofer.de>
- Cc: "public-wai-ert@w3.org" <public-wai-ert@w3.org>
Hi Johannes,

Thanks for taking this discussion to the mailing list for more reflection. I believe we are on the same page, but just for the sake of completeness, here is a possible issue:

Say CE1 is UTF-16 and CE2 is ASCII. You translate double-byte UTF-16 characters into single-byte ASCII characters and count the byteOffset correctly to publish a clean and valid report. However, the reader/processor may need to know that the textContent was originally UTF-16 in order to decode the "UTF-16 in ASCII" and reassemble the original content (for example, to display it to the end user).

Or am I overlooking how you want to translate UTF-16 characters into ASCII ones without going to the byte level?

Regards,
  Shadi

Johannes Koch wrote:
>
> Hi group,
>
> I didn't want to be rude, but I really could not see a problem with the
> textContent property. So I try to clarify my opinion.
>
> 1. I make a request for a resource I want to check.
> 2. I get a response containing a sequence of bytes and, if it is a text
> resource, hopefully a character encoding (CE) via some metadata
> (Content-Type header in HTTP). Otherwise I use a default CE.
> 3. I use the CE1 (specified or default) to transform the sequence of
> bytes into a sequence of characters. From now on, I'm on the character
> level, no bytes around anymore.
> 4. I extract a text snippet from the resource characters.
> 5. I create an EARL report containing the snippet. I'm still on the
> character level.
> 6. I want to store the EARL report on the file system or send them over
> the network. Therefore I have to transform the EARL report characters
> into a sequence of bytes using a character encoding CE2. CE2 is not
> required to be the same as CE1. However, CE2 should contain mappings for
> all characters in the EARL report.
>
> Of course, there is a step #0 prior to #1:
> 0: An author creates a text document by writing characters, then storing
> the document on the web server file system by transforming the
> characters into a sequence of bytes using a character encoding CE0.
>
> It may also happen that the document is created by merging different
> sources on the byte level instead of the character level. So there's a
> problem when source1 uses a different CE than source2. Transforming the
> merged bytes into a sequence of characters will not give the proper
> result. But this is a problem with resource document creation, not with
> EARL report creation.

--
Shadi Abou-Zahra     Web Accessibility Specialist for Europe |
Chair & Staff Contact for the Evaluation and Repair Tools WG |
World Wide Web Consortium (W3C)           http://www.w3.org/ |
Web Accessibility Initiative (WAI),   http://www.w3.org/WAI/ |
WAI-TIES Project,                http://www.w3.org/WAI/TIES/ |
Evaluation and Repair Tools WG,    http://www.w3.org/WAI/ER/ |
2004, Route des Lucioles - 06560,  Sophia-Antipolis - France |
Voice: +33(0)4 92 38 50 64          Fax: +33(0)4 92 38 78 22 |
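
Steps 3 to 6 of the quoted message can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `transcode_snippet`, the sample markup, and the character offsets are hypothetical and not part of EARL or of either author's tooling. It shows that the snippet is cut on the character level, that a reader needs only CE2 to recover the characters, and that CE2 must have mappings for every character in the report.

```python
def transcode_snippet(raw: bytes, ce1: str, start: int, end: int, ce2: str) -> bytes:
    """Decode raw bytes with CE1, slice by character offsets, re-encode with CE2."""
    text = raw.decode(ce1)       # step 3: bytes -> characters
    snippet = text[start:end]    # step 4: offsets count characters, not bytes
    return snippet.encode(ce2)   # step 6: characters -> bytes for storage/transfer


# Hypothetical resource, stored on the server as UTF-16 (CE1).
resource = "<p>Grüße</p>".encode("utf-16")

# The report is serialized as UTF-8 (CE2), which differs from CE1.
report_bytes = transcode_snippet(resource, "utf-16", 3, 8, "utf-8")

# A reader decodes with CE2 alone; CE1 is not needed to recover the characters.
recovered = report_bytes.decode("utf-8")

# CE2 must be able to represent every character: ASCII has no mapping
# for "ü" or "ß", so encoding this snippet as ASCII fails.
try:
    transcode_snippet(resource, "utf-16", 3, 8, "ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False
```

If the snippet contains only characters that ASCII can represent, the same call with `"ascii"` as CE2 succeeds, and the reader still decodes with ASCII alone.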
Received on Wednesday, 3 May 2006 17:02:56 UTC