- From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
- Date: Tue, 09 May 2006 23:34:13 +0200
- To: public-wai-ert@w3.org
Sorry, this one went only to Shadi.

-------- Original Message --------
Subject: Re: Bytes, character encodings and characters
Date: Wed, 03 May 2006 22:49:07 +0200
From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
To: Shadi Abou-Zahra <shadi@w3.org>
References: <4458D046.1040003@fit.fraunhofer.de> <4458E233.6010105@w3.org>

Shadi Abou-Zahra wrote:
> I believe we are on the same page but just for the sake of
> completeness, here a possible issue:
>
> Say CE1 is UTF-16, and CE2 ASCII.

This is only possible if the EARL only contains characters in the Unicode range up to U+007F, because US-ASCII is limited to these.

> You translate double-byte UTF-16
> characters into single-byte ASCII characters

I translate characters (no matter where they came from) into a byte sequence using US-ASCII ...

> and count the byteOffset
> correctly to publish a clean and valid report.

If you want to use earl:byteOffset together with earl:textContent an EARL reading tool will need the character encoding that you used to create the byte sequence which forms the base for your counting. But, as I said last week, this mixing of levels doesn't make sense to me.

Use charOffset together with textContent.
Use byteOffset together with base64Content.

If there is a byte sequence in the resource that cannot be transformed into a proper character sequence using the chosen character encoding, use base64Content for the snippet. Which character sequence would you like to put into a textContent snippet?

--
Johannes Koch - Competence Center BIKA
Fraunhofer Institute for Applied Information Technology (FIT.LIFE)
Schloss Birlinghoven, D-53757 Sankt Augustin, Germany
Phone: +49-2241-142628
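To illustrate why a byte offset is only meaningful together with a character encoding, here is a minimal Python sketch. The function names `offsets` and `snippet_for_bytes` and the sample text are assumptions made for illustration; they are not part of EARL or of any tool discussed in this thread. The first function shows that one character offset maps to different byte offsets depending on the encoding; the second falls back to a base64 snippet when the bytes cannot be decoded.

```python
import base64


def offsets(text, snippet, encoding):
    """Return (char_offset, byte_offset) of the first occurrence of snippet.

    The byte offset is counted in the byte sequence obtained by encoding
    the text with the given character encoding.
    """
    char_offset = text.index(snippet)
    # Bytes needed to encode everything before the snippet.
    byte_offset = len(text[:char_offset].encode(encoding))
    return char_offset, byte_offset


def snippet_for_bytes(raw, start, length, encoding):
    """Return (kind, value) for a byte range of a resource.

    kind is "textContent" if the bytes decode cleanly with the chosen
    encoding, otherwise "base64Content" (hypothetical labels that mirror
    the property names used in the mail above).
    """
    chunk = raw[start:start + length]
    try:
        return "textContent", chunk.decode(encoding)
    except UnicodeDecodeError:
        return "base64Content", base64.b64encode(chunk).decode("ascii")


text = "Grüße <img>"
# Same character offset, three different byte offsets.
print(offsets(text, "<img>", "utf-8"))      # (6, 8)
print(offsets(text, "<img>", "utf-16-le"))  # (6, 12)
print(offsets(text, "<img>", "latin-1"))    # (6, 6)

# A byte sequence that is not valid UTF-8 can only be reported as base64.
print(snippet_for_bytes(b"\xff\xfe", 0, 2, "utf-8"))
```

The sketch uses `utf-16-le` rather than `utf-16` so that no byte order mark is prepended; with a BOM, the byte offset would grow by two more bytes.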
Received on Tuesday, 9 May 2006 21:34:55 UTC