- From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
- Date: Tue, 09 May 2006 23:35:48 +0200
- To: public-wai-ert@w3.org
-------- Original Message --------
Subject: Re: Bytes, character encodings and characters
Date: Thu, 04 May 2006 09:34:39 +0200
From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
To: Shadi Abou-Zahra <shadi@w3.org>
References: <4458D046.1040003@fit.fraunhofer.de> <4458E233.6010105@w3.org> <44591743.5070901@fit.fraunhofer.de> <44591F97.7020709@w3.org>

Shadi Abou-Zahra wrote:
> Johannes Koch wrote:
>
>>> and count the byteOffset correctly to publish a clean and valid report.
>>
>> If you want to use earl:byteOffset together with earl:textContent an
>> EARL reading tool will need the character encoding that you used to
>> create the byte sequence which forms the base for your counting.
>
> Typo, I *did* mean charOffset!

Ah, ok.

>> If there is a byte sequence in the resource that cannot be transformed
>> into a proper character sequence using the chosen character encoding,
>> use base64Content for the snippet. Which character sequence would you
>> like to put into a textContent snippet?
>
> So this means we need to say something along the lines of "if the
> original encoding in the Web content can not be represented in the
> encoding of the EARL report, then base64Content needs to be used". To
> use the same example, because you could not display UTF-16 in ASCII, you
> should record the snippet in base64. Correct?

1. The problem is a problem with the resource. The resource's bytes
cannot be transformed into characters properly with the chosen character
encoding. That was a use case Nick mentioned last week, I think. If you
want to record this error (improper byte sequence for character encoding
xxxxx), you will need the base64Content with byteOffset. You cannot
create a textContent with charOffset because you cannot transform the
bytes into characters. At least not the problematic ones. Of course you
could create a textContent with the characters up to the problematic
point. But then you could not create a charOffset pointing to a
character position in the textContent, because the problematic point is
not in the textContent.

2. You could encode an EARL report with whatever character encoding you
want. But when using US-ASCII you would need character references for
all characters above U+007F.

--
Johannes Koch - Competence Center BIKA
Fraunhofer Institute for Applied Information Technology (FIT.LIFE)
Schloss Birlinghoven, D-53757 Sankt Augustin, Germany
Phone: +49-2241-142628
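
To illustrate the two points above, here is a minimal Python sketch. It is only an illustration under stated assumptions, not an EARL implementation: the function names and the dictionary layout are hypothetical, the keys merely borrow the property names used in this thread, and the byte offset is taken to be the index of the first undecodable byte within the given bytes.

```python
import base64


def make_snippet(data: bytes, encoding: str) -> dict:
    """Build a hypothetical snippet record for the bytes of a resource.

    If the bytes decode cleanly with the chosen encoding, a text snippet
    is returned. Otherwise the raw bytes are recorded as base64, with the
    index of the first undecodable byte (assumed meaning of the offset).
    """
    try:
        text = data.decode(encoding)
    except UnicodeDecodeError as err:
        # Point 1: the bytes are not a proper sequence for this encoding,
        # so no character offset can refer to the problematic position.
        return {
            "kind": "base64Content",
            "content": base64.b64encode(data).decode("ascii"),
            "byteOffset": err.start,
        }
    return {"kind": "textContent", "content": text}


def to_ascii_report(text: str) -> bytes:
    """Point 2: serialize text as US-ASCII, using numeric character
    references for every character above U+007F."""
    return text.encode("ascii", errors="xmlcharrefreplace")


if __name__ == "__main__":
    print(make_snippet(b"abc\xffdef", "utf-8"))
    print(to_ascii_report("K\u00f6ln"))
```

In this sketch the byte 0xFF is not valid in UTF-8, so the first call falls back to the base64 record, while the second call shows a non-ASCII character being written as a character reference.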
Received on Tuesday, 9 May 2006 21:36:42 UTC