- From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
- Date: Wed, 10 May 2006 09:38:28 +0200
- To: public-wai-ert@w3.org
Shadi Abou-Zahra wrote:
>
> Just to have a practical example:
>
> Say we have a string of 2 characters that are above U007F (but let's
> still represent the string as "AB"). We want to store this string as a
> snippet and point to the second character ("B") as the start of an
> error. Let's say we start counting characters at 0, and so it would have
> charOffset of 1.
>
> Now we serialize this string as ASCII (despite its bad
> internationalization support) and get something like "&U0080;&U0081;"
> (use your imagination).
&#x80;&#x81; :-)
> Would charOffset remain 1 (resolve character
> references first, then count) or change to 7 (count in actual ASCII
> characters)?
It's still charOffset 1. The EARL-reading XML-aware software will read
&#x80;&#x81; and then create the two characters U0080 and U0081.
> I'm assuming the first approach (resolve then count) but we need to
> agree on this and document it for others.
OK
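A minimal sketch of "resolve, then count", assuming Python's standard
xml.etree.ElementTree as the XML-aware reader. The element name
"snippet" and the helper resolved_text are made up for illustration and
are not part of EARL.

```python
import xml.etree.ElementTree as ET


def resolved_text(xml_fragment: str) -> str:
    """Parse a one-element XML fragment and return its resolved text."""
    return ET.fromstring(xml_fragment).text or ""


# Hypothetical element name; the two references stand for U+0080 and U+0081.
serialized = "<snippet>&#x80;&#x81;</snippet>"
text = resolved_text(serialized)

# Resolve first, then count: the second character sits at offset 1 (0-based).
char_offset = text.index("\u0081")

# Counting in the serialized form instead gives a different number.
raw_offset = serialized.index("&#x81;") - len("<snippet>")

print(len(text), char_offset, raw_offset)
```

The raw offset depends on how the reference happens to be written (each
&#x80; style reference is six characters long here), which is one reason
to count on the resolved text.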
A different case:
The text snippet is 'foo&Ouml;bar'.
The XML for this is
<earl:textSnippet><![CDATA[foo&Ouml;bar]]></earl:textSnippet>
or
<earl:textSnippet>foo&amp;Ouml;bar</earl:textSnippet>
We want to point to the 'b'. Now the charOffset is 9 because the
resolved characters preceding the b are 'foo&Ouml;', not 'fooÖ'.
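As a rough check of this case, here is a sketch that again assumes
Python's xml.etree.ElementTree as the reader. The snippet is the literal
twelve-character text foo&Ouml;bar, written once as a CDATA section and
once with the ampersand escaped. The earl: prefix is left off so the
fragments parse without a namespace declaration.

```python
import xml.etree.ElementTree as ET

# Two serializations of the same literal text: f o o & O u m l ; b a r
cdata_form = "<textSnippet><![CDATA[foo&Ouml;bar]]></textSnippet>"
escaped_form = "<textSnippet>foo&amp;Ouml;bar</textSnippet>"

cdata_text = ET.fromstring(cdata_form).text
escaped_text = ET.fromstring(escaped_form).text

# Both forms resolve to the same characters, so they share one charOffset.
same_text = cdata_text == escaped_text

# 0-based offset of the 'b' in the resolved text.
b_offset = escaped_text.index("b")

print(same_text, b_offset)
```

The nine characters before the 'b' are 'foo&Ouml;'; nothing is resolved
to 'Ö' because the ampersand is literal text in both forms.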
> Also the fact that we start
> counting strings at 0 (or 1 if people prefer).
Yep, that's important. The same goes for line and column numbers.
--
Johannes Koch - Competence Center BIKA
Fraunhofer Institute for Applied Information Technology (FIT.LIFE)
Schloss Birlinghoven, D-53757 Sankt Augustin, Germany
Phone: +49-2241-142628
Received on Wednesday, 10 May 2006 07:39:15 UTC