- From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
- Date: Wed, 10 May 2006 09:38:28 +0200
- To: public-wai-ert@w3.org
Shadi Abou-Zahra wrote:
>
> Just to have a practical example:
>
> Say we have a string of 2 characters that are above U007F (but let's
> still represent the string as "AB"). We want to store this string as a
> snippet and point to the second character ("B") as the start of an
> error. Let's say we start counting characters at 0, so it would have a
> charOffset of 1.
>
> Now we serialize this string as ASCII (despite its bad
> internationalization support) and get something like "&U0080;&U0081;"
> (use your imagination).

&#x80;&#x81; :-)

> Would charOffset remain 1 (resolve character
> references first, then count) or change to 7 (count in actual ASCII
> characters)?

It's still charOffset 1. EARL-reading, XML-aware software will read
&#x80;&#x81; and then create the two characters U+0080 and U+0081.

> I'm assuming the first approach (resolve then count) but we need to
> agree on this and document it for others.

OK.

A different case: the text snippet is 'fooÖbar'. The XML for this is

  <earl:textSnippet><![CDATA[fooÖbar]]></earl:textSnippet>

or

  <earl:textSnippet>foo&Ouml;bar</earl:textSnippet>

We want to point to the 'b'. The charOffset is 4, not 9, because the
resolved characters preceding the 'b' are 'fooÖ' (four characters), not
'foo&Ouml;' (nine).

> Also the fact that we start
> counting strings at 0 (or 1 if people prefer).

Yep, that's important. The same goes for line and column numbers.

-- 
Johannes Koch - Competence Center BIKA
Fraunhofer Institute for Applied Information Technology (FIT.LIFE)
Schloss Birlinghoven, D-53757 Sankt Augustin, Germany
Phone: +49-2241-142628
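Here is a minimal sketch of the "resolve, then count" rule discussed in this thread. It is written in Python and uses only the standard library's xml.etree.ElementTree. Several assumptions are made here, and none of them come from the thread:

- The earl: prefix is dropped so that no namespace declaration is needed.
- Offsets are 0-based, as Shadi proposed.
- The ASCII serialization of U+0080 U+0081 is spelled with numeric references (&#x80;&#x81;).
- &Ouml; is replaced by &#xD6;. This is necessary because &Ouml; is not one of XML's five predefined entities, and a plain XML parser rejects it unless a DTD declares it.

The helper name char_offset is hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified markup: the earl: prefix is dropped so the
# examples need no namespace declaration.
CDATA_FORM = "<textSnippet><![CDATA[foo\u00d6bar]]></textSnippet>"
# &#xD6; is used instead of &Ouml;, which XML only accepts if a DTD declares it.
REFERENCE_FORM = "<textSnippet>foo&#xD6;bar</textSnippet>"
# U+0080 U+0081 written with ASCII-only numeric character references.
ASCII_FORM = "<textSnippet>&#x80;&#x81;</textSnippet>"


def char_offset(serialized: str, target: str) -> int:
    """Return the 0-based offset of the first `target` character, counted in
    the snippet text after the XML parser has merged CDATA sections and
    resolved character references ("resolve, then count")."""
    text = ET.fromstring(serialized).text or ""
    return text.index(target)


for form in (CDATA_FORM, REFERENCE_FORM):
    print(char_offset(form, "b"))  # 4 for both serializations

print(char_offset(ASCII_FORM, "\u0081"))  # 1, not 6 (the length of "&#x80;")

# Counting in the raw markup instead gives an encoding-dependent answer:
print(len("foo&#xD6;"))  # 9 characters precede the 'b' in the markup
```

Every serialization hands the application the same characters, so the offset stays stable. Counting in the raw markup would change whenever the snippet is re-encoded.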
Received on Wednesday, 10 May 2006 07:39:15 UTC