[AI] comments to byteOffset

Following my action item from 
<http://www.w3.org/2006/04/05-er-minutes#item02>, here are my comments 
on byteOffset with snippet.

The proposal was:

   <earl:snippet>
    <earl:Snippet>
     <earl:content rdf:parseType="Literal"
       xmlns:x="chickens"><x:a>chickens</x:a></earl:content>
     <earl:byteOffset>100</earl:byteOffset>
    </earl:Snippet>
   </earl:snippet>

   or

   <earl:snippet>
    <earl:Snippet>
     <earl:content><![CDATA[ <div>chickens<img
       src="chicken.gif"></div> ]]></earl:content>
     <earl:byteOffset>15</earl:byteOffset>
    </earl:Snippet>
   </earl:snippet>

   The snippet type points to a snippet and then a byteOffset to the
   start of the error within the snippet - you cannot point to a range
   with this.

As I tried to point out during the telecon, the earl:content property 
contains _characters_, while the earl:byteOffset property is a _byte_ 
offset relative to the contents of the snippet. That is inconsistent: it 
mixes two levels, the character level and the byte level.

How does an EARL-reading tool know which character encoding to use to 
encode the characters in the snippet to then apply the byte offset to 
get the byte that marks the error?
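
To illustrate the question, here is a minimal Python sketch. The snippet 
text and the helper name byte_offset_of are made up for this example and 
are not part of the proposal. It shows that one and the same character 
position in a snippet corresponds to different byte offsets, depending 
on the character encoding chosen:

```python
def byte_offset_of(text: str, char_offset: int, encoding: str) -> int:
    """Return the byte offset of the character at char_offset
    when text is serialized in the given encoding."""
    # Encode everything before the character and count the bytes.
    return len(text[:char_offset].encode(encoding))


# Hypothetical snippet content; the "ü" is a non-ASCII character.
snippet = "<p>Hühner</p>"
char_offset = snippet.index("n")  # character position 6

for encoding in ("iso-8859-1", "utf-8", "utf-16-le"):
    offset = byte_offset_of(snippet, char_offset, encoding)
    print(encoding, offset)
```

In this example the character offset is 6 in every case, but the byte 
offset is 6 in ISO-8859-1, 7 in UTF-8 and 12 in UTF-16 (little endian, 
without byte order mark). A byte offset alone therefore does not 
identify a character in the snippet.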

When the subject is text and the snippet contains text content, it makes 
sense to have a character offset.

When the subject is binary, it makes sense to have a byte offset. The 
byte sequence to be put into the snippet has to be encoded (e.g. with 
Base64) because EARL is a text format. The encoding must be recorded so 
that the EARL-reading tool can transform the snippet content back into 
the original byte sequence and then apply the byte offset.
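
A rough Python sketch of what an EARL-reading tool would have to do in 
the binary case follows. The function byte_at and its encoding_name 
parameter are hypothetical; the proposal does not define how the 
encoding would be recorded. The sample bytes are the eight-byte PNG 
signature, used here only as an example of binary content:

```python
import base64


def byte_at(encoded_snippet: str, encoding_name: str, byte_offset: int) -> int:
    """Decode the snippet content back into bytes and return the byte
    at byte_offset. encoding_name stands for whatever the report
    would record about the encoding (an assumption of this sketch)."""
    if encoding_name != "base64":
        raise ValueError("unsupported snippet encoding: " + encoding_name)
    raw = base64.b64decode(encoded_snippet)
    return raw[byte_offset]


# Writing side: binary content must be encoded to fit into a text format.
original = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])
encoded = base64.b64encode(original).decode("ascii")

# Reading side: decode first, then apply the byte offset.
marked = byte_at(encoded, "base64", 4)
print(hex(marked))
```

Without the recorded encoding, the reading tool would only see the 
characters of the encoded snippet and could not know which byte the 
offset refers to.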

-- 
Johannes Koch - Competence Center BIKA
Fraunhofer Institute for Applied Information Technology (FIT.LIFE)
Schloss Birlinghoven, D-53757 Sankt Augustin, Germany
Phone: +49-2241-142628

Received on Wednesday, 26 April 2006 10:04:07 UTC