- From: Johannes Koch <johannes.koch@fit.fraunhofer.de>
- Date: Wed, 26 Apr 2006 09:54:05 +0200
- To: "public-wai-ert@w3.org" <public-wai-ert@w3.org>
Following my action item from
<http://www.w3.org/2006/04/05-er-minutes#item02>, here are my comments
on the byteOffset-with-snippet proposal.
The proposal was:
<earl:snippet>
  <earl:Snippet>
    <earl:content rdf:parseType="Literal" xmlns:x="chickens"><x:a>chickens</x:a></earl:content>
    <earl:byteOffset>100</earl:byteOffset>
  </earl:Snippet>
</earl:snippet>
or
<earl:snippet>
  <earl:Snippet>
    <earl:content><![CDATA[ <div>chickens<img src="chicken.gif"></div> ]]></earl:content>
    <earl:byteOffset>15</earl:byteOffset>
  </earl:Snippet>
</earl:snippet>
The Snippet type carries a snippet of the subject, and the byteOffset
points to the start of the error within that snippet - you cannot point
to a range with this.
As I tried to point out during the telecon, the earl:content property
contains _characters_, while the earl:byteOffset property is a _byte_
offset relative to the contents of the snippet. That is inconsistent: it
mixes levels. One value is at the character level, the other is at the
byte level.
How does an EARL-reading tool know which character encoding to use when
encoding the snippet's characters, in order to apply the byte offset and
find the byte that marks the error?
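To illustrate the ambiguity, here is a sketch in Python (the snippet
string and character position are made up for the example): the same
character position maps to different byte offsets depending on which
encoding the reader assumes.

```python
# Hypothetical snippet containing a non-ASCII character ("ü").
snippet = "<div>Hühner<img src='huhn.gif'></div>"
char_offset = 11  # character position where "<img" starts

# Byte offset of that position under two different encodings:
utf8_offset = len(snippet[:char_offset].encode("utf-8"))
latin1_offset = len(snippet[:char_offset].encode("iso-8859-1"))

print(utf8_offset)    # 12: "ü" is two bytes in UTF-8
print(latin1_offset)  # 11: "ü" is one byte in ISO-8859-1
```

Without knowing the encoding, a tool cannot tell whether byte offset 11
or 12 marks the same error position.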
When the subject is text and the snippet contains text content, a
_character_ offset makes sense.
When the subject is binary, it makes sense to have a byte offset. The
byte sequence to be put into the snippet has to be encoded (e.g. Base64)
because EARL is a text format. The encoding must be recorded so the EARL
reading tool can transform the snippet content into the original byte
sequence to apply the byte offset.
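The round trip for the binary case might look like the following sketch
(the payload and variable names are made up; only the Base64 step and
the recorded-encoding requirement come from the text above):

```python
import base64

# Made-up binary subject: suppose the error starts at byte 4.
original_bytes = b"\x89PNG\r\n\x1a\n"
byte_offset = 4

# Writer side: encode the byte sequence (e.g. Base64) so it fits into
# EARL's text format, and record which encoding was used.
content = base64.b64encode(original_bytes).decode("ascii")
encoding = "base64"  # must be recorded alongside the content

# Reader side: reverse the recorded encoding to recover the original
# byte sequence, then apply the byte offset.
recovered = base64.b64decode(content)
assert recovered == original_bytes
error_byte = recovered[byte_offset]  # the byte that marks the error
```

Because the encoding is recorded explicitly, the reading tool can always
get back to the exact original byte sequence before applying the offset.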
--
Johannes Koch - Competence Center BIKA
Fraunhofer Institute for Applied Information Technology (FIT.LIFE)
Schloss Birlinghoven, D-53757 Sankt Augustin, Germany
Phone: +49-2241-142628
Received on Wednesday, 26 April 2006 10:04:07 UTC