EXPath binary module - comments - bin:decode-string() from John Lumley on 2013-08-05 (public-expath@w3.org from August 2013)

From: John Lumley <john@saxonica.com>
Date: Mon, 05 Aug 2013 14:30:03 +0100
To: EXPath ML <public-expath@w3.org>
Message-ID: <51FFA8DB.3080100@saxonica.com>

There is an outstanding issue, which will need some addressing, about how
bin:decode-string() should handle errors that arise during decoding. Such
errors can occur under the following circumstances:

 1. The encoding is specified, but it is the wrong one (e.g. UTF-8 is
    given when the data was actually encoded as UTF-16)
 2. The length to decode wasn't 'complete', i.e. it cut through a
    multi-octet character, leaving an incomplete sequence at the end
 3. There was a phasing error at the start, i.e. the start point was not
    at a code-point boundary.
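As a rough illustration of the three circumstances above, here is a small Python sketch. It uses Python's own codecs as a stand-in for whatever decoder a bin:decode-string() implementation might sit on (an assumption; nothing here is EXPath API), and the helper name `classify` and the sample text are hypothetical. It also shows why detection is an assumption rather than a given: ASCII-only UTF-16 data decodes as UTF-8 without any error at all.

```python
def classify(data: bytes, encoding: str):
    """Hypothetical helper: strict decode, returning (True, text) on success
    or (False, (reason, start, end)) describing the failing octets."""
    try:
        return True, data.decode(encoding, errors="strict")
    except UnicodeDecodeError as exc:
        return False, (exc.reason, exc.start, exc.end)


text = "na\u00efve \u20ac"             # 'naïve €'
utf8 = text.encode("utf-8")            # b'na\xc3\xafve \xe2\x82\xac'
utf16 = text.encode("utf-16-le")

# 1. Wrong encoding: UTF-16LE octets decoded as if they were UTF-8.
wrong = classify(utf16, "utf-8")

# 2. Incomplete length: the slice cuts through the 3-octet euro sign.
truncated = classify(utf8[:-1], "utf-8")

# 3. Phasing error: start one octet into the 2-octet 'ï'.
misaligned = classify(utf8[3:], "utf-8")

# Caveat for case 1: ASCII-only UTF-16 decodes "successfully" as UTF-8
# (NUL characters appear), so a wrong encoding is not always detectable.
silent = classify("abc".encode("utf-16-le"), "utf-8")

for name, result in [("wrong", wrong), ("truncated", truncated),
                     ("misaligned", misaligned), ("silent", silent)]:
    print(name, result)
```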

We must, of course, assume that the decoding error can be detected. The
question then is what should be done, and whether any form of recovery
should be supported.

The simplest approach, of course, is to throw an error (which try/catch can
field), but do we want to try to report what the error was? In some cases
the 'replacement character' (U+FFFD) can be substituted; this is especially
true with self-synchronising encodings such as UTF-8. But even then, do we
want to signal the error, and if so, where does the string decoded with
replacement characters get returned? (In XSLT 3.0 we could build a
reporting structure that was bound to the $err:value variable...
XSLT 2.0, of course, has no try.)
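One possible shape for "decode with replacement and still report" is sketched below in Python, assuming a self-synchronising encoding such as UTF-8 (slicing mid-stream would not be safe for stateful or BOM-sensitive encodings). Each undecodable run of octets becomes U+FFFD, and its offsets are collected alongside the recovered string, roughly the kind of structure that might be bound to $err:value. `DecodeReport` and `decode_with_report` are hypothetical names, not part of the EXPath binary module; plain strict decoding (Python's default) corresponds to the simpler throw-an-error option.

```python
from dataclasses import dataclass, field


@dataclass
class DecodeReport:
    """Hypothetical reporting structure: the recovered string plus the
    (start, end, reason) octet ranges that could not be decoded."""
    text: str
    errors: list = field(default_factory=list)


def decode_with_report(data: bytes, encoding: str = "utf-8") -> DecodeReport:
    pieces, errors, pos = [], [], 0
    while True:
        try:
            # Strict decode of everything still unconsumed.
            pieces.append(data[pos:].decode(encoding))
            return DecodeReport("".join(pieces), errors)
        except UnicodeDecodeError as exc:
            # Keep the cleanly decoded prefix, substitute U+FFFD for the bad
            # octets, record their absolute offsets, then resynchronise.
            pieces.append(data[pos:pos + exc.start].decode(encoding))
            pieces.append("\ufffd")
            errors.append((pos + exc.start, pos + exc.end, exc.reason))
            pos += exc.end


report = decode_with_report(b"na\xc3\xafve \xe2\x82")  # truncated euro sign
print(repr(report.text), report.errors)
```

For inputs like these, Python's built-in `errors="replace"` yields the same string but discards the positions, which is exactly the information the question above is about.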

Others in this community will have far more experience of this issue
than I, so I'd welcome your thoughts. Decoding-error management does
need to be defined for this function.

-- 
*John Lumley* MA PhD CEng FIEE
john@saxonica.com <mailto:john@saxonica.com>
on behalf of Saxonica Ltd

Received on Monday, 5 August 2013 13:30:25 UTC