Z39.50 character encoding

I posted a message last October about character
sets; see:
http://lists.w3.org/Archives/Public/www-zig/2001Oct/0041.html

There was some response, though not much, so I
took the issue up privately, off the list, with
those who responded or appeared interested. We
now have a proposal for discussion.

We propose adoption of one of the following three
approaches:

(a) Assign an option bit for utf-8 encoding.
(b) Define an attribute for the encoding of a
search term.
(c) Do both.

Option bit
If this bit is negotiated it would pertain to
retrieved data as well as the search term.
Additional option bits would be defined as needed,
on the premise that there would never be more than
a few (no bits for non-unicode encodings, for
example). If an additional bit is defined, say for
utf-16, then only one could be negotiated for a
given association. (The client
could propose more than one and the server
responds with only one.) If a particular encoding
is negotiated, it is the presumed encoding for the
association, unless overridden (for example, by a
variant, or, in the case of approach (c), an
attribute). If no encoding is negotiated then
behavior is unspecified; that is, current behavior
(whatever that is) is in force.
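
To make the negotiation rule concrete, here is a rough
sketch in Python. It is only an illustration of the rule
described above, not protocol code: the function name, the
use of plain strings in place of real option bits, and the
server's preference order are all assumptions of mine.

```python
from typing import Optional, Sequence

# Hypothetical server preference order. The proposal does not say how
# a server chooses among several proposed encodings.
SERVER_SUPPORTED = ("utf-8", "utf-16")


def negotiate_encoding(
    proposed: Sequence[str],
    supported: Sequence[str] = SERVER_SUPPORTED,
) -> Optional[str]:
    """Return the single encoding the server accepts for the association.

    The client may propose several encodings; the server answers with at
    most one. None means nothing was negotiated, so behavior is
    unspecified (current behavior remains in force).
    """
    for encoding in supported:
        if encoding in proposed:
            return encoding
    return None
```

A client that proposes both bits gets back exactly one; a
client that proposes none (or only something the server
does not know) gets None, i.e. the unspecified case.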

Attribute
Define an encoding attribute, initially with one
value, "utf-8", with additional values added as
needed. There would be no default; recommended
practice would be to always include the attribute,
but if it is omitted, behavior is unspecified.

Both
Option bit and attribute. If the option bit is
negotiated, then the attribute may override it. (So
in this case the semantics of omission would be
slightly different, but recommended practice would
still be to always include it.) Thus the option
bit determines the presumed encoding (for an
association) in the absence of other, explicit
encoding instructions (i.e., the attribute for
search terms, and variants for retrieval data).
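
The precedence in approach (c) can be sketched as follows.
Again this is a Python illustration under my own
assumptions: the function and parameter names are
hypothetical, and "explicit" stands for either the
attribute on a search term or a variant on retrieval data.

```python
from typing import Optional


def effective_encoding(
    negotiated: Optional[str],
    explicit: Optional[str] = None,
) -> Optional[str]:
    """Pick the encoding that applies to one term or one record.

    An explicit instruction (attribute or variant) wins; otherwise the
    encoding negotiated for the association applies. None means neither
    was given, so behavior is unspecified.
    """
    if explicit is not None:
        return explicit
    return negotiated


def encode_term(
    term: str,
    negotiated: Optional[str],
    attribute: Optional[str] = None,
) -> bytes:
    """Encode a search term using the effective encoding."""
    encoding = effective_encoding(negotiated, attribute)
    if encoding is None:
        # The proposal leaves this case unspecified; raising an error is
        # just one possible choice made for this sketch.
        raise ValueError("no encoding negotiated or specified")
    return term.encode(encoding)
```

With only the attribute (approach (b)), "negotiated" is
always None, which is why omitting the attribute leaves
behavior unspecified there.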

The case for approach (c) is that either (a) or
(b) alone has limitations. An attribute won't
cover retrieved data, just the search term. The
option bit alone won't let you override it for a
search term. (You could override it in retrieval
data, using variants. Why not be able to override
it for a search term?)

If you're wondering how this proposal fits with
the existing character set negotiation definition,
I think the obvious answer (and I say this with
both regret and relief) is that this is intended
to supersede that definition if approach (b) or
(c) is adopted.

Please comment on this proposal. Is there one of
these three approaches you like, or is there
another approach altogether that you favor?


--Ray

Received on Thursday, 21 February 2002 14:06:49 UTC