Z39.50 Character Set Requirements from Ray Denenberg on 2001-10-30 (www-zig@w3.org from October 2001)

From: Ray Denenberg <rden@loc.gov>
Date: Tue, 30 Oct 2001 15:14:08 -0500
To: ZIG <www-zig@w3.org>
Message-ID: <3BDF0A10.D8F3E14A@rs8.loc.gov>

I would like to see a plan devised and carried out to
address Z39.50 character set issues.

My view is this: a number of people (including myself) put
considerable effort into developing character set negotiation
capability for Z39.50, and unfortunately, in its current
state it isn't much use. There are two reasons for this:
    (1) character set negotiation is confusing, and
    (2) "negotiation" simply doesn't address the range of
protocol-related character set requirements.

For now, I'd like to focus on point 2. (I think point 1 can
be more appropriately addressed once we have an
understanding of point 2, and, more generally, what problem
we're trying to solve.)

We might do well to spend some time, initially, revisiting
our requirements.  The requirements that led to character
set negotiation were fairly well-formulated, but that was
1995, and I don't know if those are today's requirements.
Three things that weren't considered then, and that we either
cannot do or have not specified well, even if character set
negotiation itself were well-specified, are the ability to:
    (a) indicate the character set of a given search term;
    (b) indicate the requested character set in a present
request;
    (c) indicate the character set of a record in a
present response.

Now I think (a) is most critical. It is a problem not easily
solvable (as far as I can see) using available Z39.50
technology. (In contrast, we can use the variant facility
for (b) and (c).) To elaborate on this point: there are a
number of reasons why the character set of a search term is not
reliably self-identifying, even with negotiation in effect.
I'm no character set expert, but there are plenty here at LC,
and this is what they tell me. Please feel free to debate
this point.
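
One of those reasons can be illustrated with a minimal sketch. It
is not taken from any Z39.50 implementation; the function name
candidate_readings and the choice of ASCII, UTF-8 and Latin-1 are
assumptions made purely for illustration. It shows that the same
bytes can decode without error under more than one encoding, so
the bytes of a term alone do not say which one the client meant.

```python
def candidate_readings(term_bytes, encodings=("ascii", "utf-8", "latin-1")):
    """Return every encoding (from a hypothetical shortlist) under which
    the term decodes without error, mapped to the resulting string."""
    readings = {}
    for name in encodings:
        try:
            readings[name] = term_bytes.decode(name)
        except UnicodeDecodeError:
            # These bytes are not valid in this encoding; skip it.
            continue
    return readings


# The client meant "café" and sent it as UTF-8, but the server only
# sees the bytes b'caf\xc3\xa9'.
readings = candidate_readings("café".encode("utf-8"))
for name, text in readings.items():
    print(name, repr(text))
```

Two of the three candidate encodings accept the bytes and yield
different strings, so a server that guesses may search for a term
the user never typed.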

Assuming this premise to be true, we need a way to
explicitly indicate the character set/encoding of a search
term.  Clearly the most obvious and natural way is via an
attribute.  But what's the best way to do this? I'm sure
there will be differing views.  Do we want a sound technical
solution via the attribute architecture or a more immediate
approach via a kludge to bib-1? Or a phased approach,
involving both?
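
To make the idea concrete, here is a rough sketch of a search term
carrying an explicit character set attribute. Everything in it is
hypothetical: the SearchTerm class, the CHARSET_ATTRIBUTE label,
the decode_term helper, and the use of Python codec names as
attribute values are assumptions for illustration only. Neither
bib-1 nor the attribute architecture defines such an attribute,
and this says nothing about how one would be encoded on the wire.

```python
from dataclasses import dataclass, field

# Hypothetical label for a character set attribute type. No such
# attribute exists in bib-1 or the attribute architecture today.
CHARSET_ATTRIBUTE = "character-set"


@dataclass
class SearchTerm:
    """A search term as raw bytes plus its attribute list (simplified
    here to a dict of attribute type -> value)."""
    raw: bytes
    attributes: dict = field(default_factory=dict)


def decode_term(term, default_encoding="utf-8"):
    """Decode the term using its explicit character set attribute,
    falling back to an assumed default when none is supplied."""
    encoding = term.attributes.get(CHARSET_ATTRIBUTE, default_encoding)
    return term.raw.decode(encoding)


# The client states outright that the term is Latin-1, so the server
# does not have to guess.
term = SearchTerm(raw=b"caf\xe9", attributes={CHARSET_ATTRIBUTE: "latin-1"})
print(decode_term(term))
```

Without the attribute, the same bytes would fail to decode under
the assumed UTF-8 default, which is the kind of mismatch an
explicit attribute is meant to prevent.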

The problem with the architecture is that there is no room
for a character set attribute. We would have to issue a new
version of the architecture. (I believe we made a
shortsighted decision when we left this out; I recall we
rationalized that it wasn't needed because we had character
set negotiation.) Perhaps this problem is serious enough to
warrant a new version of the architecture.

LC has a strong concern over this -- I don't mean to be
provincial -- and I assume that other organizations have
similar concerns. The Bath profile has been "scratching its
head" over this for quite a while. Those are at least two
good reasons for bumping this issue up a few notches on our
priority list.

I would like to see some focused discussion leading to a
potential solution for consideration at the next ZIG
meeting. To start, I propose the following:

    1. Amend the attribute architecture to accommodate a
character set attribute; details to be developed.

    2. Consider whether (and how) to retrofit this to bib-1.
(A new bib-1 attribute type?)

    3. Develop mechanisms/implementor-agreements/definitions
to exploit the retrieval (/variant) facility to support the
ability to indicate the requested character set in a present
request, and to indicate the character set of a record in a
present response.

    4. Evaluate the relationship of all this to character
set negotiation. If we solve these problems, then what are
the remaining requirements of character set negotiation --
do we need to revise and/or clarify it?

Comments please. A good place to start would be "what
problem we're trying to solve", i.e., what are the
requirements.

  --Ray



--
Ray Denenberg
Library of Congress
rden@loc.gov
202-707-5795
Received on Tuesday, 30 October 2001 15:14:57 UTC