- From: Ray Denenberg <rden@loc.gov>
- Date: Tue, 30 Oct 2001 15:14:08 -0500
- To: ZIG <www-zig@w3.org>
I would like to see a plan devised and carried out to address Z39.50 character set issues.

My view is this: a number of people (including myself) took considerable effort to develop character set negotiation capability for Z39.50, and unfortunately, in its current state it isn't much use. There are two reasons for this: (1) character set negotiation is confusing, and (2) "negotiation" simply doesn't address the range of protocol-related character set requirements.

For now, I'd like to focus on point 2. (I think point 1 can be more appropriately addressed once we have an understanding of point 2, and, more generally, what problem we're trying to solve.)

We might do well to spend some time, initially, revisiting our requirements. The requirements that led to character set negotiation were fairly well-formulated, but that was 1995, and I don't know if those are today's requirements. Three things that weren't considered, and that we either cannot do or at least are not well-specified, even if character set negotiation were well-specified, are:

(a) indicate the character set of a given search term;
(b) indicate the requested character set in a present request;
(c) indicate the character set of a record, in a present response.

Now I think (a) is most critical. It is a problem not easily solvable (as far as I can see) using available Z39.50 technology. (In contrast, we can use the variant facility for b and c.)

To elaborate this point: there are a number of reasons why the character set of a search term is not reliably self-identifying, even with negotiation in effect. I'm no character set expert but there are plenty here at LC, and this is what they tell me. Please feel free to debate this point.

Assuming this premise to be true, we need a way to explicitly indicate the character set/encoding of a search term. Clearly the most obvious and natural way is via an attribute. But what's the best way to do this? I'm sure there will be differing views.
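As an illustration of the premise above (that a search term's character set is not reliably self-identifying), here is a minimal Python sketch. It is not taken from any Z39.50 specification or toolkit. UTF-8 and Latin-1 are chosen only because Python ships codecs for both, and the `TaggedTerm` class is a hypothetical stand-in for "a term plus an explicit character set indication", not a proposed attribute definition.

```python
from dataclasses import dataclass

# One byte sequence, as it might arrive in a search term with no label.
raw_term = "caf\u00e9".encode("utf-8")  # the bytes 63 61 66 c3 a9

# Both decodes succeed, so the bytes alone cannot tell us which was meant.
as_utf8 = raw_term.decode("utf-8")      # 4 characters, ends in e-acute
as_latin1 = raw_term.decode("latin-1")  # 5 characters, ends in A-tilde + copyright sign


@dataclass(frozen=True)
class TaggedTerm:
    """Hypothetical: a term carried together with an explicit charset label."""
    raw: bytes
    charset: str  # a Python codec name here; a real attribute would use agreed values

    def text(self) -> str:
        # Decode using the label that travelled with the term, not a guess.
        return self.raw.decode(self.charset)


term = TaggedTerm(raw_term, "utf-8")
print(len(as_utf8), len(as_latin1), term.text() == as_utf8)  # prints: 4 5 True
```

A receiver that guesses Latin-1 for these bytes gets a different five-character string without any error being raised, which is the kind of silent mismatch an explicit indication on the term would avoid.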
Do we want a sound technical solution via the attribute architecture or a more immediate approach via a kludge to bib-1? Or a phased approach, involving both?

The problem with the architecture is that there is no room for a character set attribute. We would have to issue a new version of the architecture. (I believe we made a shortsighted decision when we left this out; I recall we rationalized that it wasn't needed because we had character set negotiation.) Perhaps this problem is serious enough to warrant a new version of the architecture.

LC has strong concerns about this. I don't mean to be provincial; I assume that other organizations have similar concerns. The Bath profile has been "scratching its head" over this for quite a while. Those are at least two good reasons for bumping this issue up a few notches on our priority list.

I would like to see some focused discussion leading to a potential solution for consideration at the next ZIG meeting. To start, I propose the following:

1. Amend the attribute architecture to accommodate a character set attribute; details to be developed.

2. Consider whether (and how) to retrofit this to bib-1. (A new bib-1 attribute type?)

3. Develop mechanisms/implementor-agreements/definitions to exploit the retrieval (/variant) facility to support the ability to indicate the requested character set in a present request, and to indicate the character set of a record in a present response.

4. Evaluate the relationship of all this to character set negotiation. If we solve these problems, then what are the remaining requirements of character set negotiation -- do we need to revise and/or clarify it?

Comments please. A good place to start would be "what problem we're trying to solve", i.e., what are the requirements.

--Ray

--
Ray Denenberg
Library of Congress
rden@loc.gov
202-707-5795
Received on Tuesday, 30 October 2001 15:14:57 UTC