- From: Ray Denenberg <rden@loc.gov>
- Date: Tue, 05 Mar 2002 16:37:40 -0500
- To: zig <www-zig@w3.org>
I have some ideas on the character set encoding problem, but before I develop them further, or put them out for discussion and possibly yet more digression, I have a few questions.

First, may I infer the following from the discussion so far:

1. We agree that it's a good idea to add an option bit allowing negotiation of utf-8, subject to agreement about the scope of negotiation; specifically:

2. We want a mechanism to override utf-8, in a present request, or for a specific record in a present response; however:

3. We don't need to override utf-8 for a search term. (Thus we don't need to define a character set encoding attribute, at least not for now, and negotiation of utf-8 will mean that all search terms are supplied in utf-8.)

If these assumptions are correct, then we've distilled the character encoding problem down to how to override utf-8. I further assume:

4. We agree that the implicit approach won't work -- that is, the native encoding of a format implicitly overriding utf-8 -- and that we need an explicit mechanism.

I don't want to try to solve this by throwing oids at the problem. I think it's shortsighted. No, we're not going to run out of oids. But as Matthew and others have pointed out, there are a number of dimensions already -- base syntax, schema, character encoding -- and don't forget format (i.e., bibliographic, authority, holdings, community information, classification -- see http://lcweb.loc.gov/z3950/agency/defns/oids.html#format). It wouldn't take long to have an unmanageable oid tree.

And furthermore, the abstractions we've developed for Z39.50 are its strength, and we should exploit them. Perhaps we did a good job of developing abstractions and not so good a job of engineering them into the protocol, at least not from a contemporary perspective. Perhaps it's not out of the question to consider some reverse engineering, rather than throwing out the model.

Now, the straightforward Z39.50 approach would use: (a) compspec, espec, and variant on the request, and (b) grs-1 (with embedded variant) on the response. The sentiment is that this is overkill for what we're narrowly focusing on now, which is simply the ability to specify an encoding for a marc record.

I think we can come up with a solution for (a), the request part. I think (b), the response part, is harder. My question, at this point, is: is it (a) that people resist, and are we willing to put marc records in grs-1? Z39.50 is still an asn.1 protocol, don't forget, so it isn't as though you're going to avoid asn.1 by sending straight marc rather than marc wrapped in grs-1.

But assuming you don't want to do grs-1, is this a reasonable alternative: assume we come up with a solution for the request. The records would be supplied in the native record syntax (marc21, ukmarc, etc.) encoded as requested; if the server cannot supply records in the requested encoding, it fails the request or supplies surrogate diagnostics.

Please give this some thought.

--Ray
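P.S. For concreteness, a minimal sketch of that last alternative, in Python with hypothetical names (StoredRecord, SurrogateDiagnostic, present) rather than real Z39.50 toolkit code. It only illustrates the proposed server-side behaviour: return each record in its native record syntax transcoded to the requested encoding, or a surrogate diagnostic for any record that cannot be supplied that way.

    from dataclasses import dataclass
    from typing import List, Union

    @dataclass
    class StoredRecord:
        record_syntax: str     # e.g. "marc21", "ukmarc"
        native_encoding: str   # encoding the record is held in, e.g. "latin-1"
        data: bytes

    @dataclass
    class SurrogateDiagnostic:
        message: str           # a real server would return a Bib-1 diagnostic

    def present(records: List[StoredRecord],
                requested_encoding: str) -> List[Union[bytes, SurrogateDiagnostic]]:
        """One entry per record: the record re-encoded, or a surrogate diagnostic."""
        out: List[Union[bytes, SurrogateDiagnostic]] = []
        for rec in records:
            try:
                # Illustrative transcoding step; real marc-8 conversion needs
                # dedicated mapping tables, not just Python codecs.
                out.append(rec.data.decode(rec.native_encoding)
                                   .encode(requested_encoding))
            except (LookupError, UnicodeError):
                # Cannot supply this record in the requested encoding.
                out.append(SurrogateDiagnostic(
                    message=f"cannot supply record in {requested_encoding}"))
        return out

The "fail the request" option would simply be the whole-response analogue of the per-record diagnostic above.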
Received on Tuesday, 5 March 2002 16:36:28 UTC