- From: Pieter Van Lierop <pvanlierop@geac.fr>
- Date: Wed, 6 Mar 2002 14:57:31 +0100
- To: "'Lunau Carrol'" <carrol.lunau@nlc-bnc.ca>
- Cc: zig <www-zig@w3.org>, "'Ray Denenberg'" <rden@loc.gov>
Carrol,

Yes, I could write a proposal for this. But let us first wait for the
discussion on the list. Ray had a strong opinion of *not* solving this via a
profile but through a general Z39.50 solution.

Pieter

> -----Original Message-----
> From: Lunau Carrol [mailto:carrol.lunau@nlc-bnc.ca]
> Sent: Wednesday, 6 March 2002 14:53
> To: 'Pieter Van Lierop'; 'Ray Denenberg'; zig
> Subject: RE: character encoding assumptions and approaches
>
> Pieter
> What do you mean when you say in the Bath Profile? I would be happy to
> incorporate it but I need somebody to write the section. Are you
> volunteering? Carrol
>
> -----Original Message-----
> From: Pieter Van Lierop [mailto:pvanlierop@geac.fr]
> Sent: Wednesday, March 06, 2002 4:33 AM
> To: 'Ray Denenberg'; zig
> Subject: RE: character encoding assumptions and approaches
>
> Sorry but I do not agree with your assumptions.
>
> 1. I still think that the Z39.50 protocol should not bother with the
> contents of anything that is not defined in the Z39.50 protocol, for
> example a MARC record. From the point of view of a MARC record, Z39.50 is
> only a transport mechanism. The MARC syntaxes have their own committees,
> standards, protocols, traditions, national standards, international
> standards: we should not bother with that.
> However, the solution you propose (an option bit and a Diagnostic from the
> server) is a very simple solution and is easy to ignore (that is good
> because of compatibility with the current implementations).
>
> 2. The character set agreement that we are discussing does not apply only
> to the search term, but to all fields defined as "International String". Is
> this correct or not?
> This means that, amongst others, the following fields are to be considered:
> ImplementationId, ImplementationName, ResultSetName/ResultSetId,
> DatabaseName, AdditionalInfo (in a diagnostic), ElementSetName, DisplayTerm
> (in Scan).
> Actually, the Term (in Search and Scan) is generally considered to be an
> OCTET STRING. I believe that most client applications send it as an OCTET
> STRING.
> Does that mean that when the client application sends Term as an OCTET
> STRING, the character set agreement does *not* apply?
>
> 3. I think that the best solution would be in the Bath profile, because they
> need it and they will probably be the only ones who are going to implement
> it.
> Then, I would suggest using a field somewhere in the InitRequest indicating
> the name or OID of the Character Set protocol. When the server cannot
> handle this, it returns a diagnostic and closes the connection.
> But as I said, I can live with Ray's solution.
>
> Pieter van Lierop
>
> > -----Original Message-----
> > From: Ray Denenberg [mailto:rden@loc.gov]
> > Sent: Tuesday, 5 March 2002 22:38
> > To: zig
> > Subject: character encoding assumptions and approaches
> >
> > I have some ideas on the character set encoding
> > problem, but before I develop them further, or put
> > them out for discussion and possibly yet more
> > digression, I have a few questions:
> >
> > First, may I infer the following from the
> > discussion so far:
> >
> > 1. We agree that it's a good idea to add an option
> > bit allowing negotiation of utf-8, subject to
> > agreement about the scope of negotiation;
> > specifically:
> > 2. We want a mechanism to override utf-8, in a
> > present request, or for a specific record in a
> > present response; however:
> > 3. We don't need to override utf-8 for a search
> > term. (Thus we don't need to define a character
> > set encoding attribute, at least, not for now, and
> > negotiation of utf-8 will mean that all search
> > terms are supplied in utf-8.)
> >
> > If these assumptions are correct then we've
> > distilled the character encoding problem down to
> > how to override utf-8.
> >
> > I further assume:
> > (4) we agree that the implicit approach won't
> > work, that is, the native encoding of a format
> > implicitly overriding utf-8, and that we need an
> > explicit mechanism.
> >
> > I don't want to try to solve this by throwing oids
> > at the problem. I think it's shortsighted. No,
> > we're not going to run out of oids. But as Matthew
> > and others have pointed out, there are a number of
> > dimensions already -- base syntax, schema,
> > character encoding -- and don't forget format
> > (i.e. bibliographic, authority, holdings, community
> > information, classification -- see
> > http://lcweb.loc.gov/z3950/agency/defns/oids.html#format).
> > It wouldn't take long to have an unmanageable oid
> > tree.
> >
> > And furthermore, the abstractions we've developed
> > for Z39.50 are its strength and we should exploit
> > them. Perhaps we did a good job of developing
> > abstractions and not so good a job of engineering
> > them into the protocol, at least not from a
> > contemporary perspective. Perhaps it's not
> > out of the question to consider some
> > reverse-engineering, rather than throwing out the
> > model.
> >
> > Now, the straightforward Z39.50 approach would
> > use:
> > (a) compspec, espec, and variant on the request,
> > and
> > (b) grs-1 (with embedded variant) on the response.
> >
> > and the sentiment is that this is overkill for
> > what we're narrowly focusing on now, which is
> > simply the ability to specify an encoding for a
> > marc record.
> >
> > I think we can come up with a solution for (a),
> > the request part. I think (b), the response part,
> > is harder.
> >
> > My question, at this point, is: is it (a) that
> > people resist, and are we willing to put marc
> > records in grs-1? Z39.50 is still an asn.1
> > protocol, don't forget.
> > So it isn't as though
> > you're going to avoid asn.1 by sending straight
> > marc rather than marc wrapped in grs-1.
> >
> > But assuming you don't want to do grs-1, is this
> > a reasonable alternative: assume we come up with
> > a solution for the request. The records would be
> > supplied in the native record syntax (marc21,
> > ukmarc, etc.) encoded as requested; if the server
> > cannot supply records in the requested encoding it
> > fails the request or supplies surrogate
> > diagnostics.
> >
> > Please give this some thought.
> >
> > --Ray
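
The alternative Ray describes at the end of his message (utf-8 negotiated by an option bit, an explicit override on the present request, and either a failed request or surrogate diagnostics when the server cannot comply) might be sketched roughly as follows. This is only an illustration under stated assumptions, not anything defined by Z39.50: every name here (`Record`, `SurrogateDiagnostic`, `PresentFailed`, `effective_encoding`, `present`) is hypothetical, records are modelled as plain Python strings, and Python's `latin-1` codec stands in for a server's native legacy encoding.

```python
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Record:
    """A record returned in its native syntax, in the stated encoding."""
    encoding: str
    data: bytes


@dataclass
class SurrogateDiagnostic:
    """Returned in place of a record that cannot be supplied as requested."""
    message: str


class PresentFailed(Exception):
    """Raised when the server chooses to fail the whole present request."""


def effective_encoding(utf8_negotiated: bool, override: Optional[str],
                       native: str = "latin-1") -> str:
    # An explicit override on the present request wins over negotiated utf-8;
    # without negotiation the server falls back to its native encoding
    # (latin-1 is an assumed stand-in).
    if override is not None:
        return override
    return "utf-8" if utf8_negotiated else native


def present(texts: List[str], utf8_negotiated: bool,
            override: Optional[str] = None,
            fail_whole_request: bool = False
            ) -> List[Union[Record, SurrogateDiagnostic]]:
    encoding = effective_encoding(utf8_negotiated, override)
    results: List[Union[Record, SurrogateDiagnostic]] = []
    for text in texts:
        try:
            results.append(Record(encoding, text.encode(encoding)))
        except (UnicodeEncodeError, LookupError):
            # The record cannot be expressed in (or the server does not
            # know) the requested encoding.
            if fail_whole_request:
                raise PresentFailed(f"cannot supply records in {encoding}")
            results.append(
                SurrogateDiagnostic(f"record not available in {encoding}"))
    return results
```

For example, `present(["Café", "Łódź"], utf8_negotiated=True, override="latin-1")` yields a `Record` for the first title and a `SurrogateDiagnostic` for the second, since "Łódź" has no latin-1 representation; with `fail_whole_request=True` the same call raises `PresentFailed` instead.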
Received on Wednesday, 6 March 2002 08:58:50 UTC