- From: Lunau Carrol <carrol.lunau@nlc-bnc.ca>
- Date: Wed, 6 Mar 2002 08:52:58 -0500
- To: "'Pieter Van Lierop'" <pvanlierop@geac.fr>, "'Ray Denenberg'" <rden@loc.gov>, zig <www-zig@w3.org>
Pieter, what do you mean when you say "in the Bath Profile"? I would be happy to incorporate it, but I need somebody to write the section. Are you volunteering?

Carrol

-----Original Message-----
From: Pieter Van Lierop [mailto:pvanlierop@geac.fr]
Sent: Wednesday, March 06, 2002 4:33 AM
To: 'Ray Denenberg'; zig
Subject: RE: character encoding assumptions and approaches

Sorry, but I do not agree with your assumptions.

1. I still think that the Z39.50 protocol should not bother with the contents of anything that is not defined in the Z39.50 protocol, for example a MARC record. From the point of view of a MARC record, Z39.50 is only a transport mechanism. The MARC syntaxes have their own committees, standards, protocols, traditions, national standards, and international standards; we should not bother with that. However, the solution you propose (an option bit and a diagnostic from the server) is a very simple solution and is easy to ignore (that is good for compatibility with the current implementations).

2. The character set agreement that we are discussing does not apply only to the search term, but to all fields defined as "InternationalString". Is this correct or not? This means that, among others, the following fields are to be considered: ImplementationId, ImplementationName, ResultSetName/ResultSetId, DatabaseName, AdditionalInfo (in a diagnostic), ElementSetName, and DisplayTerm (in Scan). Actually, the Term (in Search and Scan) is generally considered to be an OCTET STRING, and I believe that most client applications send it as an OCTET STRING. Does that mean that when the client application sends Term as an OCTET STRING, the character set agreement does *not* apply?

3. I think that the best solution would be in the Bath Profile, because they need it and they will probably be the only ones who are going to implement it. Then, I would suggest using a field somewhere in the InitRequest indicating the name or OID of the character set. When the server cannot handle this, it returns a diagnostic and closes the connection. But as I said, I can live with Ray's solution.

Pieter van Lierop
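For concreteness, the init-time negotiation Pieter suggests might look like the sketch below. This is plain Python modelling the exchange, not real Z39.50 ASN.1/BER; the field name `characterSetId`, the supported-set list, and the diagnostic text are illustrative assumptions, not anything defined in the standard.

```python
# Minimal sketch of Pieter's suggestion: the client names a character set
# (by name or OID) in a field of the InitRequest; a server that cannot
# handle it returns a diagnostic and closes the connection.

SUPPORTED_CHARACTER_SETS = {"UTF-8", "ISO-8859-1"}  # what this server can handle


def handle_init_request(init_request: dict) -> dict:
    """Accept or refuse an InitRequest based on its character set field."""
    requested = init_request.get("characterSetId")  # hypothetical field name
    if requested is None:
        # No character set named: fall back to existing behaviour,
        # so current implementations are unaffected.
        return {"result": "accept", "characterSetId": None}
    if requested not in SUPPORTED_CHARACTER_SETS:
        # Server cannot handle the requested set: diagnostic, then close.
        return {
            "result": "reject",
            "diagnostic": f"unsupported character set: {requested}",
            "close": True,
        }
    return {"result": "accept", "characterSetId": requested}


if __name__ == "__main__":
    print(handle_init_request({"characterSetId": "UTF-8"}))        # accepted
    print(handle_init_request({"characterSetId": "ISO-2022-JP"}))  # diagnostic + close
```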
> -----Original Message-----
> From: Ray Denenberg [mailto:rden@loc.gov]
> Sent: Tuesday, 5 March 2002 22:38
> To: zig
> Subject: character encoding assumptions and approaches
>
> I have some ideas on the character set encoding problem, but before I
> develop them further, or put them out for discussion and possibly yet
> more digression, I have a few questions.
>
> First, may I infer the following from the discussion so far:
>
> 1. We agree that it's a good idea to add an option bit allowing
> negotiation of utf-8, subject to agreement about the scope of
> negotiation; specifically:
> 2. We want a mechanism to override utf-8, in a present request, or
> for a specific record in a present response; however:
> 3. We don't need to override utf-8 for a search term. (Thus we don't
> need to define a character set encoding attribute, at least not for
> now, and negotiation of utf-8 will mean that all search terms are
> supplied in utf-8.)
>
> If these assumptions are correct, then we've distilled the character
> encoding problem down to how to override utf-8.
>
> I further assume:
> 4. We agree that the implicit approach won't work, that is, the
> native encoding of a format implicitly overriding utf-8, and that we
> need an explicit mechanism.
>
> I don't want to try to solve this by throwing oids at the problem; I
> think that's shortsighted. No, we're not going to run out of oids.
> But as Matthew and others have pointed out, there are a number of
> dimensions already -- base syntax, schema, character encoding -- and
> don't forget format (i.e. bibliographic, authority, holdings,
> community information, classification -- see
> http://lcweb.loc.gov/z3950/agency/defns/oids.html#format). It
> wouldn't take long to have an unmanageable oid tree.
>
> Furthermore, the abstractions we've developed for Z39.50 are its
> strength, and we should exploit them. Perhaps we did a good job of
> developing abstractions and not so good a job of engineering them
> into the protocol, at least not from a contemporary perspective.
> Perhaps it's not out of the question to consider some
> reverse engineering, rather than throwing out the model.
>
> Now, the straightforward Z39.50 approach would use:
> (a) compspec, espec, and variant on the request, and
> (b) grs-1 (with embedded variant) on the response.
>
> The sentiment is that this is overkill for what we're narrowly
> focusing on now, which is simply the ability to specify an encoding
> for a marc record.
>
> I think we can come up with a solution for (a), the request part. I
> think (b), the response part, is harder.
>
> My question, at this point, is: is it (a) that people resist, and are
> we willing to put marc records in grs-1? Z39.50 is still an asn.1
> protocol, don't forget, so it isn't as though you're going to avoid
> asn.1 by sending straight marc rather than marc wrapped in grs-1.
>
> But assuming you don't want to do grs-1, is this a reasonable
> alternative: assume we come up with a solution for the request. The
> records would be supplied in the native record syntax (marc21,
> ukmarc, etc.) encoded as requested; if the server cannot supply
> records in the requested encoding, it fails the request or supplies
> surrogate diagnostics.
>
> Please give this some thought.
>
> --Ray