RE: Z39.50 character encoding

I went back to Ray's mail to see what the problem is.
What is the problem?
The problem is that Z39.50 implementations do not use the Z39.50 Character
Set Negotiation. They have not been using it for many years, so why
does this suddenly become a problem?
So Ray, maybe you could tell us which problem we are trying to solve,
why, and who wants it solved?
 
To be sure: I think this is a very important subject, but it is too late to
change it in Z39.50 Version 3. Maybe it will be better in ZING. Or in the
Bath profile: the Bath profile seems to me the only thing in Z39.50 that
is able to force people to make new investments in Z39.50.
 
Pieter

-----Original Message-----
From: Ray Denenberg [mailto:rden@loc.gov]
Sent: Thursday, 21 February 2002 20:08
To: zig
Subject: Z39.50 character encoding


I posted a message last October about  character sets; see:
http://lists.w3.org/Archives/Public/www-zig/2001Oct/0041.html

There was some response, not much though, and so I have engaged those who
responded or appeared interested in this issue in a private discussion
outside of the list. We now have a proposal for discussion. 


We propose adoption of one of the following three approaches: 


(a) Assign an option bit for utf-8 encoding. 
(b) Define an attribute for the encoding of a search term. 
(c) Do both. 


Option bit 
If this bit is negotiated, it would pertain to retrieved data as well as the
search term. Additional option bits would be defined as needed, on the
premise that there would never be more than a few (no bits for
non-Unicode encodings, for example). If an additional bit is defined, say
utf-16, then only one could be negotiated for a given association. (The
client could propose more than one and the server would respond with only
one.) If a particular encoding is negotiated, it is the presumed encoding for
the association, unless overridden (for example, by a variant, or, in the
case of approach (c), an attribute). If no encoding is negotiated then
behavior is unspecified; that is, current behavior (whatever that is) is in
force.
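
As a rough illustration of the "propose several, accept one" rule described
above, here is a minimal sketch in Python. It is not part of the proposal:
the function name, the encoding labels, and the idea of a server preference
list are all assumptions made for the example, and no actual option bit
numbers are implied.

```python
from typing import Optional, Sequence

# Hypothetical server preference order; the proposal assigns no such list.
SERVER_SUPPORTED = ("utf-8",)


def negotiate_encoding(
    proposed: Sequence[str],
    supported: Sequence[str] = SERVER_SUPPORTED,
) -> Optional[str]:
    """Pick at most one encoding for the association.

    The client may propose several encodings; the server answers with
    only one. None means nothing was negotiated, in which case behavior
    is unspecified (current practice remains in force).
    """
    for encoding in supported:  # walk the server's preference order
        if encoding in proposed:
            return encoding
    return None
```

A client proposing both utf-8 and utf-16 to a server that prefers utf-8
would end up with utf-8 for the whole association; a client proposing
nothing gets None back.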


Attribute 
Define an encoding attribute, initially with one value, "utf-8", with
additional values added as needed. There would be no default; recommended
practice would be to always include the attribute, but if it is omitted,
behavior is unspecified.


Both 
Option bit and attribute. If the option bit is negotiated, then the
attribute may override it. (So in this case the semantics of omission would
be slightly different, but recommended practice would still be to always
include it.) Thus the option bit determines the presumed encoding (for an
association) in the absence of other, explicit encoding instructions (i.e.,
the attribute for search terms, and variants for retrieval data).
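
The precedence in approach (c) can be sketched as follows. This is only an
illustration under stated assumptions: the function names are made up, the
encoding labels are plain Python codec names rather than anything defined
for Z39.50, and raising an error for the unspecified case is a choice made
for the example (the proposal leaves that behavior open).

```python
from typing import Optional


def resolve_encoding(
    negotiated: Optional[str],
    explicit: Optional[str],
) -> Optional[str]:
    """Return the encoding in force for one search term or record.

    An explicit instruction (the attribute on a search term, or a variant
    on retrieved data) overrides the encoding negotiated for the
    association. None means the encoding is unspecified.
    """
    return explicit if explicit is not None else negotiated


def encode_term(
    term: str,
    negotiated: Optional[str],
    attribute: Optional[str] = None,
) -> bytes:
    """Encode a search term using the resolved encoding (hypothetical helper)."""
    encoding = resolve_encoding(negotiated, attribute)
    if encoding is None:
        # Assumption for this sketch only: refuse rather than guess.
        raise ValueError("no encoding negotiated and no attribute supplied")
    return term.encode(encoding)
```

With utf-8 negotiated and no attribute, a term is sent as utf-8; supplying
the attribute changes the encoding for that one term without affecting the
rest of the association.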


The case for approach (c) is that either (a) or (b) alone has limitations.
An attribute won't cover retrieved data, just the search term. The option
bit alone won't let you override it for a search term.  (You could override
it in retrieval data, using variants. Why not be able to override it for a
search term?) 


 If you're wondering how this proposal fits with the existing character set
negotiation definition --  I think the obvious answer (and I say this with
both regret and relief) is that this is intended to supersede that
definition if approach (b) or (c) is adopted. 


Please comment on this proposal.  Is there one of these three approaches you
like, or is there another approach altogether that you favor? 
  


--Ray 

Received on Friday, 1 March 2002 11:23:39 UTC