octet strings and iso 2022 from Pieter Van Lierop on 2002-03-15 (www-zig@w3.org from March 2002)

From: Pieter Van Lierop <pvanlierop@geac.fr>
Date: Fri, 15 Mar 2002 11:08:24 +0100
To: "'LeVan,Ralph'" <levan@oclc.org>, "'Ray Denenberg'" <rden@loc.gov>, www-zig@w3.org
Message-ID: <00DE8F985709D6119F6B00805F851D8504F7B4@parisexchange.fr.geac.com>

I have another line of reasoning.
Formally, we cannot oblige a Z39.50 server to interpret an Octet String as
a string in the character set negotiated in the Init.
However, the Z39.50 server is free to interpret the bytes of an Octet
String as it sees fit.
Therefore, we could say the following:
when a character set has been negotiated in the Init and the client sends
a search term of type Octet String, it is recommended that the server
interpret this Octet String as a string in the negotiated character set.
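On the server side, the recommendation above could look roughly like this.
It is a minimal sketch in Python, assuming the negotiated character set is
available as a simple name; `interpret_octet_term` and the
`CODEC_FOR_CHARSET` mapping are hypothetical, not part of any Z39.50
toolkit. When nothing was negotiated, or the bytes do not decode, it hands
back the raw bytes, so the server stays free in its interpretation.

```python
from typing import Optional, Union

# Hypothetical mapping from negotiated character set names to Python codecs.
# The names are illustrative; a real server would use whatever its Init
# character set negotiation actually produced.
CODEC_FOR_CHARSET = {
    "UTF-8": "utf-8",
    "ISO-8859-1": "latin-1",
}


def interpret_octet_term(raw: bytes, negotiated: Optional[str]) -> Union[str, bytes]:
    """Interpret a search term that arrived as an Octet String.

    If a character set was negotiated in the Init and the bytes decode
    cleanly in it, treat the term as text in that set. Otherwise return the
    bytes unchanged for whatever private interpretation the server prefers.
    """
    codec = CODEC_FOR_CHARSET.get(negotiated) if negotiated is not None else None
    if codec is None:
        return raw  # nothing (or an unknown set) negotiated: leave as bytes
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return raw  # not valid text in the negotiated set: perhaps binary
```

For example, `interpret_octet_term("Müller".encode("utf-8"), "UTF-8")`
yields the text `"Müller"`, while the same call with `None` as the
negotiated set returns the bytes untouched.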

Secondly, as far as ISO 2022 is concerned, I would like to continue the
discussion.
I would like to thank Mark Reichert for sharing his wisdom with us.
It led me to the conclusion that ISO 2022 is difficult to
implement. I am not a character set specialist, but I know a few MARC
formats, and in them ISO 2022 is always avoided.
For example, UNIMARC says that ISO 646 is the basic character set and that,
if you want to use anything else, you can indicate it through ISO 2022.
However, it also uses tag 100 to state the character set explicitly.
In France, every UNIMARC record is in ISO 5426, and this is indicated in
tag 100 of the MARC record.
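Part of what makes ISO 2022 harder than a single character set declared in
tag 100 is that the active set can change anywhere inside a string through
escape sequences, so a decoder has to track state. The minimal Python sketch
below (the function `split_iso2022` is hypothetical) only splits a byte
string at escape sequences, using the ISO 2022 shape of ESC, then zero or
more intermediate bytes 0x20-0x2F, then one final byte 0x30-0x7E. A real
decoder would still have to map each designation to a character set and
handle G0/G1 invocation, which this sketch does not attempt.

```python
from typing import List, Tuple

ESC = 0x1B


def split_iso2022(data: bytes) -> List[Tuple[bytes, bytes]]:
    """Split bytes into (escape sequence, following bytes) runs.

    Text before the first escape sequence gets an empty escape sequence.
    A lone or malformed ESC is kept as ordinary data.
    """
    runs: List[Tuple[bytes, bytes]] = []
    current_esc = b""
    segment = bytearray()
    i = 0
    while i < len(data):
        if data[i] == ESC:
            j = i + 1
            # Skip intermediate bytes (0x20-0x2F).
            while j < len(data) and 0x20 <= data[j] <= 0x2F:
                j += 1
            # A final byte (0x30-0x7E) completes the escape sequence.
            if j < len(data) and 0x30 <= data[j] <= 0x7E:
                if segment or current_esc:
                    runs.append((current_esc, bytes(segment)))
                current_esc = data[i : j + 1]
                segment = bytearray()
                i = j + 1
                continue
        segment.append(data[i])
        i += 1
    if segment or current_esc:
        runs.append((current_esc, bytes(segment)))
    return runs
```

For example, `b"abc\x1b$B\x30\x21\x1b(Bxyz"` splits into three runs: the
leading `abc`, two bytes after `ESC $ B`, and `xyz` after `ESC ( B` (which
designates ASCII into G0). Every run is only meaningful once the decoder
knows which set its escape sequence selected.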

Has anybody implemented ISO 2022 in a Z39.50 context who would like to
share his or her experience with us?

Pieter van Lierop

> -----Original Message-----
> From: LeVan,Ralph [mailto:levan@oclc.org]
> Sent: Thursday, 14 March 2002 19:05
> To: 'Ray Denenberg'; www-zig@w3.org
> Subject: RE: Octet Strings and utf-8
> 
> 
> Let's ask the easier question. Is anyone sending binary data as a general
> Term? If so, would you share the particulars with us?
> 
> Thanks!
> 
> Ralph
> 
> > -----Original Message-----
> > From: Ray Denenberg [mailto:rden@loc.gov]
> > Sent: Thursday, March 14, 2002 11:37 AM
> > To: www-zig@w3.org
> > Subject: Re: Octet Strings and utf-8
> > 
> > 
> > "LeVan,Ralph" wrote:
> > 
> > > Somehow I must be deciding if the term is binary, because I am sending
> > > those terms to a search engine. The search engine is not expecting
> > > binary data.
> > 
> > If you have a search engine where binary data isn't applicable, and
> > you've negotiated utf-8 (via character set negotiation), and you're using
> > version 2, so the client has no choice but to send a term via octet
> > string, then you might argue that arbitrarily extending the negotiation
> > to apply to octet-string-tagged search terms is a reasonable and
> > pragmatic thing to do.
> > 
> > Still there is some winking going on, since the client could only know
> > via out-of-band agreement that your search engine doesn't expect binary.
> > It could be that the search was on title, author, etc., so a binary term
> > wouldn't make sense.
> > 
> > Would someone care to suggest some reliable rule of thumb we can adopt --
> > perhaps based on access point, for example, that if we're searching on
> > title, author, subject .... -- that an octet-string-tagged term is
> > guaranteed to be text and not binary?
> > 
> > --Ray
> > 
> > 
>
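Ray's access-point rule of thumb could be expressed as a check on the Bib-1
Use attribute (attribute type 1). This is a minimal Python sketch, assuming
terms arrive with (attribute type, attribute value) pairs; the function name
and the list of "textual" Use values (4 title, 21 subject heading, 1003
author, 1016 any) are illustrative assumptions. Agreeing on that list is
exactly the open question in the thread.

```python
from typing import Iterable, Tuple

# Bib-1 attribute type 1 is the Use attribute, i.e. the access point.
BIB1_USE = 1

# Hypothetical whitelist of Bib-1 Use values whose terms are assumed to be
# text: 4 = Title, 21 = Subject heading, 1003 = Author, 1016 = Any.
TEXTUAL_USE_VALUES = frozenset({4, 21, 1003, 1016})


def octet_term_is_text(attributes: Iterable[Tuple[int, int]]) -> bool:
    """Return True if an octet-string-tagged term may be treated as text.

    Only the first Use attribute found is consulted. Without one, no
    assumption is made and the term is not treated as text.
    """
    for attr_type, attr_value in attributes:
        if attr_type == BIB1_USE:
            return attr_value in TEXTUAL_USE_VALUES
    return False
```

Keying the decision on the access point keeps the rule visible in the
query itself, rather than depending on out-of-band knowledge of the
server's search engine.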

Received on Friday, 15 March 2002 05:10:26 UTC