RE: character encoding assumptions and approaches from Pieter Van Lierop on 2002-03-07 (www-zig@w3.org from March 2002)

From: Pieter Van Lierop <pvanlierop@geac.fr>
Date: Thu, 7 Mar 2002 14:51:32 +0100
To: "'Ray Denenberg'" <rden@loc.gov>, www-zig@w3.org
Message-ID: <00DE8F985709D6119F6B00805F851D8504F780@parisexchange.fr.geac.com>

My two points, very briefly:

1. We (ZIG) do not "bother" ourselves with the records retrieved because
they are external to the Z39.50 protocol. An external is something that we
don't know, unless it is defined by ourselves. 
We do not know the contents of an external, we do not know its use, we do
not know its character set. We do not care about that.
The character set of an external is outside the scope of the Z39.50 protocol
(again: unless it is defined by ourselves).

2. The new option bit should apply, and necessarily will apply, not only to
the search term, but to all fields defined as InternationalString, which is
not only the search term but also (very important if you have scan!) the scan
term and all the headings retrieved through the scan response. Actually, as
far as character set issues are concerned, we have more problems with the
scan than with the search.
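
To make the two points concrete, here is a minimal sketch in Python. It is
an illustration only, not taken from the standard or from any toolkit: the
function names are hypothetical, and Latin-1 as the encoding used when
utf-8 has not been negotiated is purely an assumption for the example.

```python
# Hypothetical helper names; nothing here is defined by Z39.50 itself.
# Assumption: without negotiation, strings go out as Latin-1.

def encode_international_string(value: str, utf8_negotiated: bool) -> bytes:
    """Encode any InternationalString field (search term, scan term, heading)."""
    return value.encode("utf-8" if utf8_negotiated else "latin-1")


def encode_scan_headings(headings: list[str], utf8_negotiated: bool) -> list[bytes]:
    """Point 2: the option bit covers every heading in a scan response."""
    return [encode_international_string(h, utf8_negotiated) for h in headings]


def pass_external(record: bytes) -> bytes:
    """Point 1: an external record is opaque and is carried through unchanged."""
    return record
```

The same function is used for the search term and for the scan headings,
which is the whole point: one rule for every InternationalString, and no
rule at all for the external.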

Pieter van Lierop

> -----Original Message-----
> From: Ray Denenberg [mailto:rden@loc.gov]
> Sent: Wednesday, 6 March 2002 19:29
> To: www-zig@w3.org
> Cc: zig
> Subject: Re: character encoding assumptions and approaches
> 
> 
> Pieter Van Lierop wrote:
> 
> > 1. I still think that the Z39.50 protocol should not bother with the
> > contents of anything that is not defined in the Z39.50 protocol. For
> > example a MARC record. From the point of view of a MARC record, Z39.50
> > is only a transport mechanism. The MARC syntaxes have their own
> > committees, standards, protocols, traditions, national standards,
> > international standards: we should not bother with that.
> 
> Sorry, Pieter, I don't follow. I'm not sure what "bother" means in this
> context. Z39.50 should "enable", not "bother".  The protocol should
> facilitate enforcement of the rules.
> 
> 
> > 2. The character set agreement that we are discussing does not only
> > apply to the search term, but to all fields defined as "International
> > String". Is this correct or not?
> 
> No. The current thread of this discussion focuses on MARC records, and
> they go as external.  The agreement we're discussing is that if utf-8 is
> negotiated, and if a server has a record to transfer that is or includes
> text (i.e. characters), and if the negotiated utf-8 has not been
> overridden for that record, then the server will transfer it in utf-8.
> 
> >
> > This means that, amongst others, the following fields are to be
> > considered: ImplementationId, ImplementationName,
> > ResultSetName/ResultSetId, DatabaseName, AdditionalInfo (in a
> > diagnostic), ElementSetName, DisplayTerm (in Scan)
> 
> I would say that those which we have called "message strings" --
> additionalInfo in a diagnostic, elementSetName -- yes. For those that we
> have called "name strings", I don't think it matters.
> 
> > Actually, the Term (in Search and Scan) is generally considered to be
> > an OCTET STRING. I believe that most client applications send it as an
> > OCTET STRING.
> > Does that mean that when the client application sends Term as an OCTET
> > STRING, the character set agreement does *not* apply?
> 
> I don't know. It's a good question and we need to give it some thought.
> 
> 
> --Ray
>

Received on Thursday, 7 March 2002 08:52:52 UTC