Re: Z39.50 character encoding

----- Original Message -----
From: "Mike Taylor" <mike@tecc.co.uk>
Sent: Wednesday, February 27, 2002 10:01 AM


> > The problem is that MARC21 currently permits records to be encoded
> > using either of 2 mutually incompatible character sets.  You can
> > tell from inspection of the record leader what character set is
> > being used.
>
> OK, that's fine.  So if a Z39.50 server returns a MARC21 record, then
> in providing the MARC21 OID for its EXTERNAL, it's telling the client
> (among other things) that the record is in one or other of those two
> mutually incompatible character sets, and that it has to inspect the
> record leader to find out which one.

Well, the server isn't actually telling the client anything about the
character set.  It's just telling the client "here is something that claims
to be a MARC21 record for you".  And yes, the software that manipulates the
record (which may be at one or more removes from the Z39.50 client) has to
inspect the record leader to find out which character set is in use.  It can
glean no additional useful information about the record internals from the
Z39.50 message or session.
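
For what it's worth, the leader check is small enough to sketch in Python.
In MARC21 the character coding scheme sits at leader position 09 (counting
from 0): a blank means MARC-8 and "a" means UCS/Unicode, carried as UTF-8.
The function name and the returned labels are made up for illustration; this
is a rough sketch, not anybody's actual implementation.

```python
LEADER_LENGTH = 24  # every MARC21 record starts with a 24-byte leader


def marc21_character_coding(record: bytes) -> str:
    """Return the character coding scheme declared in the record leader.

    Hypothetical helper: "marc-8" and "utf-8" are just labels chosen here.
    """
    if len(record) < LEADER_LENGTH:
        raise ValueError("record is shorter than a MARC21 leader")
    flag = record[9:10]  # leader/09, the character coding scheme
    if flag == b" ":
        return "marc-8"
    if flag == b"a":
        return "utf-8"
    raise ValueError(f"unrecognised leader/09 value: {flag!r}")
```

Note that nothing from the Z39.50 session is passed in: the record bytes are
the only input, which is the point being made above.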

>
> That's a part of what the MARC21 OID _means_.
>
> > But Z39.50 provides no mechanism to ask for MARC records to be
> > delivered with a specific character encoding (and neither does
> > anything else, for that matter).
>
> Well, we could use eSpec to allow a client to say to its server,
> "Please give me MARC21 records in the first of the two possible
> character sets".  Then the server would serve up a record which,
> hopefully, did just that.  But the client would still need to check
> the record leader in order to know (just as it needs to check the
> EXTERNAL's OID to know whether it's got a MARC21 record at all, as
> opposed to a SUTRS record or something.)

Agreed.

>
> > I agree with Ray that the OID for the record syntax does not imply a
> > character set.
>
> This is only half true.  Character set is _one_ of the many things
> that a particular record syntax may specify, along with transfer
> syntax, etc.  Or in MARC21's case (hopefully pathological) the MARC21
> OID specifies a restriction of what character sets may be used, and
> says how to tell which of the options is active.

I was probably being a bit too formally "record-syntaxy" for most people's
taste and I can live happily with the following.

>
> Bottom line: the EXTERNAL's OID says what kind of object the record
> is.  Some kinds of object (e.g. USMARC) specify a character set, and
> others (GRS-1) do not.  Those which do, we must respect.  Those which
> don't, we are at liberty to mess with: we can think about how we want
> to treat those.

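Another example: XML objects belong in Mike's first class, the kinds of
object that specify a character set.  An XML object carries its own encoding
(UTF-8 unless its XML declaration or byte-order mark says otherwise),
irrespective of any character set negotiation or option bits in the Z39.50
session.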
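
Mike's two classes could be written down as a lookup keyed on the record
syntax OID.  A rough Python sketch follows; the OID values are the ones I
recall from the Z39.50 record syntax registry and should be checked against
it, and the table and function names are invented for illustration.

```python
# Who decides the character set for a record of a given syntax?
#   "record"  : the object itself says (respect it)
#   "session" : the syntax is silent, so Z39.50-level negotiation may apply
# OIDs are quoted from memory of the registry; verify before relying on them.
CHARSET_AUTHORITY = {
    "1.2.840.10003.5.10": "record",      # MARC21/USMARC: inspect the leader
    "1.2.840.10003.5.109.10": "record",  # XML: the object fixes its encoding
    "1.2.840.10003.5.105": "session",    # GRS-1: no character set specified
}


def charset_authority(record_syntax_oid: str) -> str:
    """Return "record", "session", or "unknown" for the EXTERNAL's OID."""
    return CHARSET_AUTHORITY.get(record_syntax_oid, "unknown")
```

A client would call charset_authority() on the EXTERNAL's OID first, and
only look at session-level character set options when the answer is
"session".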

j.

Received on Wednesday, 27 February 2002 11:49:56 UTC