RE: Octet Strings and utf-8 from LeVan,Ralph on 2002-03-13 (www-zig@w3.org from March 2002)

From: LeVan,Ralph <levan@oclc.org>
Date: Wed, 13 Mar 2002 16:43:37 -0500
To: "'Ray Denenberg'" <rden@loc.gov>, www-zig@w3.org
Cc: zig <www-zig@w3.org>
Message-ID: <E5431CF93E29F9478878F623E5B9CE9802DD921D@OA3-SERVER.oa.oclc.org>

Somehow I must be deciding if the term is binary, because I am sending those
terms to a search engine.  The search engine is not expecting binary data.
I am providing the conversion.  Right now, the default is to assume that we
are getting Latin-1.  (In fact, I think that's laid out in the ZIG's profile, but
we called it the default z-context or something like that.)  I then convert
the Latin-1 bytes into Java Strings to give to my search engine.  But, when
the client says that they are sending me UTF-8 "stuff" I make the
negotiation apply to general Terms too.
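
A minimal sketch of the conversion described above, assuming Java 7 or later
for StandardCharsets.  The names TermDecoder, decodeTerm and utf8Negotiated
are hypothetical, chosen for illustration only, and are not taken from any
actual server code.  It treats every general term as character data, which is
the assumption under discussion in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of any Z39.50 toolkit.
class TermDecoder {

    /**
     * Turn the raw octets of a "general" term into a Java String.
     *
     * @param octets         the bytes received in the term's general option
     * @param utf8Negotiated assumed flag: true if the client negotiated UTF-8
     */
    static String decodeTerm(byte[] octets, boolean utf8Negotiated) {
        // Default to Latin-1 unless UTF-8 was negotiated for the session.
        Charset charset = utf8Negotiated
                ? StandardCharsets.UTF_8
                : StandardCharsets.ISO_8859_1;
        // Malformed UTF-8 input is replaced with U+FFFD rather than rejected.
        return new String(octets, charset);
    }
}
```

For example, the single byte 0xE9 decodes to "é" under the Latin-1 default,
while a client that negotiated UTF-8 would send the two bytes 0xC3 0xA9 for
the same character.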

Ralph

> -----Original Message-----
> From: Ray Denenberg [mailto:rden@loc.gov]
> Sent: Wednesday, March 13, 2002 4:22 PM
> To: www-zig@w3.org
> Cc: zig
> Subject: Octet Strings and utf-8
> 
> 
> "LeVan,Ralph" wrote:
> 
> > Since all the clients I talk to send me their terms in the general option,
> > and since I must somehow interpret the bytes in that field, I have to do a
> > conversion anyway.  So, why shouldn't the UTF-8 negotiation apply?
> 
> Because the character set negotiation definition explicitly applies to
> InternationalString only.  If you arbitrarily decide that a particular
> octetString type should be affected, well then what about for example the
> referenceId?  Is that subject to utf-8 negotiation?  "What a silly question",
> you say, "the reference id is binary". Well don't forget, the term can be
> binary too.  How do you decide whether a given octet string is binary or
> character?
> 
> --Ray
>

Received on Wednesday, 13 March 2002 16:44:51 UTC