- From: LeVan,Ralph <levan@oclc.org>
- Date: Wed, 13 Mar 2002 16:43:37 -0500
- To: "'Ray Denenberg'" <rden@loc.gov>, www-zig@w3.org
- Cc: zig <www-zig@w3.org>
Somehow I must be deciding if the term is binary, because I am sending those terms to a search engine. The search engine is not expecting binary data. I am providing the conversion. Right now, the default is to assume that we are getting Latin-1. (In fact, I think that's out in the ZIG's profile, but we called it the default z-context or something like that.) I then convert the Latin-1 bytes into Java Strings to give to my search engine. But, when the client says that they are sending me UTF-8 "stuff" I make the negotiation apply to general Terms too.

Ralph

> -----Original Message-----
> From: Ray Denenberg [mailto:rden@loc.gov]
> Sent: Wednesday, March 13, 2002 4:22 PM
> To: www-zig@w3.org
> Cc: zig
> Subject: Octet Strings and utf-8
>
>
> "LeVan,Ralph" wrote:
>
> > Since all the clients I talk to send me their terms in the general option,
> > and since I must somehow interpret the bytes in that field, I have to do a
> > conversion anyway. So, why shouldn't the UTF-8 negotiation apply?
>
> Because the character set negotiation definition explicitly applies to
> InternationalString only. If you arbitrarily decide that a particular
> octetString type should be affected, well then what about for example the
> referenceId? Is that subject to utf-8 negotiation? "What a silly question",
> you say, "the reference id is binary". Well don't forget, the term can be
> binary too. How do you decide whether a given octet string is binary or
> character?
>
> --Ray
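For readers following the thread, the conversion Ralph describes might look roughly like the sketch below. It is not his actual server code: the class name TermDecoder, the method decodeTerm, and the utf8Negotiated flag are hypothetical names used only for illustration. It assumes the term arrives as the raw bytes of the "general" option and that the server already knows whether UTF-8 was negotiated.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TermDecoder {

    // Hypothetical helper: turn the raw bytes of a "general" term into a
    // Java String for the search engine.
    // Assumption: utf8Negotiated is true only when the client negotiated
    // UTF-8; otherwise the bytes are treated as Latin-1, the default
    // described in the message.
    public static String decodeTerm(byte[] termBytes, boolean utf8Negotiated) {
        Charset charset = utf8Negotiated
                ? StandardCharsets.UTF_8
                : StandardCharsets.ISO_8859_1;
        return new String(termBytes, charset);
    }

    public static void main(String[] args) {
        byte[] latin1Term = {(byte) 0xE9};              // e-acute in Latin-1
        byte[] utf8Term = {(byte) 0xC3, (byte) 0xA9};   // e-acute in UTF-8

        // Both calls should yield the same one-character string.
        System.out.println(decodeTerm(latin1Term, false).equals("\u00e9"));
        System.out.println(decodeTerm(utf8Term, true).equals("\u00e9"));
    }
}
```

Note that this sketch always treats the term as character data. It does nothing to detect a genuinely binary term, which is exactly the question Ray raises in the quoted message.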
Received on Wednesday, 13 March 2002 16:44:51 UTC