
Re: Octet Strings and utf-8

From: Ray Denenberg <rden@loc.gov>
Date: Thu, 14 Mar 2002 11:37:11 -0500
Message-ID: <3C90D1B7.6D4C52DF@loc.gov>
To: www-zig@w3.org
"LeVan,Ralph" wrote:

> Somehow I must be deciding if the term is binary, because I am sending those
> terms to a search engine.  The search engine is not expecting binary data.

If you have a search engine where binary data isn't applicable, you've
negotiated utf-8 (via character set negotiation), and you're using version 2, so
the client has no choice but to send a term as an octet string, then you might
argue that arbitrarily extending the negotiation to apply to octet-string-tagged
search terms is a reasonable and pragmatic thing to do.

Still, there is some winking going on, since the client could only know via
out-of-band agreement that your search engine doesn't expect binary.  It could
be that the search was on title, author, etc., so a binary term wouldn't make
sense.

Would someone care to suggest a reliable rule of thumb we can adopt, perhaps
based on access point?  For example: if we're searching on title, author,
subject, etc., then an octet-string-tagged term is guaranteed to be text and
not binary.
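
To make the question concrete, here is a rough sketch of what such a rule
might look like on the server side.  Everything in it is an assumption for
illustration only: the function and set names are hypothetical, and the
access points are written as Bib-1 Use attribute values (4 for title, 1003 for
author, 21 for subject), which is just one possible choice of list.  It is not
an agreed rule.

```python
# Hypothetical rule of thumb: treat an octet-string term as utf-8 text only
# when utf-8 was negotiated AND the access point is one where a binary term
# would not make sense.  The attribute list is an illustrative assumption.
TEXT_ONLY_USE_ATTRIBUTES = {
    4: "title",
    1003: "author",
    21: "subject",
}


def interpret_octet_term(raw, use_attribute, utf8_negotiated):
    """Return a str if the term is treated as text, else the raw bytes."""
    if utf8_negotiated and use_attribute in TEXT_ONLY_USE_ATTRIBUTES:
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Not valid utf-8 after all: fall back to treating it as binary.
            return raw
    # No negotiation, or an access point outside the list: leave as binary.
    return raw
```

A term on any access point outside the list stays binary, so the rule only
ever widens the negotiation for the listed access points.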

--Ray
Received on Thursday, 14 March 2002 11:36:47 GMT
