Re: New Diagnostic Code Requested from Robert Waldstein on 2001-06-07 (www-zig@w3.org from June 2001)

From: Robert Waldstein <wald@library.ho.lucent.com>
Date: Thu, 7 Jun 2001 08:44:42 -0400
To: www-zig@w3.org
Message-ID: <20010607084442.A12758@ln.ho.lucent.com>

> In our next release of SiteSearch, we will be supporting the explicit
> negotiation of characterset.  Specifically, we will allow the client to
> negotiate the use of UTF-8 in searches.  With that comes the requirement
> that we convert the UTF-8 query into the correct characterset for the
> database being searched.  We need a diagnostic when the query includes
> characters that do not translate into the target characterset.  The addinfo
> field will contain the character (in the negotiated characterset) that could
> not be translated.
> 
> We recommend a practice to other implementors of not ignoring illegal
> characters.  Profiles are currently asking us to not ignore or misinterpret
> attributes and I suspect that they will eventually ask us to treat the
> user's query terms with as much respect.
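Ralph's proposal comes down to strict conversion plus an error report: either the whole term converts, or the server reports the first character that failed. Below is a minimal Python sketch of that idea. It assumes Python's built-in codecs stand in for the server's character-set tables. `convert_term` is a hypothetical name, and no diagnostic number is assumed, since that number was the thing being requested.

```python
from typing import Optional, Tuple

def convert_term(term_utf8: bytes, target: str) -> Tuple[Optional[bytes], Optional[str]]:
    """Hypothetical helper: convert a UTF-8 query term into the database's
    character set. Returns (converted, None) on success, or (None, bad_char),
    where bad_char is the first character with no mapping in `target`.
    The server would send bad_char back in addinfo, encoded in the
    negotiated character set (UTF-8 here)."""
    text = term_utf8.decode("utf-8")
    try:
        # Strict encoding: never silently drop or substitute characters.
        return text.encode(target), None
    except UnicodeEncodeError as err:
        # err.start indexes the first unencodable character in `text`.
        return None, text[err.start]

print(convert_term("Ärger".encode("utf-8"), "latin-1"))
print(convert_term("price €5".encode("utf-8"), "latin-1"))
```

With a Latin-1 target, "Ärger" converts cleanly, while "price €5" fails on the euro sign, which has no Latin-1 code point.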

Ralph, I agree with the general view of your message (even when my
implementations don't do it -)), but I have a question about
    > characters that do not translate into the target characterset.

Query by example:
    - Does a with umlaut translate to a?
    - Does a diphthong (ae) translate to ae (a followed by e)?
    - Does oneHalf (1 over 2) translate to 1/2?
    - Does a superscript 2 translate to a 2?
    - Does capital A translate to "a"? (I guess we handle this with an
      attribute; do we do the others?)
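Whether these characters "translate" depends on which folding rule the server applies before the strict conversion. Below is a hedged Python sketch comparing three illustrative policies against an ASCII target. The policy names and the `EXTRA_RULES` table are assumptions, not anything Z39.50 defines. Unicode NFKD decomposition handles the umlaut and the superscript on its own. It does not turn "æ" into "ae", and it turns "½" into "1", U+2044 FRACTION SLASH, "2" rather than "1/2", so those two need locally chosen extra rules.

```python
import unicodedata
from typing import Optional

# Hypothetical site-chosen rules that NFKD alone does not provide.
EXTRA_RULES = str.maketrans({"æ": "ae", "Æ": "AE", "\u2044": "/"})

def fold(text: str, policy: str) -> str:
    """Apply one of three illustrative policies: 'strict', 'nfkd', 'nfkd+extra'."""
    if policy == "strict":
        return text
    # Compatibility decomposition: "ä" -> "a" + U+0308, "²" -> "2",
    # "½" -> "1" + U+2044 + "2"; "æ" has no decomposition.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop combining marks, so "a" + U+0308 becomes plain "a".
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    if policy == "nfkd":
        return stripped
    if policy == "nfkd+extra":
        return stripped.translate(EXTRA_RULES)
    raise ValueError(f"unknown policy: {policy}")

def translate(term: str, target: str, policy: str) -> Optional[bytes]:
    """Encoded term, or None when a character still has no mapping
    (the case that would trigger the proposed diagnostic)."""
    try:
        return fold(term, policy).encode(target)
    except UnicodeEncodeError:
        return None

# Case is deliberately left alone ("A" stays "A"); as the post suggests,
# case folding is usually requested through an attribute instead.
for term in ["ä", "æ", "½", "a²", "A"]:
    row = [translate(term, "ascii", p) for p in ("strict", "nfkd", "nfkd+extra")]
    print(repr(term), row)
```

Under "strict" only "A" survives, under "nfkd" "ä" and "a²" join it, and only the extra rules rescue "æ" and "½". The target matters too: with a Latin-1 target, "ä", "æ" and "½" all pass even under "strict".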


 I guess I am asking who controls the translation, that is, who decides
what does not translate?

 thanks,
    Bob Waldstein  wald@lucent.com

Received on Thursday, 7 June 2001 08:44:23 UTC