- From: Ray Denenberg <rden@loc.gov>
- Date: Thu, 21 Feb 2002 14:07:59 -0500
- To: zig <www-zig@w3.org>
- Message-ID: <3C75458F.EDCA07EA@loc.gov>
I posted a message last October about character sets; see:
http://lists.w3.org/Archives/Public/www-zig/2001Oct/0041.html

There was some response, though not much, so I have engaged those who responded or appeared interested in this issue in a private discussion outside of the list. We now have a proposal for discussion.

We propose adoption of one of the following three approaches:
(a) Assign an option bit for utf-8 encoding.
(b) Define an attribute for the encoding of a search term.
(c) Do both.

Option bit

If this bit is negotiated it would pertain to retrieved data as well as the search term. Additional option bits would be defined as needed, on the premise that there would never be more than a few (no bits for non-unicode encodings, for example). If an additional bit is defined, say utf-16, then only one could be negotiated for a given association. (The client could propose more than one, and the server would respond with only one.) If a particular encoding is negotiated, it is the presumed encoding for the association, unless overridden (for example, by a variant, or, in the case of approach (c), an attribute). If no encoding is negotiated then behavior is unspecified; that is, current behavior (whatever that is) is in force.

Attribute

Define an encoding attribute, initially with one value, "utf-8", with additional values added as needed. There would be no default; recommended practice would be to always include the attribute, but if it is omitted, behavior is unspecified.

Both

Option bit and attribute. If the option bit is negotiated, then the attribute may override it. (So in this case the semantics of omission would be slightly different, but recommended practice would still be to always include it.) Thus the option bit determines the presumed encoding (for an association) in the absence of other, explicit encoding instructions (i.e., the attribute for search terms, and variants for retrieval data).
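To make the precedence under approach (c) concrete, here is a minimal sketch in Python. It is only an illustration of the rules described above, not part of the proposal: the function names, the use of strings for encodings, and the server choosing by its own preference order are all assumptions, and no actual option bit number or attribute type is implied.

```python
from typing import Optional, Sequence

# All names here are hypothetical; the proposal assigns no identifiers.
# None stands for "behavior is unspecified" (no encoding can be presumed).


def negotiate(proposed: Sequence[str], server_supported: Sequence[str]) -> Optional[str]:
    """The client may propose several encodings; the server accepts at most one.

    Choosing by server preference order is an assumption of this sketch.
    """
    for encoding in server_supported:
        if encoding in proposed:
            return encoding
    return None  # nothing negotiated: current behavior remains in force


def search_term_encoding(negotiated: Optional[str], attribute: Optional[str]) -> Optional[str]:
    """An encoding attribute on the search term overrides the negotiated encoding."""
    return attribute if attribute is not None else negotiated


def retrieved_data_encoding(negotiated: Optional[str], variant: Optional[str]) -> Optional[str]:
    """A variant on retrieved data overrides the negotiated encoding."""
    return variant if variant is not None else negotiated


# The client proposes two encodings, the server settles on one for the association.
association = negotiate(["utf-8", "utf-16"], ["utf-16", "utf-8"])
print(search_term_encoding(association, None))     # falls back to the negotiated encoding
print(search_term_encoding(association, "utf-8"))  # the attribute overrides it
```

With approach (a) alone, `search_term_encoding` would always return the negotiated value; with approach (b) alone, `negotiated` would always be `None`, so omitting the attribute leaves the encoding unspecified.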
The case for approach (c) is that either (a) or (b) alone has limitations. An attribute won't cover retrieved data, just the search term. The option bit alone won't let you override it for a search term. (You could override it in retrieval data, using variants. Why not be able to override it for a search term?)

If you're wondering how this proposal fits with the existing character set negotiation definition -- I think the obvious answer (and I say this with both regret and relief) is that this is intended to supersede that definition if approach (b) or (c) is adopted.

Please comment on this proposal. Is there one of these three approaches you like, or is there another approach altogether that you favor?

--Ray
Received on Thursday, 21 February 2002 14:06:49 UTC