- From: Jim Melton <jim.melton@acm.org>
- Date: Thu, 13 Jun 2002 21:04:57 -0600
- To: Martin Duerst <duerst@w3.org>
- Cc: jim.melton@acm.org (Jim Melton), www-i18n-comments@w3.org, w3c-i18n-ig@w3.org, w3c-xml-query-wg@w3.org
Martin,

Thanks for the note; my response is delayed partly because of travel and partly because I wanted to meet with the Functions & Operators people to see if the subject came up (it didn't).

At 02:08 PM 2002-06-05 +0900 Wednesday, Martin Duerst wrote:
>Hello Jim, dear XML Query WG,
>
>We discussed this comment of your at our teleconference
>yesterday, and I was actioned to convey our decision to you.
>
>At 18:39 02/05/31 +0900, Jim Melton wrote:
>
>>This is a last call comment from Jim Melton (jim.melton@acm.org) on
>>the Character Model for the World Wide Web 1.0
>>(http://www.w3.org/TR/2002/WD-charmod-20020430/).
>>
>>Semi-structured version of the comment:
>>
>>Submitted by: Jim Melton (jim.melton@acm.org)
>>Submitted on behalf of (maybe empty): W3C XML Query Working Group
>>Comment type: editorial
>>Chapter/section the comment applies to: 3.2 Digital Encoding of Characters
>>The comment will be visible to: public
>>Comment title: proprietary charset identifiers
>>Comment:
>>Section 3.2, "Digital Encoding of Characters", list element 4, contains
>>the phrase "... is identified by an IANA charset identifier."
>>
>>In fact, there are a great many CESes that are identified by charset
>>identifiers that are not assigned by IANA at all, but that are "created"
>>by proprietary means (e.g., corporations). The Character Model
>>specification must not prohibit the use of CESes identified by charset
>>identifiers assigned through other means.
>>
>>To correct this, simply change "...is identified by an IANA charset
>>identifier." to "...is identified by a unique identifier, such as an IANA
>>charset identifier."
>
>However, working on the details today, I discovered that
>it may be better to request a clarification from you first.

I'll try...assuming *I* understand enough to clarify ;^)

>You request that section 3.2 mentions other identifiers for
>character encodings than those registered by IANA. But
>Section 3.2 just mentions the labels as part of the overall
>model.
>Details of what encodings to use or not to use,
>and what labels to use for them, are given in Section 3.6.2
>(http://www.w3.org/TR/2002/WD-charmod-20020430/#sec-EncodingIdent).
>
>Section 3.6.2 also has a very strong emphasis on IANA labels,
>because using labels from a single registry is the only way
>to avoid conflicts, and the IANA registry is the registry
>used on the Internet (and the Web is part of the Internet).

I understand this much, at least.

>Given this, can you please clarify whether the Query WG meant that:
>
>a) changing "...is identified by an IANA charset identifier." to
>   "...is identified by a unique identifier, such as an IANA
>   charset identifier." is appropriate in Section 3.2 because
>   this is a general discussion, and any set of unique identifiers
>   could do, and specifics are discussed in 3.6.2.
>
>b) The change was intended to make sure that encoding identifiers
>   other than those registered with IANA would conform to the
>   character model; Section 3.6.2 would have to be changed, too.
>
>Yesterday, we forgot about 3.6.2, but assumed the intent of b).
>If b) is your intent, please find our answer below. If your intent
>was a), or something else, we will have to reconsider your comment.

*My* intent, when I drafted the comment for discussion by the Query WG, was more nearly b) than a), but I am reluctant to commit to b) (and not for the reason that you reject the comment based on assumption b)!). [I emphasized "*My*" because I did not hear this discussed with the Query WG and do not wish to infer their intent.]

Instead of writing a bunch of text here, I'll respond as we proceed below:

><assumption value='b)'>
>First, please note that your classification of this comment was
>'editorial', but we have decided to reclassify it as 'substantial'.
>
>Second, we have decided to reject this comment, based on the
>following reasons:
>
>- IANA charset identifiers (except for those starting with x-) are
>  guaranteed to be unique.
>  Adding any other set(s) of identifiers
>  to the IANA identifiers very quickly removes this guarantee.
>  Because of that, your proposed change can either be seen as an
>  unnecessary addition, putting in more words but, under careful
>  analysis, not saying anything different, or it can be misunderstood
>  by readers to guarantee some uniqueness when indeed such a guarantee
>  is not possible.
>  [If you know about some trick to guarantee uniqueness among different
>  sets of identifiers, then we sure would like to know.]

Of course, without a registry, there is no ability to *guarantee* uniqueness. However, in certain application situations (because of known scope to environments, for example) it is possible to ensure (even guarantee) uniqueness without having to resort to an *external* registry such as IANA. For example, for documents that are used (privately) in a single enterprise, where it is known that the enterprise has guaranteed such uniqueness, the services of an external entity such as IANA are not needed or even useful.

>- IANA does not 'assign' identifiers, it just registers them.

Of course you are correct; my words were careless, even though I understand the distinction and the situation. Apologies!

>  Anybody can apply for registration. A few years ago, there has been
>  a tendency to restrict registration to widely used/usable encodings,
>  but this lead to the defacto use of many unregistered encodings
>  with an x- prefix. Registration practice has changed to be very
>  liberal now, while making sure that each registration notes duly
>  whether the encoding in practice is suitable for the use on the
>  Internet at large. If any corporation represented in the XML
>  Query WG or elsewhere uses encodings that are not registered
>  with IANA, we strongly recommend to register them.

Recommend away. It will *never* be the case that 100% of all such encodings in use by every enterprise on the planet are registered.
And those enterprises *will* find uses for XML in their environments. One of my (and others') difficulties with some assumptions in and behind the character model is that no use of XML is deemed "valid" unless it adheres to a potentially very large set of restrictions. That is not, IMHO, the way to make XML maximally used or useful, although it is certainly appropriate to urge whenever true "world-wide" use of data and applications is planned.

>- The IANA registry already contains registrations for many (some
>  even say too many) proprietary encodings. Indeed, the majority
>  of encodings registered are proprietary encodings rather than
>  encodings defined by standards organizations. There is quite
>  some chance that your encoding is already registered. Please check.

I have. They're not. And if "some...say too many", that doesn't sound like the community at large really wants to see more and more private encodings being registered. In fact, some of my employer's encodings are registered and some are not. Will they all be, some day? Who knows? It's not really a priority (especially in this economy!). But our customers still want to use the encodings.

An important point that I'm trying to make is this: The more restrictions that are placed on "applications" (broadest sense) in order to be "conforming", and the more those restrictions are viewed as "rules for the sake of having rules" instead of adding real value, the more people will choose not to *claim* conformance...after which they will quit following even the rules that make sense. Balance! That's the key!

>- The IANA registry already contains many (some even say too many)
>  aliases for most encodings. There is quite some chance that the
>  identifier used inside your corporation is already an alias.

Many are. Some are not. The same statements made above apply.

>Please tell us, at your earliest convenience, whether you are
>satisfied with our decision or not. If not, please provide
>additional rationale.
></assumption>

Unfortunately, I missed the Query WG teleconference this week due to my travel, so we didn't have a chance to discuss the subject (at least not with me participating). However, I assure you that I am not satisfied with the decision and I will recommend (with, I expect, considerable support) that the Query WG respond that it is not satisfied.

This is a really important issue, along with the question of who/when for normalization, over which we continue to disagree. I hope devoutly that we can reach a common ground whereby the I18n's very important task of making the Web truly accessible to all is balanced with the database-oriented Query and Schema vendors' need to satisfy their real-world customer requirements.

Thanks very much for continuing the dialog,
Jim

========================================================================
Jim Melton --- Editor of ISO/IEC 9075-* (SQL)     Phone: +1.801.942.0144
Oracle Corporation            Oracle Email: mailto:jim.melton@oracle.com
1930 Viscounti Drive       Standards email: mailto:jim.melton@acm.org
Sandy, UT 84093-1063        Personal email: mailto:jim@melton.name
USA                                   Fax : +1.801.942.3345
========================================================================
=  Facts are facts.  However, any opinions expressed are the opinions  =
=  only of myself and may or may not reflect the opinions of anybody   =
=  else with whom I may or may not have discussed the issues at hand.  =
========================================================================
Received on Friday, 14 June 2002 03:08:14 UTC