- From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
- Date: Sat, 20 Oct 2012 16:39:48 +0900
- To: Peter Saint-Andre <stpeter@stpeter.im>
- CC: public-iri@w3.org
Hello Peter, others,

On 2012/06/08 3:42, Peter Saint-Andre wrote:
> <hat type='individual'/>
>
> At IETF 84, we discussed the desirability of aligning the terminology in
> 3987bis with RFC 6365 ("Terminology Used in Internationalization in the
> IETF"). This is ticket #85 in the tracker:
>
> http://trac.tools.ietf.org/wg/iri/trac/ticket/85
>
> I've completed a review of both documents and have a few suggestions...
>
> 1. In Section 1.3, cite RFC 6365 and specify that terms are to be
> understood as defined in that document unless otherwise specified (in
> fact, now that we have RFC 6365 it's not clear why we're citing RFC
> 2130, RFC 2277, or ISO 10646). I suggest:
>
> OLD
> The following definitions are used in this document; they follow the
> terms in [RFC2130], [RFC2277], and [ISO10646].
>
> NEW
> Various terms used in this document are defined in [RFC6365] and
> [RFC3986]. In addition, we define the following terms for use in
> this document.

Implemented in my editorial copy. Many thanks for the actual text proposal.

> 2. Don't define anew in rfc3987bis terms that are defined in RFC 6365.
> That would mean removing the following definitions from Section 1.3:
>
> - character
> - character repertoire

Done.

> - character encoding (use "character encoding scheme" or "character
> encoding form" instead)
> - charset

These two are not that simple. For background, please check
http://www.w3.org/TR/charmod/#sec-Digital. Here is what we currently
have for "character encoding":

    A method of representing a sequence of characters as a sequence of
    octets (maybe with variants). Also, a method of (unambiguously)
    converting a sequence of octets into a sequence of characters.

The problem with 'charset' as defined in RFC 6365 (and elsewhere) is
that it's purely one-way, from octets to characters. But there's the
other direction, too.

The problem with "character encoding scheme" or "character encoding
form" is that they are much more specialized terms.
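
To make the two directions concrete (characters to octets, and octets back to characters), here is a minimal Python sketch. The sample string and the choice of UTF-8 are assumptions made purely for illustration; nothing here is taken from the draft or from RFC 6365.

```python
# Illustrative only: the sample text and the UTF-8 encoding are assumptions.
text = "Dürst"

# Direction 1: a sequence of characters -> a sequence of octets.
octets = text.encode("utf-8")

# Direction 2: a sequence of octets -> a sequence of characters.
round_trip = octets.decode("utf-8")

print(octets)              # b'D\xc3\xbcrst'
print(round_trip == text)  # True
```

A term that covers only the decoding step leaves the first of these two operations without a name, which is the gap described above.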
RFC 6365 has this to say after the definition of "charset":

    Many protocol definitions use the term "character set" in their
    descriptions. The terms "charset", or "character encoding scheme"
    and "coded character set", are strongly preferred over the term
    "character set" because "character set" has other definitions in
    other contexts, particularly outside the IETF. When reading IETF
    standards that use "character set" without defining the term, they
    usually mean "a specific combination of one CCS with a CES",
    particularly when they are talking about the "US-ASCII character
    set".

Of course, per and http://www.w3.org/MarkUp/html-spec/charset-harmful
and as above, we sure don't want to use "character set". And we indeed
want something to denote "a specific combination of one CCS with a CES"
(or in some cases actually a combination of more than one CCS...), so
neither "coded character set" (CCS) nor "character encoding scheme"
(CES) will do, despite the suggestions above.

So we just ended up with "character encoding", using a simple term for
a very central concept, also in line with http://www.w3.org/TR/charmod/.

As a result of this, we only use "charset" when it's used as a label,
with a narrowed definition: "The name of a parameter or attribute used
to identify a character encoding."

I guess we could just drop the narrowing definition of "charset", but
we can't drop "character encoding".

> 3. Do we really need to define "octet", "sequence of characters", and
> "sequence of octets"?

Good questions. RFC 6365 uses "octet" without defining it, so I guess we
can drop it. I think we can also drop "sequence of characters" and
"sequence of octets", but I'd like to get Larry's okay for these.

> 4. Strangely, RFC 6365 does not define "UCS", so I suppose it's OK to
> define that here.

Following discussions later in this thread, I'm trying to get rid of
this. But it needs some more thought.

Regards, Martin.
Received on Saturday, 20 October 2012 07:40:28 UTC