From: Jim Whitehead <ejw@ics.uci.edu>
To: ietf-dav-versioning@w3.org
Date: Fri, 18 Feb 2000 17:33:19 -0800
Message-ID: <NDBBIKLAGLCOPGKGADOJEEGCCOAA.ejw@ics.uci.edu>
Subject: RE: Labels

Tim Ellison writes:

> To my knowledge, URLs are not internationalized.
> How do you write a URL with double-byte characters, etc?

There are no standards for how to create internationalized URLs, but there
is the following Internet-Draft:

http://www.ics.uci.edu/pub/ietf/uri/draft-masinter-url-i18n-04.txt
"Internationalized Uniform Resource Identifiers (IURI)"
Larry Masinter, Martin Duerst

I'm not sure what the current status is of this draft.

> I think that the only distinction between labels and revision
> ids, is that users can define, set, remove, etc. labels. Making them Strings
> simply adds unnecessary overhead to the spec. For example, we will have to support
> operations on mixed ascii and Unicode and specified codepage labels,
> including switching on the fly when dealing with the LABEL XML body and
> adapting to clients' accept-charset requests. It's possible, but messy.

Since we're writing an IETF protocol specification, we have to ensure we are
conformant with the document, "IETF Policy on Character Sets and Languages",
RFC 2277 <http://www.ietf.org/rfc/rfc2277.txt>.

Requirements from the document that pertain here:

   Protocols MUST be able to use the UTF-8 charset, which consists of
   the ISO 10646 coded character set combined with the UTF-8 character
   encoding scheme, as defined in [10646] Annex R (published in
   Amendment 2), for all text.

   Protocols MAY specify, in addition, how to use other charsets or
   other character encoding schemes for ISO 10646, such as UTF-16, but
   lack of an ability to use UTF-8 is a violation of this policy; such a
   violation would need a variance procedure ([BCP9] section 9) with
   clear and solid justification in the protocol specification document
   before being entered into or advanced upon the standards track.
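As a rough illustration of what the UTF-8 requirement means for a label carried in an XML body, here is a minimal Python sketch. The element name "label-name", the sample label, and the helper name marshal_label are hypothetical and not taken from any draft. It only shows that an XML document can declare its own character encoding and tag the language of its text with xml:lang.

```python
import xml.etree.ElementTree as ET

# Namespace URI that the XML specification reserves for the "xml:" prefix.
XML_NS = "http://www.w3.org/XML/1998/namespace"


def marshal_label(label: str, lang: str) -> bytes:
    """Serialize a label as UTF-8 XML with an encoding declaration.

    The element name "label-name" is an assumption for illustration only.
    """
    elem = ET.Element("label-name")
    elem.set(f"{{{XML_NS}}}lang", lang)  # serialized as xml:lang="..."
    elem.text = label
    return ET.tostring(elem, encoding="utf-8", xml_declaration=True)


def unmarshal_label(body: bytes) -> tuple[str, str]:
    """Parse the body; the parser reads the encoding from the declaration."""
    elem = ET.fromstring(body)
    return elem.text, elem.get(f"{{{XML_NS}}}lang")


# A label containing non-ASCII (double-byte in many legacy codepages) text.
body = marshal_label("リリース-1.0", "ja")
label, lang = unmarshal_label(body)
```

The receiver never has to guess a codepage here: the encoding travels in the XML declaration and the language in the xml:lang attribute.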
Since we're marshalling labels as XML, and since XML already specifies how to
record both the character set encoding in use and the language, i18n does not
add any new marshalling concerns for the protocol.

- Jim