- From: Larry Masinter <masinter@adobe.com>
- Date: Wed, 4 Nov 2009 15:19:42 -0800
- To: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
I'd wanted to get this out before the ID cut-off, but I'd gone around a few too many times, and want to start over. I've picked out the parts of the IRI document that talk about ireg-name and handling of host names, and made an outline of what I think could be moved into a separate document.

========================================================
The document parts would say:

* Applicability
* Syntax (what's allowed)
* Processing (what to do with what you have)
* Translation (convert to a reg-name for use in a URI)
* Reconversion (convert a URL reg-name into an ireg-name)?
* Comparison (how to compare the ireg-names)

These parts would be used by the "main" document, which would have the same components.

======================================================
Introduction

This document describes syntax, processing, and comparison of the "ireg-name" component of IRIs. It is a separate document to focus discussion and coordination.

=============
Applicability

These methods only apply to ireg-name parts of IRIs.

Domain names may appear in parts of an IRI other than the ireg-name part. It is the responsibility of scheme-specific implementations to apply the necessary conversion in those other places, if needed. For example, if the Internationalized Domain Name is part of the 'iquery' component of an HTTP URI, the interpretation of the domain name is up to the server. E.g., trying to validate the Web page at http://résumé.example.org would lead to an IRI of

  http://validator.w3.org/check?uri=http%3A%2F%2Frésumé.example.org

which would convert to a URI of

  http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.example.org

In this case, the server-side implementation is responsible for making the necessary conversions to be able to retrieve the Web page.

=======
Syntax:

Currently, the IRI draft contains the following definition of ireg-name:

  ireg-name      = *( iunreserved / sub-delims )

  sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                 / "*" / "+" / "," / ";" / "="

  iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar

  ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                 / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                 / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                 / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                 / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                 / %xD0000-DFFFD / %xE1000-EFFFD

This doesn't seem right to me -- why would we allow sub-delims in domain names? What about %xx?

==========
Processing:

IRI processors may process Unicode strings directly. As an example, the restrictions of [RFC3490] on bidirectional domain names correspond to treating each label of a domain name as a component for schemes with ireg-name as a domain name.

(give advice about how to invoke gethostname?)
(handling of percent-encoded things that aren't allowed)

=====
Converting an ireg-name of IRI to a host name:

Schemes that allow non-ASCII based characters in the reg-name (ireg-name) position MUST convert the ireg-name component of an IRI as follows:

Replace the ireg-name part of the IRI by the part converted using the ToASCII operation specified in Section 4.1 of [RFC3490] on each dot-separated label, and by using U+002E (FULL STOP) as a label separator, with the flag UseSTD3ASCIIRules set to FALSE, and with the flag AllowUnassigned set to FALSE.

The ToASCII operation may fail, which means that the IRI cannot be resolved. If the domain name conversion fails, then the entire IRI conversion fails. Processors that have no mechanism for signalling a failure MAY instead substitute an otherwise invalid host name, although such processing SHOULD be avoided.

((DESIGN QUESTION: What about e.g. http://r%C3%A9sum%C3%A9.example.org in an IRI? Will that get converted to punycode, or not?))

Various IRI schemes may allow the usage of Internationalized Domain Names (IDN) [RFC3490] either in the ireg-name part or elsewhere. Character Normalization also applies to IDNs, as discussed in Section 5.3.3.

========
Converting host names in URIs to I18N host names:

punycode to Unicode

=======
Comparing host names:

case insensitivity for ASCII
dealing with variant forms?

========
processing of "host" header?
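
For illustration only, the per-label ToASCII conversion described under "Converting an ireg-name of IRI to a host name" could be sketched in Python roughly as follows. This is a minimal sketch, not part of the proposed text. The function names ireg_name_to_reg_name and reg_name_to_ireg_name are hypothetical. It relies on Python's standard encodings.idna module, which implements RFC 3490 with UseSTD3ASCIIRules false but treats AllowUnassigned as true, so the sketch adds its own check against stringprep table A.1 to approximate AllowUnassigned = FALSE. It assumes labels are separated only by U+002E and does not address the open questions above (percent-encoded input, sub-delims).

```python
import stringprep
from encodings import idna


def ireg_name_to_reg_name(ireg_name: str) -> str:
    """Apply RFC 3490 ToASCII to each dot-separated label (hypothetical helper).

    Raises ValueError if any label cannot be converted, in which case the
    whole IRI conversion is considered to have failed.
    """
    converted = []
    for label in ireg_name.split("."):
        # Approximate AllowUnassigned = FALSE: reject code points that are
        # unassigned in Unicode 3.2 (stringprep table A.1). This check is an
        # assumption of the sketch; encodings.idna does not perform it.
        if any(stringprep.in_table_a1(ch) for ch in label):
            raise ValueError(f"unassigned code point in label {label!r}")
        try:
            # encodings.idna.ToASCII returns bytes and raises UnicodeError
            # on failure (empty label, label too long, prohibited character).
            converted.append(idna.ToASCII(label).decode("ascii"))
        except UnicodeError as exc:
            raise ValueError(f"ToASCII failed for label {label!r}") from exc
    # U+002E (FULL STOP) is used as the label separator in the result.
    return ".".join(converted)


def reg_name_to_ireg_name(reg_name: str) -> str:
    """Reverse direction: apply RFC 3490 ToUnicode to each label."""
    return ".".join(idna.ToUnicode(label) for label in reg_name.split("."))


if __name__ == "__main__":
    print(ireg_name_to_reg_name("r\u00e9sum\u00e9.example.org"))
    # prints: xn--rsum-bpad.example.org
```

Raising an exception corresponds to the "entire IRI conversion fails" case; a processor with no way to signal failure would need a different strategy, as noted in the text.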
Received on Wednesday, 4 November 2009 23:20:20 UTC