- From: Larry Masinter <masinter@adobe.com>
- Date: Wed, 4 Nov 2009 15:19:42 -0800
- To: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
I'd wanted to get this out before the ID cut-off, but I'd
gone around a few too many times, and want to start over.
I've picked out the parts of the IRI document that talk
about ireg-name and handling of host names, and made
an outline of what I think could be moved into a separate
document.
========================================================
The document parts would say:
* Applicability
* Syntax (what's allowed)
* Processing (what to do with what you have)
* Translation (convert to a reg-name for use in a URI)
* Reconversion (convert a URL reg-name into an ireg-name)?
* Comparison (how to compare the ireg-names)
These parts would be used by the "main" document which
would have the same components.
======================================================
Introduction
This document describes syntax, processing, and
comparison of the "ireg-name" component of IRIs.
It is a separate document to focus discussion
and coordination.
=============
Applicability
These methods only apply to ireg-name parts of IRIs.
Domain Names may appear in parts of an IRI other
than the ireg-name part. In those cases it is the
responsibility of scheme-specific implementations to
apply any necessary conversion.
For example, if the Internationalized Domain Name
is part of the 'iquery' component of an HTTP URI, the
interpretation of the domain name is up to the
server. Trying to validate the Web page at
http://résumé.example.org
would lead to an IRI of
http://validator.w3.org/check?uri=http%3A%2F%2Frésumé.example.org
which would convert to a URI of
http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.example.org
In this case, the server-side
implementation is responsible for making the
necessary conversions to be able to retrieve the Web
page.
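
A minimal sketch of the IRI-to-URI step in the example above, assuming
the only transformation needed outside the ireg-name is to percent-encode
non-ASCII characters as UTF-8. The function name is hypothetical and the
sketch deliberately applies no IDNA processing, since the domain name here
sits in the query rather than in the ireg-name.

```python
from urllib.parse import quote


def percent_encode_non_ascii(text: str) -> str:
    """Percent-encode every non-ASCII character as UTF-8 octets.

    ASCII characters, including existing %XX escapes, are left untouched.
    """
    return "".join(ch if ord(ch) < 0x80 else quote(ch) for ch in text)


iri = "http://validator.w3.org/check?uri=http%3A%2F%2Frésumé.example.org"
print(percent_encode_non_ascii(iri))
```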
=======
Syntax:
Currently, the IRI draft contains the following definition of
ireg-name:
ireg-name = *( iunreserved / sub-delims )
sub-delims = "!" / "$" / "&" / "'" / "(" / ")"
/ "*" / "+" / "," / ";" / "="
iunreserved = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
ucschar = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
/ %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
/ %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
/ %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
/ %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
/ %xD0000-DFFFD / %xE1000-EFFFD
This doesn't seem right to me -- why would we allow
sub-delims in domain names?
What about %xx?
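
To make the two questions above concrete, here is a rough sketch of a
checker for the grammar exactly as quoted. The names are hypothetical, and
the sketch follows the quoted rules literally, so it accepts sub-delims
and rejects "%" because no percent-encoded alternative appears in the
quoted ireg-name rule.

```python
import string

# Ranges copied from the ucschar rule quoted above: the three BMP ranges,
# then planes 1 through 13 up to xFFFD, then %xE1000-EFFFD.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(1, 14)]
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

SUB_DELIMS = set("!$&'()*+,;=")
ASCII_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def is_ucschar(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def is_ireg_name(text: str) -> bool:
    """True if every character matches iunreserved or sub-delims."""
    return all(
        ch in ASCII_UNRESERVED or ch in SUB_DELIMS or is_ucschar(ch)
        for ch in text
    )


print(is_ireg_name("résumé.example.org"))   # ordinary IDN
print(is_ireg_name("a&b=c.example.org"))    # sub-delims are accepted
print(is_ireg_name("r%C3%A9sum%C3%A9.org")) # "%" is not in the quoted rule
```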
==========
Processing:
IRI processors may process Unicode strings directly.
As an example, the restrictions of [RFC3490] on bidirectional
domain names correspond to treating each label of a domain name as
a component for schemes with ireg-name as a domain name.
(give advice about how to invoke gethostname?)
(handling of percent-encoded things that aren't allowed)
=====
Converting an ireg-name of an IRI to a host name:
Schemes that allow non-ASCII characters in the reg-name (ireg-name)
position MUST convert the ireg-name component of an IRI as
follows:
Replace the ireg-name part of the IRI by the part converted using the
ToASCII operation specified in Section 4.1 of [RFC3490] on each dot-
separated label, and by using U+002E (FULL STOP) as a label
separator, with the flag UseSTD3ASCIIRules set to FALSE, and with the
flag AllowUnassigned set to FALSE. The ToASCII operation may fail,
which means that the IRI cannot be resolved. If the domain name
conversion fails, then the entire IRI conversion
fails. Processors that have no mechanism for signalling a failure
MAY instead substitute an otherwise invalid host name, although such
processing SHOULD be avoided.
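
As an illustration only, the per-label conversion could be sketched with
Python's built-in "idna" codec, which implements RFC 3490. The function
name is hypothetical. Two caveats: as far as I can tell the codec does not
reject unassigned code points, so it does not match AllowUnassigned set to
FALSE, and it also recognizes dot-like separators other than U+002E,
which this text does not call for.

```python
def ireg_name_to_reg_name(ireg_name: str) -> str:
    """Apply ToASCII to each label, splitting only on U+002E.

    Raises ValueError if any label cannot be converted, which mirrors
    "the entire IRI conversion fails".
    """
    converted = []
    for label in ireg_name.split("."):
        try:
            # The "idna" codec performs nameprep and Punycode encoding.
            converted.append(label.encode("idna").decode("ascii"))
        except UnicodeError as exc:
            raise ValueError(f"cannot convert label {label!r}") from exc
    return ".".join(converted)


print(ireg_name_to_reg_name("résumé.example.org"))
```

Raising an exception is one way of signalling failure; a processor without
such a mechanism would need the substitution fallback described above.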
((DESIGN QUESTION: What about e.g.
http://r%C3%A9sum%C3%A9.example.org in an IRI? Will that get
converted to punycode, or not?))
Various IRI schemes may allow the usage of Internationalized Domain
Names (IDN) [RFC3490] either in the ireg-name part or elsewhere.
Character Normalization also applies to IDNs, as discussed in
Section 5.3.3.
========
Converting host names in URIs to I18N host names:
Punycode to Unicode
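
One possible shape for this direction, again leaning on Python's "idna"
codec as a stand-in for the RFC 3490 ToUnicode operation. The function
name is hypothetical. RFC 3490 says ToUnicode never fails and returns the
input unchanged on error, so the sketch falls back to the original label.

```python
def reg_name_to_ireg_name(reg_name: str) -> str:
    """Apply ToUnicode to each dot-separated label of an ASCII host name."""
    labels = []
    for label in reg_name.split("."):
        try:
            labels.append(label.encode("ascii").decode("idna"))
        except UnicodeError:
            # ToUnicode never fails: keep the label as it was.
            labels.append(label)
    return ".".join(labels)


print(reg_name_to_ireg_name("xn--rsum-bpad.example.org"))
```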
=======
Comparing host names:
case insensitivity for ASCII
dealing with variant forms?
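
A rough sketch of one comparison strategy, assuming both names are first
reduced to their ASCII form and then compared case-insensitively. The
helper names are hypothetical, Python's "idna" codec stands in for
ToASCII, and nothing here addresses the variant-forms question.

```python
def _to_ascii_lower(host: str) -> str:
    """Convert each label with the "idna" codec, then lowercase ASCII."""
    labels = [label.encode("idna").decode("ascii") for label in host.split(".")]
    return ".".join(labels).lower()


def host_names_equal(a: str, b: str) -> bool:
    """Compare two host names after conversion to lowercase ASCII form."""
    return _to_ascii_lower(a) == _to_ascii_lower(b)


print(host_names_equal("Example.ORG", "example.org"))
print(host_names_equal("résumé.example.org", "xn--rsum-bpad.EXAMPLE.org"))
```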
========
processing of "host" header?
Received on Wednesday, 4 November 2009 23:20:20 UTC