Rough cut on "Domain Name" part of IRI trilogy

I'd wanted to get this out before the ID cut-off, but I'd 
gone around a few too many times, and want to start over.

I've picked out the parts of the IRI document that talk
about ireg-name and handling of host names, and made
an outline of what I think could be moved into a separate
document.

========================================================
The document parts would say:

* Applicability
* Syntax (what's allowed) 
* Processing (what to do with what you have)
* Translation (convert to a reg-name for use in a URI)
* Reconversion (convert a URI reg-name into an ireg-name)?
* Comparison (how to compare the ireg-names)

These parts would be used by the "main" document which
would have the same components.
  
======================================================
Introduction

  This document describes syntax, processing, and
  comparison of the "ireg-name" component of IRIs.
  It is a separate document to focus discussion
  and coordination.

=============
Applicability

  These methods apply only to the ireg-name part of an IRI.
  Domain names may also appear in parts of an IRI other
  than the ireg-name part.  In those cases it is the
  responsibility of scheme-specific implementations to
  apply any necessary conversion.
  For example, if an Internationalized Domain Name
  is part of the 'iquery' component of an HTTP IRI, the
  interpretation of the domain name is up to the
  server.  Trying to validate the Web page at
    http://résumé.example.org
  would lead to an IRI of
    http://validator.w3.org/check?uri=http%3A%2F%2Frésumé.example.org
  which would convert to a URI of
    http://validator.w3.org/check?uri=http%3A%2F%2Fr%C3%A9sum%C3%A9.example.org
  In this case, the server-side
  implementation is responsible for making the
  necessary conversions to be able to retrieve the Web
  page.
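
  As a rough illustration of the example above, the following Python
  sketch percent-encodes the non-ASCII characters in the query and then
  recovers the embedded address the way a server-side implementation
  might.  The use of urllib.parse and all variable names are
  assumptions made for illustration; nothing here is prescribed by
  the IRI draft.

```python
from urllib.parse import parse_qs, quote, urlsplit

# The IRI from the example; \u00e9 is "é" (LATIN SMALL LETTER E WITH ACUTE).
iri = "http://validator.w3.org/check?uri=http%3A%2F%2Fr\u00e9sum\u00e9.example.org"

parts = urlsplit(iri)

# Percent-encode only what is not already legal in a URI query.
# "%", "=" and "&" are kept so existing escapes and the key/value
# structure are left untouched (an assumption for this sketch).
uri_query = quote(parts.query, safe="%=&")
uri = f"{parts.scheme}://{parts.netloc}{parts.path}?{uri_query}"

# Server side: decode the query parameter to get the target address back.
# Converting the host inside that address is then up to the server.
target = parse_qs(uri_query)["uri"][0]
```

  Note that the ireg-name of the outer IRI (validator.w3.org) is
  plain ASCII here, so no host name conversion is triggered on the
  client side.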

=======
Syntax:

  Currently, the IRI draft contains the following definition of
  ireg-name:

   ireg-name      = *( iunreserved / sub-delims )
   sub-delims     = "!" / "$" / "&" / "'" / "(" / ")"
                  / "*" / "+" / "," / ";" / "="
   iunreserved    = ALPHA / DIGIT / "-" / "." / "_" / "~" / ucschar
   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

  This doesn't seem right to me -- why would we allow
  sub-delims in domain names?
  What about %xx (percent-encoded octets)?
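
  To make the two questions concrete, here is a minimal Python sketch
  of a checker that follows the ABNF exactly as quoted above.  The
  function and constant names are hypothetical.  Under this grammar,
  sub-delims are accepted and "%" is rejected.

```python
import string

# ucschar ranges, transcribed from the ABNF quoted above.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
# Planes 1 through 13: %x10000-1FFFD ... %xD0000-DFFFD
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))

SUB_DELIMS = set("!$&'()*+,;=")
# ALPHA and DIGIT are ASCII-only in ABNF.
ASCII_UNRESERVED = set(string.ascii_letters + string.digits + "-._~")


def is_ucschar(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def is_ireg_name(s: str) -> bool:
    """True if every character matches iunreserved / sub-delims."""
    return all(
        c in ASCII_UNRESERVED or c in SUB_DELIMS or is_ucschar(c) for c in s
    )
```

  With this sketch, is_ireg_name("a!b") is true while
  is_ireg_name("r%C3%A9sum%C3%A9") is false.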

========== 
Processing:
   IRI processors may process Unicode strings directly.

   As an example, the restrictions of [RFC3490] on bidirectional
   domain names correspond to treating each label of a domain name as
   a component for schemes with ireg-name as a domain name.

   (give advice about how to invoke gethostbyname/getaddrinfo?)
   
   (handling of percent-encoded things that aren't allowed)

=====
Converting an ireg-name of IRI to a host name:

   Schemes that allow non-ASCII characters in the reg-name (ireg-name)
   position MUST convert the ireg-name component of an IRI as
   follows:

   Replace the ireg-name part of the IRI by the part converted using the
   ToASCII operation specified in Section 4.1 of [RFC3490] on each dot-
   separated label, and by using U+002E (FULL STOP) as a label
   separator, with the flag UseSTD3ASCIIRules set to FALSE, and with the
   flag AllowUnassigned set to FALSE.  The ToASCII operation may fail,
   which means that the IRI cannot be resolved.  If the domain name
   conversion fails, then the entire IRI conversion fails.
   Processors that have no mechanism for signalling a failure
   MAY instead substitute an otherwise invalid host name, although such
   processing SHOULD be avoided.
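
   The per-label conversion can be sketched in Python with the
   standard library's encodings.idna module, which implements the
   RFC 3490 ToASCII operation.  This is only an approximation of the
   rule above: the module skips the STD3 ASCII rules (matching
   UseSTD3ASCIIRules = FALSE), but its documentation says it assumes
   query strings, so AllowUnassigned is effectively TRUE rather than
   FALSE.  The function name is hypothetical.

```python
import encodings.idna


def ireg_name_to_host(ireg_name: str) -> str:
    """Apply ToASCII to each label, using U+002E as the only separator.

    Raises UnicodeError if ToASCII fails for any label, in which case
    the whole IRI conversion should be treated as failed.
    """
    converted = []
    for label in ireg_name.split("."):
        if label == "":
            # Keep empty labels (e.g. a trailing dot) as they are;
            # ToASCII itself would reject them. An assumption of this sketch.
            converted.append(label)
            continue
        converted.append(encodings.idna.ToASCII(label).decode("ascii"))
    return ".".join(converted)
```

   For example, ireg_name_to_host("résumé.example.org") gives
   "xn--rsum-bpad.example.org".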


      ((DESIGN QUESTION: What about e.g.
      http://r%C3%A9sum%C3%A9.example.org in an IRI?  Will that get
      converted to punycode, or not?))

   Various IRI schemes may allow the usage of Internationalized Domain
   Names (IDN) [RFC3490] either in the ireg-name part or elsewhere.
   Character Normalization also applies to IDNs, as discussed in
   Section 5.3.3 of the IRI draft.

========
Converting host names in URIs to I18N host names:
    Punycode to Unicode
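
    A minimal sketch of the reverse direction, again assuming Python's
    encodings.idna module for the RFC 3490 ToUnicode operation.
    RFC 3490 says ToUnicode never fails and returns the original
    label when a step fails; the Python function raises instead, so
    the sketch catches the error to get the same behaviour.  The
    function name is hypothetical.

```python
import encodings.idna


def host_to_ireg_name(host: str) -> str:
    """Apply ToUnicode to each dot-separated label of a URI host name."""
    labels = []
    for label in host.split("."):
        try:
            labels.append(encodings.idna.ToUnicode(label))
        except UnicodeError:
            # Per RFC 3490, a label that cannot be decoded is returned as is.
            labels.append(label)
    return ".".join(labels)
```

    For example, host_to_ireg_name("xn--rsum-bpad.example.org") gives
    "résumé.example.org".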


=======
Comparing host names:
    case insensitivity for ASCII
    dealing with variant forms?
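
    One possible approach, sketched here as an assumption rather than
    a proposal: reduce both names to their ToASCII form, lowercase the
    ASCII result, and compare.  The Nameprep step inside ToASCII
    already case-folds and normalizes non-ASCII labels.  The sketch
    uses Python's encodings.idna module, and the function names are
    hypothetical.

```python
import encodings.idna


def canonical_host(name: str) -> str:
    """ToASCII each label, then lowercase the ASCII result."""
    labels = []
    for label in name.split("."):
        if label:
            label = encodings.idna.ToASCII(label).decode("ascii")
        labels.append(label.lower())
    return ".".join(labels)


def same_host(a: str, b: str) -> bool:
    # Raises UnicodeError if either name cannot be converted.
    return canonical_host(a) == canonical_host(b)
```

    With this sketch, "RÉSUMÉ.Example.ORG" and
    "xn--rsum-bpad.example.org" compare equal, while
    "resume.example.org" and "résumé.example.org" do not.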
    

========
processing of "host" header? 

Received on Wednesday, 4 November 2009 23:20:20 UTC