IRI issues (in quite some detail) from Martin J. Dürst on 2009-10-12 (public-iri@w3.org from October 2009)

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 12 Oct 2009 14:48:57 +0900
To: "public-iri@w3.org" <public-iri@w3.org>, Pete Resnick <presnick@qualcomm.com>, Alexey Melnikov <alexey.melnikov@isode.com>, Lisa Dusseault <lisa.dusseault@messagingarchitects.com>, Ted Hardie <ted.ietf@gmail.com>, Ian Hickson <ian@hixie.ch>, Mark Davis <mark@macchiato.com>
Message-ID: <4AD2C349.6050609@it.aoyama.ac.jp>

This is a laundry list of issues that have come up on the IRI spec 
update. Related issues are grouped together where possible. I
hope this is a fairly complete initial pass, but I'm sure there are 
still a few things missing.

In your replies, please distinguish addition of issues from discussion 
of specific issues.



IRIs and IDNA
=============
- %encoding vs. punycode when converting from IRI to URI
   (see mail by Roy:
    http://lists.w3.org/Archives/Public/public-iri/2009Aug/0010.html
    and I-D by Dave Thaler:
    http://tools.ietf.org/html/draft-iab-idn-encoding)

- Update of Bidi section:
   - allow combining marks at end of component
   - adapt component restrictions to those in [IDNA-Bidi]
   - check about other syntactic characters (not only dot)
     and payload characters (e.g. %)
   [- rework examples]

- IDNA 2003 vs. IDNA 2008:
   - to map or not to map for IRI->URI and on resolution in general
     - what mapping to use (see http://www.unicode.org/reports/tr46/
       for a potential direction)
     - what to do about ß (sharp s) and ς (final sigma)
       - short term
       - long term
   - advice for authors:
     - Always use prepped (in IDNA 2003 terminology) or
       legal U-Label (in IDNA 2008 terminology)
     - Avoid separators other than '.'
     - Avoid IDNs that are not legal in either IDNA 2003 or 2008?
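To make the %encoding vs. punycode question and the ß/ς question concrete, here is a small Python sketch. It assumes Python's standard-library "idna" codec, which implements IDNA 2003 (nameprep plus Punycode); the standard library has no IDNA 2008 codec. The helper names and host names are made up for illustration, and neither helper is meant as the conversion the spec should adopt.

```python
from urllib.parse import quote

def host_as_percent_encoded(host: str) -> str:
    # Encode the host as UTF-8 and %-escape each non-ASCII octet
    # (letters, digits, '.' and '-' are always left alone by quote()).
    return quote(host, safe="")

def host_as_punycode(host: str) -> str:
    # Python's built-in "idna" codec implements IDNA 2003:
    # nameprep (case folding and mapping) followed by Punycode.
    return host.encode("idna").decode("ascii")

# Two ASCII forms of the same (made-up) host name.
print(host_as_percent_encoded("bücher.example"))  # b%C3%BCcher.example
print(host_as_punycode("bücher.example"))         # xn--bcher-kva.example

# IDNA 2003 mapping turns ß into "ss" and final sigma ς into σ,
# so labels that IDNA 2008 keeps distinct collapse to one ASCII form.
print(host_as_punycode("straße.example"))         # strasse.example
print(host_as_punycode("σοφός.example") == host_as_punycode("σοφόσ.example"))  # True
```

Under IDNA 2008 both ß and ς are valid in U-labels in their own right, which is why the choice of mapping (if any) matters for resolution.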


LEIRIs and HTML5 references
===========================

- Are there other "main areas" (like XML and HTML) that warrant similar
   'preferential treatment'? [let's really hope not] (see also
   http://www.w3.org/International/iri-edit/spec-use-survey.html
   (way incomplete))

- Naming these explicitly (or not)
   - What's the best name for HTML5 references

- Using syntax or procedure for definition
   (syntax seems to work better for the requirements of XML and LEIRIs,
    procedure may work better for HTML5)

- Place in spec: Appendix? Separate section (for each, or for both
   together?)? As part of section 5 (Normalization and Comparison;
   probably not, seems confusing to many people)

- Mix with main IRI->URI procedure or not (ideally separate, but may
   not be easy for some aspects)

- What to keep in 'host' specs (e.g. definition of whitespace?)


HTML5 reference specific issues
===============================

- '\' as path separator

- '#' in fragment identifiers

- '[' and ']' other than for IPv6 literals

- Processing of other characters not allowed

- treatment of lonely '%' (not followed by 2 hex digits)

- special behavior for encoding in http: and https: query parts
   (use document encoding if available instead of UTF-8)

- some more (to be completed, including pointers to relevant documents
   from Anne)

- How to advise authors,... against using 'bugwards-compatible' features
   (completed for LEIRIs, needs to be discussed and done for HTML5)
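Three of the behaviours listed above (lonely '%', '\' as path separator, query encoding taken from the document) can be sketched roughly as follows. The function names are hypothetical and the rules are simplified assumptions for discussion, not the algorithm in the HTML5 draft.

```python
import re
from urllib.parse import quote

def escape_lonely_percent(s: str) -> str:
    # Rewrite any '%' that is not followed by two hex digits as "%25";
    # well-formed escapes such as "%2F" are left untouched.
    return re.sub(r"%(?![0-9A-Fa-f]{2})", "%25", s)

def backslashes_to_slashes(path: str) -> str:
    # Treat '\' like '/' in the path, as many browsers do for http(s) URLs.
    return path.replace("\\", "/")

def encode_query(query: str, document_encoding: str = "utf-8") -> str:
    # Percent-encode the query using the document's encoding instead of
    # always using UTF-8; '=' and '&' are kept as delimiters.
    return quote(query, safe="=&", encoding=document_encoding)

print(escape_lonely_percent("100%"))              # 100%25
print(backslashes_to_slashes("\\docs\\a.html"))   # /docs/a.html
print(encode_query("q=café"))                     # q=caf%C3%A9
print(encode_query("q=café", "windows-1252"))     # q=caf%E9
```

The last two lines show why the query-encoding item matters: the same query text yields different octets depending on the document encoding.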


IRI issues
==========
(at http://www.w3.org/International/iri-edit/,
not already mentioned above)
- http://www.w3.org/International/iri-edit/#identity-101
- http://www.w3.org/International/iri-edit/#transcodeNFC-103


Registration issues
===================

- Allow definition of URI schemes simply in terms of IRIs?

- What other adjustments needed resulting from issues above?


Issues for individual schemes
=============================

- Piggybacking mailto:
   - Allowing UTF-8 officially where current email infrastructure
     does allow it
   - Fixing other issues in mailto:

- Updating mailto: for EAI (or creating a new scheme)

- Others?
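One possible shape for "allowing UTF-8" in mailto: is carrying non-ASCII header values as percent-encoded UTF-8 octets. The sketch below assumes exactly that; `build_mailto` is a hypothetical helper that only handles the subject and assumes an ASCII address (non-ASCII addresses are the separate EAI item).

```python
from urllib.parse import quote

def build_mailto(address: str, subject: str) -> str:
    # Hypothetical helper: the subject is encoded as UTF-8 and every
    # octet outside the unreserved set is percent-escaped.
    return f"mailto:{address}?subject={quote(subject, safe='')}"

print(build_mailto("info@example.org", "Grüße"))
# mailto:info@example.org?subject=Gr%C3%BC%C3%9Fe
```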


URI issues (potentially)?
=========================
- do '[' and ']' need to be forbidden in URIs?
- does '#' need to be forbidden in URI fragment parts?
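RFC 3986 reserves '[' and ']' (gen-delims) for IP-literal hosts, and its fragment production, *( pchar / "/" / "?" ), excludes '#'. The sketch below only shows what those current restrictions flag, using Python's `urllib.parse.urlsplit`; `rfc3986_delimiter_issues` is a hypothetical helper, not a full URI validator.

```python
from urllib.parse import urlsplit

def rfc3986_delimiter_issues(uri: str) -> list[str]:
    """Flag '[' or ']' outside the host and '#' inside the fragment."""
    parts = urlsplit(uri)  # splits at the first '#', so later '#'s stay in .fragment
    issues = []
    for name in ("path", "query", "fragment"):
        value = getattr(parts, name)
        if "[" in value or "]" in value:
            issues.append(f"'[' or ']' in {name}")
    if "#" in parts.fragment:
        issues.append("'#' in fragment")
    return issues

print(rfc3986_delimiter_issues("http://[::1]/x"))            # []
print(rfc3986_delimiter_issues("http://example.org/a[1]"))   # ["'[' or ']' in path"]
print(rfc3986_delimiter_issues("http://example.org/a#b#c"))  # ["'#' in fragment"]
```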


Regards,   Martin.

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Monday, 12 October 2009 05:49:48 UTC