Editorial suggestions for draft-duerst-iri-07 from Chris Lilley on 2004-05-11 (public-iri@w3.org from May 2004)

From: Chris Lilley <chris@w3.org>
Date: Tue, 11 May 2004 03:08:39 +0200
To: public-iri@w3.org
Message-ID: <1808626862.20040511030839@w3.org>

Hello public-iri,

These editorial comments relate to
http://www.w3.org/International/iri-edit/draft-duerst-iri-07.txt

From the new Appendix A:

>  New schemes are not needed to distinguish URIs from true IRIs (i.e.
   IRIs that contain non-ASCII characters). The benefit of being able to
   detect the origin of percent-encodings is marginal, also because
   UTF-8 can be detected with very high reliably. Deploying new schemes
   is extremely hard. Not needing new schemes for IRIs makes deployment
   of IRIs vastly easier. Making conversion scheme-dependent is highly
   unadvisable. Using an uniform convention for conversion from IRIs to
   URIs makes IRI implementation orthogonal from the introduction of
   acual new schemes.

I suggest some slight wording and spelling changes (editorial):

  New schemes are not needed to distinguish URIs from true IRIs (i.e.
  IRIs that contain non-ASCII characters). The benefit of being able
  to detect the origin of percent-encodings is marginal, because UTF-8
  can be detected with very high reliability. Deploying new schemes is
  extremely hard, so not requiring new schemes for IRIs makes
  deployment of IRIs vastly easier. Making conversion scheme-dependent
  is highly inadvisable, and would be encouraged by such an approach.
  Using a uniform convention for conversion from IRIs to URIs makes
  IRI implementation orthogonal to the introduction of actual new
  schemes.

It might also be added that the TAG recommends not adding new schemes
that are almost exactly like HTTP; i:http: or httpi: would have
exactly that problem.
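
As a rough illustration of the detection claim in the paragraph above
(not part of the draft; the function name is hypothetical), the octets
behind a URI's percent-escapes can simply be checked for well-formed
UTF-8. A minimal Python sketch, assuming that escapes produced from a
legacy encoding such as ISO-8859-1 are the alternative to be ruled out:

```python
from urllib.parse import unquote_to_bytes


def percent_escapes_look_like_utf8(uri: str) -> bool:
    """Return True if the percent-decoded octets of `uri` are valid UTF-8."""
    raw = unquote_to_bytes(uri)  # turns %XX escapes back into raw octets
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


# Escapes produced from UTF-8 octets (U+00E9 encoded as C3 A9).
print(percent_escapes_look_like_utf8(
    "http://www.example.org/r%C3%A9sum%C3%A9.html"))

# Escapes produced from ISO-8859-1 octets (U+00E9 encoded as E9).
print(percent_escapes_look_like_utf8(
    "http://www.example.org/r%E9sum%E9.html"))
```

Pure US-ASCII URIs also pass this check, since US-ASCII is a subset of
UTF-8; the result is only informative when escapes above %7F are
present.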

>  UTF-8 avoids a double layering and overloading of the use of the "+"
   character. UTF-8 is fully compatible with US-ASCII, and has
   therefore been recommended by the IETF, and is being used widely,
   while UTF-7 has never been used much and is now clearly being
   discouraged.

I suggest a small change:

   Using UTF-8 avoids a double layering and overloading of the use of
   the "+" character. UTF-8 is fully compatible with US-ASCII, and has
   therefore been recommended by the IETF, and is being used widely,
   while UTF-7 has never been used much and is now clearly being
   discouraged.

You might also mention here that using UTF-8 is existing practice, and
that requiring implementations to convert to the rarely used UTF-7
would be an additional implementation burden.
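
The overloading of "+" can be seen in a small Python sketch (an
illustration only, not taken from the draft; the sample word and the
variable names are assumptions). Python's built-in "utf-7" codec uses
"+" to open a base64 run, while form-style URI decoding
(urllib.parse.unquote_plus) reads "+" as a space:

```python
from urllib.parse import quote, unquote, unquote_plus

word = "r\u00e9sum\u00e9"  # sample non-ASCII text, assumed for illustration

# UTF-8 route: a single layer; non-ASCII octets are percent-encoded directly.
utf8_form = quote(word, safe="")
assert unquote(utf8_form) == word

# UTF-7 route: "+" opens a base64 run inside the encoded text itself.
utf7_form = word.encode("utf-7").decode("ascii")
assert "+" in utf7_form

# A form-style decoder treats "+" as a space, so the UTF-7 shift
# characters are lost before any UTF-7 decoder sees them.
mangled = unquote_plus(utf7_form)
assert "+" not in mangled and " " in mangled

print(utf8_form)
print(utf7_form)
print(mangled)
```

Keeping the UTF-7 form intact would need a second layer of escaping for
every "+", which is the double layering the draft text refers to.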

The arguments against using %u and against inline encoding
declarations are well made.

In section 3.1, "Mapping of IRIs to URIs", the renumbering of the
substeps in step two is clearer than in the previous draft.

Shouldn't sample URIs that are not real and do not resolve, such as
http://big.site/PopularPage.html, instead use a reserved example
domain, for example http://big.example/PopularPage.html ?

-- 
 Chris Lilley                    mailto:chris@w3.org
 Chair, W3C SVG Working Group
 Member, W3C Technical Architecture Group
Received on Monday, 10 May 2004 21:35:05 UTC