RE: Fwd: Re: HRRIs, IRIs, etc

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Wed, 20 Jun 2007 10:23:44 +0900
To: "Grosso, Paul" <pgrosso@ptc.com>
Cc: <public-iri@w3.org>, "Richard Ishida" <ishida@w3.org>, "Felix Sasaki" <fsasaki@w3.org>, <www-xml-linking-comments@w3.org>, <public-xml-core-wg@w3.org>, public-i18n-core@w3.org

Hello Paul, others,

First, I'd prefer a bit more notice if you want to set such a
hard deadline, and I guess others would do so, too.

Second, with respect to minor differences between the IRI spec
and the HRRI draft, I think that you should be able to look
at the differences carefully as well as I am able to look at
your draft. Nevertheless, I have tried to give it another look.

Here is what I have found (with no guarantee for completeness,
of course, please check again on your side).

http://www.w3.org/TR/REC-xml/#charsets allows (although, at least
in newer versions, discourages):
[#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
[#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
[#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],

In the IRI spec, these are excluded:
   ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
                  / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
                  / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
                  / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
                  / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
                  / %xD0000-DFFFD / %xE1000-EFFFD

so you have to add them to your list in section 3. This in essence
seems to amount to adding 'iprivate', the list of characters above,
and the list of characters you already have to 'ucschar'. Please
check this understanding as a cross-check.
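
As a cross-check of the comparison above, here is a minimal Python sketch.
It assumes only the 'ucschar' ranges quoted above and the noncharacter pairs
listed in this message (planes 1 to 9, as far as the list goes); the names
UCSCHAR, LISTED_NONCHARACTERS, in_ranges and is_ucschar are hypothetical and
not taken from either spec.

```python
# 'ucschar' ranges, copied from the IRI spec excerpt quoted in this message.
UCSCHAR = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]

# Only the noncharacter pairs listed in this message (planes 1 to 9):
# [#x1FFFE-#x1FFFF] ... [#x9FFFE-#x9FFFF].
LISTED_NONCHARACTERS = [
    (plane * 0x10000 + 0xFFFE, plane * 0x10000 + 0xFFFF)
    for plane in range(1, 10)
]


def in_ranges(code_point, ranges):
    """Return True if code_point falls inside any inclusive (low, high) range."""
    return any(low <= code_point <= high for low, high in ranges)


def is_ucschar(code_point):
    """Return True if code_point is in the quoted 'ucschar' production."""
    return in_ranges(code_point, UCSCHAR)


# Every listed noncharacter is allowed by XML but falls outside 'ucschar',
# so an HRRI processor would have to percent-encode it.
missing = [
    code_point
    for low, high in LISTED_NONCHARACTERS
    for code_point in range(low, high + 1)
    if not is_ucschar(code_point)
]
```

With these ranges, 'missing' contains all 18 listed code points, which is
consistent with the need to add them to the list in section 3.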

Apart from these small fixes, I have to very clearly note that most
of my more general concerns haven't been addressed. These are, somewhat

- The overall usefulness (seen from an overall W3C or overall IETF
  standpoint) of having separate definitions, in separate documents,
  for two essentially extremely closely related protocol elements.
  [I have proposed to integrate your material into an update of the
  IRI spec.]

- The choice of name, which is highly suggestive instead of descriptive,
  and inappropriate on several accounts (for the largest part of URIs/IRIs,
  HRRIs are only marginally more readable, if at all, and the overall
  syntax still poses a lot of problems for average human users (http://...)).

- The overall description. I note e.g. the following:
  "However, it is often inconvenient for authors to encode these characters."
  How often? Unless somebody is authoring a lot of XPointers by hand,
  this can't happen that often (maybe with the exception of the space,
  but then you discourage that (correctly!) yourself).
  I suggest rewording "often" to "occasionally". There are similar
  examples elsewhere.

- The classification as a BCP. Procedurally, it's unclear to me why the
  IETF would classify a protocol element spec as a BCP when the related
  ones (URI, IRI) are standards track. Content-wise, it's unclear why
  the IETF would call something a BEST current practice if in earlier
  discussion, they have clearly preferred to disallow or marginalize
  this practice (and that was only for spaces and such, not for controls).

- The security section now mentions the issues with control characters.
  This should definitely be a bit more specific, and should contain
  explicit recommendations. I'd write that receivers may want to
  filter out such characters, or URIs with such characters, and
  therefore including them in the first place is discouraged.

- You have some advice against using raw spaces ("Also, authors of HRRIs
  are advised to percent encode space characters themselves, rather than
  rely on the processor to do so, because spaces are often used to
  separate HRRIs in a sequence"), but not against others, where similar
  arguments apply:
  - tabs and CR/LF are removed/merged/converted to spaces in attribute values
    (merging also occurs for spaces)
  - <> are often used to delimit URIs/IRIs
  - arbitrary controls may trigger some security filter
  - private use characters are not interoperable
  - non-characters (the list above) are discouraged in XML itself
  (not sure this list is complete, but I guess it's getting close)

- The last paragraph of Section 3 is somewhat problematic. In general,
  it's okay, but the second half of the last sentence
  ("nor the process of passing a Human Readable Resource Identifier to a
   process or software component responsible for dereferencing it SHOULD
   trigger percent encoding") may suggest that resolution interfaces come
   with three different entry points. I think it would be better to have
   this work done on the XML side when resolving something.
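
To illustrate the point above about percent-encoding spaces oneself, and the
similar cases of tabs, CR/LF, <> and controls, here is a minimal Python
sketch. It is an assumption about how an author-side tool might do this, not
something taken from the HRRI draft; needs_encoding and
percent_encode_discouraged are hypothetical names. Private use characters
and non-characters are deliberately not handled here.

```python
def needs_encoding(ch):
    """Return True for the characters discussed above: space, tab, CR/LF
    and other C0 controls, DEL and C1 controls, and the <> delimiters."""
    code_point = ord(ch)
    return code_point <= 0x20 or 0x7F <= code_point <= 0x9F or ch in "<>"


def percent_encode_discouraged(hrri):
    """Percent-encode the UTF-8 bytes of each discouraged character,
    leaving every other character (including non-ASCII ones) untouched."""
    pieces = []
    for ch in hrri:
        if needs_encoding(ch):
            pieces.extend("%{:02X}".format(byte) for byte in ch.encode("utf-8"))
        else:
            pieces.append(ch)
    return "".join(pieces)
```

For example, percent_encode_discouraged("a b") gives "a%20b", so the space
can no longer be mistaken for a separator between HRRIs in a sequence.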

Not only have these concerns not yet been addressed, but I also do not
remember having received any kind of reply on these issues.

Looking forward to hearing from you again.

Regards,     Martin.

At 07:37 07/06/19, Grosso, Paul wrote:
>Martin et al.,
>Please check out
>and let us know whether you have any further comments or are
>satisfied with this (draft) ID.  In either case, please send
>a response.  Several specs are on hold awaiting progression 
>of this to RFC, and we would like to be sure to make progress.
>We would prefer to have a definite response from you, but if
>we have not heard by 11:00 ET (Boston time) this Wednesday,
>we will assume you have no more comments on this spec.
>> -----Original Message-----
>> From: public-xml-core-wg-request@w3.org 
>> [mailto:public-xml-core-wg-request@w3.org] On Behalf Of Norman Walsh
>> Sent: Tuesday, 2007 June 12 12:25
>> To: Martin Duerst
>> Cc: public-iri@w3.org; Richard Ishida; Felix Sasaki; 
>> www-xml-linking-comments@w3.org; public-xml-core-wg@w3.org
>> Subject: Re: Fwd: Re: HRRIs, IRIs, etc
>> / Martin Duerst <duerst@it.aoyama.ac.jp> was heard to say:
>> | Dear IRI and XML experts,
>> [...]
>> | - The IRI spec excludes private use characters from all but 
>> the query part.
>> We have attempted to address this concern[1] by adding
>>   * characters in the Unicode private use area (#xE000-#xF8FF), except
>>     where they appear in the query part of the resulting IRI.
>> to the list.
>> |   (there are other smaller differences, but for the moment, 
>> this is enough)
>> Could you, please, provide a more exhaustive account of the
>> differences which concern you? The Core WG thinks it would be most
>> efficient if we could consider as many of them as possible at the same
>> time.
> [1] http://www.w3.org/XML/2007/04/hrri/draft-walsh-tobin-hrri-01c.html

#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp     
Received on Wednesday, 20 June 2007 01:24:49 UTC