RE: Fwd: Re: HRRIs, IRIs, etc from Martin Duerst on 2007-06-22 (public-iri@w3.org from June 2007)

From: Martin Duerst <duerst@it.aoyama.ac.jp>
Date: Fri, 22 Jun 2007 19:27:41 +0900
To: "Grosso, Paul" <pgrosso@ptc.com>
Cc: <public-iri@w3.org>, "Richard Ishida" <ishida@w3.org>, "Felix Sasaki" <fsasaki@w3.org>, <www-xml-linking-comments@w3.org>, <public-xml-core-wg@w3.org>, <public-i18n-core@w3.org>
Message-Id: <6.0.0.20.2.20070622191441.09a3e0e0@localhost>
Hello Paul, others,

At 05:11 07/06/21, Grosso, Paul wrote:
>Martin,
>
>The XML Core WG discussed this message of yours during
>our telcon today.  I'd like to thank you for your input
>and give some preliminary responses.

Great, thanks.

>[We have only just now noticed your email at
>http://lists.w3.org/Archives/Public/public-iri/2007May/0000
>that most of us on the XML Core WG never saw before,
>so we have not yet discussed those points.]
>
>[I'm not sure I have permission to cross post to all the
>various lists, but I hesitate to remove anyone, so we'll
>have to see how this works.]

For the moment, it seems to work.

>> -----Original Message-----
>> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp] 
>> Sent: Tuesday, 2007 June 19 20:24
>> To: Grosso, Paul
>> Cc: public-iri@w3.org; Richard Ishida; Felix Sasaki; 
>> www-xml-linking-comments@w3.org; public-xml-core-wg@w3.org; 
>> public-i18n-core@w3.org
>> Subject: RE: Fwd: Re: HRRIs, IRIs, etc
>
>> In the IRI spec, these are excluded:
>>    ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>>                   / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>>                   / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>>                   / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>>                   / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>>                   / %xD0000-DFFFD / %xE1000-EFFFD
>> 
>> so you have to add them to your list in section 3.
>
>We'll plan to add them.
>
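
For readers who want to test a character against the production quoted above, here is a minimal sketch. It assumes only the ranges listed in that ucschar rule; the names UCSCHAR_RANGES and is_ucschar are hypothetical and not taken from any spec or library.

```python
# Ranges copied from the ucschar production quoted above.
# Planes 1 through 13 (0x1 to 0xD) each run from xx0000 to xxFFFD.
UCSCHAR_RANGES = [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
UCSCHAR_RANGES += [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
UCSCHAR_RANGES.append((0xE1000, 0xEFFFD))


def is_ucschar(ch: str) -> bool:
    """Return True if the single character ch falls in one of the ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)
```

Code points that fall in the gaps between the ranges (for example U+FDD0 or U+FFFF) come out as False, which is the exclusion being discussed.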
>> - The overall usefulness (seen from an overall W3C or overall IETF
>>   standpoint) of having separate definitions, in separate documents,
>>   for two essentially extremely closely related protocol elements.
>>   [I have proposed to integrate your material into an update of the
>>   IRI spec.]
>
>The XML Base PER went out in December, and the XLink 1.1 CR
>ended a year ago (July 2007), and these are both awaiting
>resolution of this issue.

Should that be July 2006? Anyway, that's a long time ago.
It's a pity that we didn't learn about this earlier.

>Both the basic idea as well as most of the actual wording 
>for what we are now calling HRRIs currently exist in several 
>Recs including XML, XLink, XML Base, and maybe others.  Our
>attempt here was just to pull that wording out of the
>various specs and reference a definition in one place. 
>We were hoping to do this in an expeditious manner.

I understand.


>We discussed the options with our team contact who discussed
>it with W3T, and we agreed that a short RFC was the best approach.

This seems like a typical example of locally optimal
advice. Good on one level, problematic on a higher level.
[I'm sure I have given such advice in the past when I was
on the W3C Team.]


>> - The choice of name, which is highly suggestive instead of descriptive,
>>   inappropriate on several accounts (for the largest part of URIs/IRIs,
>>   HRRIs are only marginally more readable, if at all, and the overall
>>   syntax still poses a lot of problems for average human users (http://...).
>
>We had a hard time coming up with a name ourselves, and
>we'd consider another name if we can find one more generally 
>acceptable.  We do think that allowing spaces (as is the case 
>with HRRIs) does improve readability a bit,

I'd probably have to agree. Of the characters in question, I think
spaces are also the ones that one sees most in the wild.

>but we'd be happy 
>with any name that works.  We had called these XML Resource 
>Identifiers earlier, but (1) the XRI acronym is already taken 
>and (2) these have meaning and usefulness outside of XML.
>
>If anyone has suggestions, we're interested in considering them.

I made some in a previous mail.

>> - The overall description. I note e.g. the following:
>>   "However, it is often inconvenient for authors to encode these characters."
>>   How often? Unless somebody is authoring a lot of XPointers by hand,
>>   this can't happen that often (maybe with the exception of the space,
>>   but then you discourage that (correctly!) yourself).
>>   I suggest changing "often" to "occasionally". There are similar
>>   examples elsewhere.
>
>As you say, the space character is the most common.  We would be 
>happy to tone down or remove this sentence; our motivation for 
>defining HRRIs is not that they are a good thing but that they 
>already exist in multiple standards.

Okay, fine.

>> - The classification as a BCP. Procedurally, it's unclear to me why the
>>   IETF would classify a protocol element spec as a BCP when the related
>>   ones (URI, IRI) are standards track. Content-wise, it's unclear why
>>   the IETF would call something a BEST current practice if in earlier
>>   discussion, they have clearly preferred to disallow or marginalize
>>   this practice (and that was only for spaces and such, not for controls).
>
>I think this may be a "typo".  I believe we intended
>this to become an RFC.
>
>Actually, we don't care what it becomes as long as it is
>referenceable from the various W3C XML-related specs.

Understood.

>> - The security section now mentions the issues with control characters.
>>   This should definitely be a bit more specific, and should contain
>>   explicit recommendations. I'd write that receivers may want to
>>   filter out such characters, or URIs with such characters, and
>>   therefore including them in the first place is discouraged.
>
>Most of us in the XML Core WG don't feel as strongly as you
>appear to that we need to go on at great length about security
>issues, but we are happy to expand this section along the lines
>you suggest.

Well, security sections are always looked at carefully when an RFC
is published. Preparation saves time later.
 
>> - You have some advice against using raw spaces ("Also, authors of HRRIs
>>   are advised to percent encode space characters themselves, rather than
>>   rely on the processor to do so, because spaces are often used to
>>   separate HRRIs in a sequence"), but not against others, where similar
>>   arguments apply:
>>   - tabs and CR/LF are removed/merged/converted to spaces in attribute values
>>     (merging also occurs for spaces)
>>   - <> are often used to delimit URIs/IRIs
>>   - arbitrary controls may trigger some security filter
>>   - private use characters are not interoperable
>>   - non-characters (the list above) are discouraged in XML itself
>>   (not sure this list is complete, but I guess it's getting close)
>
>These are all good points.  We will expand the document
>along the lines you suggest.
>
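
To make the advice in the list above concrete, here is a minimal sketch of an author-side step that percent-encodes the riskier characters before they reach an attribute value. The character set (space, tab, CR, LF, angle brackets, and the C0/C1 controls) is an assumption drawn from the list above, not the full set defined by the draft; escape_risky and RISKY are hypothetical names.

```python
# Illustrative set only: the characters named in the list above.
RISKY = set(" \t\r\n<>")


def escape_risky(hrri: str) -> str:
    """Percent-encode risky characters via their UTF-8 bytes; leave the rest."""
    out = []
    for ch in hrri:
        cp = ord(ch)
        if ch in RISKY or cp < 0x20 or 0x7F <= cp <= 0x9F:
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)  # includes '%', so existing escapes are untouched
    return "".join(out)
```

Existing %-escapes pass through unchanged, so applying the function twice gives the same result as applying it once.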
>> 
>> - The last paragraph of Section 3 is somewhat problematic. In general,
>>   it's okay, but the second half of the last sentence
>>   ("nor the process of passing a Human Readable Resource Identifier to a
>>    process or software component responsible for dereferencing it SHOULD
>>    trigger percent encoding") may suggest that resolution interfaces come
>>    with three different entry points. I think it would be better to have
>>    this work done by the XML side when resolving something.
>
>The above quoted phrase is in the XLink 1.1 CR, but we are not
>sure at this time exactly why it is in there.

At one point a similar phrase was in the XML spec, and something
similar is in the IRI spec. For resolution, it's not terribly
important, because for resolution, a %-escaped character and its
unescaped counterpart are in the typical case equivalent. But for
things such as namespace processing, it's important not to
change %-encodings because e.g. http://www.w3.org/A and
http://www.w3.org/%41 are two different namespaces.
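
A small sketch of the difference, assuming namespace names are compared as plain strings and using Python's urllib.parse.unquote as a stand-in for the unescaping a resolver might apply (the variable names are just for illustration):

```python
from urllib.parse import unquote

a = "http://www.w3.org/A"
b = "http://www.w3.org/%41"

# Namespace processing: the two strings are compared as-is.
same_namespace = (a == b)

# Resolution-style view: after unescaping, both name the same path.
same_after_unescape = (unquote(a) == unquote(b))
```

Here same_namespace is False and same_after_unescape is True, which is why changing %-encodings on the way to a namespace processor is harmful.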


Regards,    Martin.


>We are discussing this and will try to figure out what to do
>about this wording and let you know.
>
>paul


#-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
#-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
Received on Friday, 22 June 2007 10:40:24 UTC