
RE: Fwd: Re: HRRIs, IRIs, etc

From: Grosso, Paul <pgrosso@ptc.com>
Date: Wed, 20 Jun 2007 16:11:14 -0400
Message-ID: <CF83BAA719FD2C439D25CBB1C9D1D30207D11643@HQ-MAIL4.ptcnet.ptc.com>
To: "Martin Duerst" <duerst@it.aoyama.ac.jp>
Cc: <public-iri@w3.org>, "Richard Ishida" <ishida@w3.org>, "Felix Sasaki" <fsasaki@w3.org>, <www-xml-linking-comments@w3.org>, <public-xml-core-wg@w3.org>, <public-i18n-core@w3.org>


The XML Core WG discussed this message of yours during
our telcon today.  I'd like to thank you for your input
and give some preliminary responses.

[We have only just now noticed your email at
that most of us on the XML Core WG never saw before,
so we have not yet discussed those points.]

[I'm not sure I have permission to cross post to all the
various lists, but I hesitate to remove anyone, so we'll
have to see how this works.]

> -----Original Message-----
> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp] 
> Sent: Tuesday, 2007 June 19 20:24
> To: Grosso, Paul
> Cc: public-iri@w3.org; Richard Ishida; Felix Sasaki; 
> www-xml-linking-comments@w3.org; public-xml-core-wg@w3.org; 
> public-i18n-core@w3.org
> Subject: RE: Fwd: Re: HRRIs, IRIs, etc

> In the IRI spec, these are excluded:
>    ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>                   / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>                   / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>                   / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>                   / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>                   / %xD0000-DFFFD / %xE1000-EFFFD
> so you have to add them to your list in section 3.

We'll plan to add them.
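To make the list concrete, here is a minimal Python sketch that encodes the ucschar ranges quoted above and computes the code points they leave out. It assumes "these are excluded" refers to the gaps in the production: surrogates and the BMP private use area, U+FDD0..U+FDEF, the last two code points of each plane, U+E0000..U+E0FFF, and everything from U+EFFFE up. The names UCSCHAR_RANGES, is_ucschar and excluded_gaps are illustrative only, not taken from either spec.

```python
# ucschar ranges as quoted from the IRI spec, written as inclusive
# (low, high) code point pairs in ascending order.
UCSCHAR_RANGES = [
    (0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF),
    (0x10000, 0x1FFFD), (0x20000, 0x2FFFD), (0x30000, 0x3FFFD),
    (0x40000, 0x4FFFD), (0x50000, 0x5FFFD), (0x60000, 0x6FFFD),
    (0x70000, 0x7FFFD), (0x80000, 0x8FFFD), (0x90000, 0x9FFFD),
    (0xA0000, 0xAFFFD), (0xB0000, 0xBFFFD), (0xC0000, 0xCFFFD),
    (0xD0000, 0xDFFFD), (0xE1000, 0xEFFFD),
]

def is_ucschar(cp: int) -> bool:
    """True if code point cp falls inside one of the ucschar ranges."""
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)

def excluded_gaps(start: int = 0xA0, end: int = 0x10FFFF):
    """Yield the (low, high) ranges between start and end that ucschar leaves out."""
    next_cp = start
    for lo, hi in UCSCHAR_RANGES:
        if lo > next_cp:
            yield (next_cp, lo - 1)
        next_cp = max(next_cp, hi + 1)
    if next_cp <= end:
        yield (next_cp, end)

# Print each excluded range, e.g. U+D800..U+F8FF, U+FDD0..U+FDEF, ...
for lo, hi in excluded_gaps():
    print(f"U+{lo:04X}..U+{hi:04X}")
```

These printed ranges are the candidates for the list in section 3; ASCII (below U+00A0) is handled by other productions and is not covered here.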

> - The overall usefulness (seen from an overall W3C or overall IETF
>   standpoint) of having separate definitions, in separate documents,
>   for two essentially extremely closely related protocol elements.
>   [I have proposed to integrate your material into an update of the
>   IRI spec.]

The XML Base PER went out in December, and the XLink 1.1 CR
ended a year ago (July 2006), and these are both awaiting
resolution of this issue.

Both the basic idea and most of the actual wording
for what we are now calling HRRIs currently exist in several
Recs including XML, XLink, XML Base, and maybe others.  Our
attempt here was just to pull that wording out of the
various specs and reference a definition in one place.
We were hoping to do this in an expeditious manner.
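For readers unfamiliar with the wording being consolidated, here is a minimal Python sketch of the escaping it describes. It assumes the escape set XML 1.0 gives for system identifiers (section 4.2.2): C0 controls and DEL, space, the delimiters < > ", the "unwise" characters { } | \ ^ `, and every character above U+007F, each converted to UTF-8 and written as %HH. The function name hrri_to_uri is hypothetical. '%' and '#' are deliberately left alone, so existing escapes and fragment identifiers pass through unchanged.

```python
# ASCII characters assumed to need escaping, besides controls and DEL.
UNSAFE_ASCII = set(' <>"{}|\\^`')

def needs_escape(ch: str) -> bool:
    """True for C0 controls, DEL and all non-ASCII, or an unsafe ASCII char."""
    cp = ord(ch)
    return cp <= 0x1F or cp >= 0x7F or ch in UNSAFE_ASCII

def hrri_to_uri(hrri: str) -> str:
    """Percent-encode the characters that are not allowed raw in a URI reference."""
    out = []
    for ch in hrri:
        if needs_escape(ch):
            # Encode the character as UTF-8, then write each byte as %HH.
            out.extend(f"%{b:02X}" for b in ch.encode("utf-8"))
        else:
            out.append(ch)
    return "".join(out)

print(hrri_to_uri("my file.xml#sec 1"))   # my%20file.xml#sec%201
print(hrri_to_uri("caf\u00e9.xml"))       # caf%C3%A9.xml
```

Because '%' is never escaped, running the function twice gives the same result as running it once.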

We discussed the options with our team contact who discussed
it with W3T, and we agreed that a short RFC was the best approach.

> - The choice of name, which is highly suggestive instead of descriptive,
>   inappropriate on several accounts (for the largest part of URIs/IRIs,
>   HRRIs are only marginally more readable, if at all, and the overall
>   syntax still poses a lot of problems for average human users (http://...)).

We had a hard time coming up with a name ourselves, and
we'd consider another name if we can find one more generally 
acceptable.  We do think that allowing spaces (as is the case 
with HRRIs) does improve readability a bit, but we'd be happy 
with any name that works.  We had called these XML Resource 
Identifiers earlier, but (1) the XRI acronym is already taken 
and (2) these have meaning and usefulness outside of XML.

If anyone has suggestions, we're interested in considering them.

> - The overall description. I note e.g. the following:
>   "However, it is often inconvenient for authors to encode these characters."
>   How often? Unless somebody is authoring a lot of XPointers by hand,
>   this can't happen that often (maybe with the exception of the space,
>   but then you discourage that (correctly!) yourself).
>   I suggest rewording "often" to "occasionally". There are similar
>   examples elsewhere.

As you say, the space character is the most common.  We would be 
happy to tone down or remove this sentence; our motivation for 
defining HRRIs is not that they are a good thing but that they 
already exist in multiple standards.

> - The classification as a BCP. Procedurally, it's unclear to me why the
>   IETF would classify a protocol element spec as a BCP when the related
>   ones (URI, IRI) are standards track. Content-wise, it's unclear why
>   the IETF would call something a BEST current practice if in earlier
>   discussion, they have clearly preferred to disallow or marginalize
>   this practice (and that was only for spaces and such, not for controls).

I think this may be a "typo".  I believe we intended
this to become an RFC.

Actually, we don't care what it becomes as long as it is
referenceable from the various W3C XML-related specs.

> - The security section now mentions the issues with control characters.
>   This should definitely be a bit more specific, and should contain
>   explicit recommendations. I'd write that receivers may want to
>   filter out such characters, or URIs with such characters, and
>   therefore including them in the first place is discouraged.

Most of us in the XML Core WG don't feel as strongly as you
appear to that we need to go on at great length about security
issues, but we are happy to expand this section along the lines
you suggest.

> - You have some advice against using raw spaces ("Also, authors of HRRIs
>   are advised to percent encode space characters themselves, rather than
>   rely on the processor to do so, because spaces are often used to
>   separate HRRIs in a sequence"), but not against others, where similar
>   arguments apply:
>   - tabs and CR/LF are removed/merged/converted to spaces in attribute values
>     (merging also occurs for spaces)
>   - <> are often used to delimit URIs/IRIs
>   - arbitrary controls may trigger some security filter
>   - private use characters are not interoperable
>   - non-characters (the list above) are discouraged in XML itself
>   (not sure this list is complete, but I guess it's getting close)

These are all good points.  We will expand the document
along the lines you suggest.
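One way to picture the expanded advice is as authoring-time checks. A minimal Python sketch, in which hrri_warnings, is_noncharacter and the reason strings are hypothetical and simply paraphrase Martin's list; whether a tool should warn, percent-encode, or reject is not decided here.

```python
import unicodedata

def is_noncharacter(cp: int) -> bool:
    """U+FDD0..U+FDEF plus the last two code points of every plane."""
    return 0xFDD0 <= cp <= 0xFDEF or (cp & 0xFFFE) == 0xFFFE

def hrri_warnings(hrri: str) -> list[tuple[int, str]]:
    """Return (index, reason) pairs for characters an author may want to percent-encode."""
    found = []
    for i, ch in enumerate(hrri):
        if ch == " ":
            reason = "space: often used to separate HRRIs in a sequence"
        elif ch in "\t\r\n":
            reason = "tab/CR/LF: normalized to spaces in attribute values"
        elif ch in "<>":
            reason = "angle bracket: often used to delimit URIs/IRIs"
        elif unicodedata.category(ch) == "Cc":
            reason = "control: may trigger a security filter"
        elif unicodedata.category(ch) == "Co":
            reason = "private use: not interoperable"
        elif is_noncharacter(ord(ch)):
            reason = "noncharacter: discouraged in XML itself"
        else:
            continue
        found.append((i, reason))
    return found

for index, reason in hrri_warnings("a b\t<c>"):
    print(index, reason)
```

Tab, CR and LF are tested before the general control check so they get the more specific reason.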

> - The last paragraph of Section 3 is somewhat problematic. In general,
>   it's okay, but the second half of the last sentence
>   ("nor the process of passing a Human Readable Resource Identifier to a
>    process or software component responsible for dereferencing it SHOULD
>    trigger percent encoding") may suggest that resolution interfaces come
>    with three different entry points. I think it would be better to have
>    this work done by the XML side when resolving something.

The above quoted phrase is in the XLink 1.1 CR, but we are not
sure at this time exactly why it is in there.

We are discussing this and will try to figure out what to do
about this wording and let you know.

Received on Wednesday, 20 June 2007 20:12:15 UTC
