RE: Fwd: Re: HRRIs, IRIs, etc

From: Richard Tobin <richard@inf.ed.ac.uk>
Date: Wed, 20 Jun 2007 16:06:09 +0100 (BST)
To: <public-xml-core-wg@w3.org>
Message-Id: <20070620150609.3B3E522404E@macpro.inf.ed.ac.uk>

Just a quick reaction to Martin's message before the telcon:

> http://www.w3.org/TR/REC-xml/#charsets allows (although, at least
> in newer versions, discourages):
> [#xFDD0-#xFDDF],
> [#x1FFFE-#x1FFFF], [#x2FFFE-#x2FFFF], [#x3FFFE-#x3FFFF],
> [#x4FFFE-#x4FFFF], [#x5FFFE-#x5FFFF], [#x6FFFE-#x6FFFF],
> [#x7FFFE-#x7FFFF], [#x8FFFE-#x8FFFF], [#x9FFFE-#x9FFFF],
> [#xAFFFE-#xAFFFF], [#xBFFFE-#xBFFFF], [#xCFFFE-#xCFFFF],
> [#xDFFFE-#xDFFFF], [#xEFFFE-#xEFFFF], [#xFFFFE-#xFFFFF],
> [#x10FFFE-#x10FFFF]
> 
> In the IRI spec, these are excluded:
>    ucschar        = %xA0-D7FF / %xF900-FDCF / %xFDF0-FFEF
>                   / %x10000-1FFFD / %x20000-2FFFD / %x30000-3FFFD
>                   / %x40000-4FFFD / %x50000-5FFFD / %x60000-6FFFD
>                   / %x70000-7FFFD / %x80000-8FFFD / %x90000-9FFFD
>                   / %xA0000-AFFFD / %xB0000-BFFFD / %xC0000-CFFFD
>                   / %xD0000-DFFFD / %xE1000-EFFFD
> 
> so you have to add them to your list in section 3.

Ok.
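
As a rough illustration of the point, here is a minimal Python sketch
that checks the quoted XML ranges against the quoted ucschar production.
The names UCSCHAR_RANGES, DISCOURAGED_RANGES, is_ucschar and
discouraged_but_ucschar are made up for this example, and the ranges are
simply transcribed from the two lists above, not taken from any library.

```python
# Ranges transcribed from the ucschar production quoted above.
# Planes 1 through 0xD each contribute %xP0000-PFFFD; plane 0xE starts at E1000.
UCSCHAR_RANGES = (
    [(0xA0, 0xD7FF), (0xF900, 0xFDCF), (0xFDF0, 0xFFEF)]
    + [(p << 16, (p << 16) | 0xFFFD) for p in range(0x1, 0xE)]
    + [(0xE1000, 0xEFFFD)]
)

# Ranges transcribed from the quoted XML list: the block starting at #xFDD0
# plus the last two code points of planes 1 through 0x10.
DISCOURAGED_RANGES = [(0xFDD0, 0xFDDF)] + [
    ((p << 16) | 0xFFFE, (p << 16) | 0xFFFF) for p in range(0x1, 0x11)
]


def is_ucschar(cp: int) -> bool:
    """True if the code point falls in one of the quoted ucschar ranges."""
    return any(lo <= cp <= hi for lo, hi in UCSCHAR_RANGES)


def discouraged_but_ucschar() -> list:
    """Code points from the XML list that ucschar would nevertheless allow."""
    return [
        cp
        for lo, hi in DISCOURAGED_RANGES
        for cp in range(lo, hi + 1)
        if is_ucschar(cp)
    ]
```

Under these assumptions discouraged_but_ucschar() comes back empty, which
is just another way of saying that none of the listed code points are
ucschar and so they would each need to be named in section 3.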

> - The choice of name, which is highly suggestive instead of descriptive,
>   inappropriate on several accounts (for the largest part of URIs/IRIs,
>   HRRIs are only marginally more readable, if at all, and the overall
>   syntax still poses a lot of problems for average human users (http://...)).

I don't have any attachment to the name, but we have failed to come up
with a better one.  It applies most in the case of space characters,
which are often seen in place of %20 in hrefs in web pages.

> - The overall description. I note e.g. the following:
>   "However, it is often inconvenient for authors to encode these characters."
>   How often? Unless somebody is authoring a lot of XPointers by hand,
>   this can't happen that often (maybe with the exception of the space,
>   but then you discourage that (correctly!) yourself).

As you say, the space character is the most common.  I would be happy
to tone down or remove this sentence; our motivation for defining
HRRIs is not that they are a good thing but that they already exist
in multiple standards.

> - The classification as a BCP.

I don't know anything about the pros and cons of this.

> - The security section now mentions the issues with control characters.
>   This should definitely be a bit more specific, and should contain
>   explicit recommendations. I'd write that receivers may want to
>   filter out such characters, or URIs with such characters, and
>   therefore including them in the first place is discouraged.

I'm sure we can expand this.

> - You have some advice against using raw spaces ("Also, authors of HRRIs
>   are advised to percent encode space characters themselves, rather than
>   rely on the processor to do so, because spaces are often used to
>   separate HRRIs in a sequence"), but not against others, where similar
>   arguments apply:
>   - tabs and CR/LF are removed/merged/converted to spaces in attribute values
>     (merging also occurs for spaces)
>   - <> are often used to delimit URIs/IRIs
>   - arbitrary controls may trigger some security filter
>   - private use characters are not interoperable
>   - non-characters (the list above) are discouraged in XML itself
>   (not sure this list is complete, but I guess it's getting close)

All good points I think, especially about tabs etc.
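
To make the tab and space cases concrete, here is a small Python sketch
using the standard library's xml.etree.ElementTree. The helper name
attribute_as_seen_by_application and the file names are invented for the
example, and it assumes the raw value contains no '<', '&' or '"'.

```python
import xml.etree.ElementTree as ET


def attribute_as_seen_by_application(raw_value: str) -> str:
    """Parse a one-element document and return the href value the parser reports.

    Assumes raw_value contains no '<', '&' or '"' (illustration only).
    """
    doc = '<a href="' + raw_value + '"/>'
    return ET.fromstring(doc).get("href")


# A literal tab and newline typed inside the attribute value...
normalized = attribute_as_seen_by_application("my\tfile\nname.xml")

# ...and a whitespace-separated sequence in which one reference
# ("c d.xml") was written with a raw space instead of %20.
refs = "a%20b.xml c d.xml".split()
```

Here normalized is "my file name.xml": the tab and newline have already
become spaces before any percent encoding could see them. Likewise refs
has three entries rather than two, because the raw space in "c d.xml" is
indistinguishable from a separator.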

> - The last paragraph of Section 3 is somewhat problematic. In general,
>   it's okay, but the second half of the last sentence
>   ("nor the process of passing a Human Readable Resource Identifier to a
>    process or software component responsible for dereferencing it SHOULD
>    trigger percent encoding") may suggest that resolution interfaces come
>    with three different entry points. I think it would be better to have
>    this work done by the XML side when resolving something.

I'm not sure about this bit.  We need to make sure it's consistent
with what the existing specs say.  The idea is to save humans from
seeing %xx unnecessarily.

-- Richard
Received on Wednesday, 20 June 2007 15:20:49 GMT