
Re: reviewing draft-weber-iri-guidelines-00

From: Mark Davis ☕ <mark@macchiato.com>
Date: Tue, 5 Jul 2011 15:29:51 -0700
Message-ID: <CAJ2xs_FA8zDH=cMkof+5jg4c+6uP=PtMTpZn2JDSp0itqz=sfA@mail.gmail.com>
To: "Phillips, Addison" <addison@lab126.com>
Cc: Chris Weber <chris@lookout.net>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
A couple of quick comments.

NB: for a cleaner copy:

NFC is useful for more than just matching, but I agree that it shouldn't be
applied; some servers will be looking for non-NFC fields and fail.
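To illustrate the failure mode, here is a minimal Python sketch. It assumes a server that matches on the exact percent-encoded octets it was given; the sample string "cafe" + U+0301 is a made-up value, not something from the draft.

```python
import unicodedata
from urllib.parse import quote

# Hypothetical field value: "e" followed by U+0301 COMBINING ACUTE ACCENT,
# i.e. a string that is deliberately not in NFC.
decomposed = "cafe\u0301"
composed = unicodedata.normalize("NFC", decomposed)

# Both strings render the same, but they percent-encode (as UTF-8) to
# different octet sequences.
raw_encoded = quote(decomposed, safe="")  # 'cafe%CC%81'
nfc_encoded = quote(composed, safe="")    # 'caf%C3%A9'

print(raw_encoded, nfc_encoded, raw_encoded == nfc_encoded)
```

A server that stored the resource under the first form would not find it if a client applied NFC before sending the request.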


   4.  Replace each entity references with its corresponding character.

This can't be done until *after* the fields of an IRI are parsed out.
Example: in a path, you don't want an escaped / or # or ? to be transformed
until after you've parsed out the path.
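A rough Python sketch of the ordering problem, reading "entity reference" as percent-encoding for the sake of the example. The IRI below is hypothetical, and urlsplit/unquote from the standard library stand in for whatever parser an implementation actually uses.

```python
from urllib.parse import urlsplit, unquote

# Hypothetical IRI with an escaped "/" and "?" in the path and an
# escaped "#" in the query.
iri = "http://example.com/a%2Fb/c%3Fd?x=1%23y#frag"

# Wrong order: unescape first, then parse. The escaped delimiters turn
# into real ones and the component boundaries move.
early = urlsplit(unquote(iri))

# Safer order: parse first, then unescape each component on its own.
parts = urlsplit(iri)
segments = [unquote(s) for s in parts.path.split("/")]
query = unquote(parts.query)

print(early.path, early.query, early.fragment)  # /a/b/c d?x=1 y#frag
print(segments, query, parts.fragment)          # ['', 'a/b', 'c?d'] x=1#y frag
```

In the first case the path gains a segment and loses its "?d", and the query and fragment are both wrong; in the second the two path segments "a/b" and "c?d" survive intact.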

*— Il meglio è l’inimico del bene —*

On Tue, Jul 5, 2011 at 13:33, Phillips, Addison <addison@lab126.com> wrote:

> Hi Chris,
> Thank you for this document. I have a few comments, which follow:
> 1. Section 4, item 1. Unicode whitespace includes additional characters
> other than the ones listed here (or in draft-3987bis). I think the choice of
> characters here is deliberate, but it might be wise to say something about
> it. Perhaps a note that says: "Remove leading and trailing instances of
> ASCII whitespace..." and followed by "Note that other Unicode whitespace and
> control characters are not affected by this rule."
> 2. Section 4, item 2. Replacing blocks of contiguous whitespace with a
> single %20 is imprecise (for the same reason as my first comment).
> Presumably multiple unquoted non-terminal whitespace characters in an IRI
> represent an error of some sort. But would this be a valid IRI:
> "http://example.com?value=%20%20foo%20%20bar"? (I have %20'd multiple
> whitespace items for visibility).
> 3. Section 4, item 3. Why UTF-8? Wouldn't a sequence of Unicode code points
> be preferable at this stage? UTF-8 is only necessary when converting to a
> URI.
> 4. Section 4, item 4. "entity references" -> "entity reference".
> 5. Section 4, item 4. What does "entity reference" mean here? I can't find
> it as a formally defined term in any of the IRI documents. I know what it
> means in e.g. an HTML context. Should I assume that it means "local transfer
> encodings", such as HTML entities in an HTML document? Or should I assume it
> means IRI's own percent-encoding?
> 6. Note that not every entity reference (assuming for a moment that we mean
> percent-encoding) can be so replaced. Perhaps: "Replace each entity that
> references a Unicode character with its corresponding character. Any
> remaining entities encode octets."
> 7. Section 4, item 5. Is NFC desirable here? Do we need to consider path
> elements separately? Applying normalization blindly to the entire string
> risks altering information that may be desirable later. For example, it
> prevents including a denormalized query string, which may be generated by a
> user on purpose. The use of Unicode normalization might be better limited
> to:
> - IRI elements, such as authority, that require it inherently (but then we
> don't need to specify it here?)
> - comparison of path elements or IRIs for identity
> There is considerable discussion at W3C right now about Unicode
> Normalization in document formats. My sense is that NFC will *not* be a
> requirement elsewhere in the Web ecosystem. Perhaps requiring it for IRI
> pre-processing is inconsistent? The real question is whether any later
> processing is harmed by not performing the normalization. None of the
> remaining IRI processing steps appear to be affected by applying (or not)
> NFC---in fact I think that denormalized strings should parse in a manner
> identical to normalized ones if possible.
> NFC really only helps with identity/matching processing, as far as I can
> tell. I'm not saying it's not important. Only that it might be wise to limit
> its application.
> Thanks,
> Addison
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N WG)
> Internationalization is not a feature.
> It is an architecture.
> > -----Original Message-----
> > From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On Behalf Of Chris Weber
> > Sent: Tuesday, July 05, 2011 12:37 PM
> > Subject: reviewing draft-weber-iri-guidelines-00
> >
> > Hello all, I put out an early draft as an effort to address some of the
> > topics mentioned in my message from
> > <http://lists.w3.org/Archives/Public/public-iri/2011May/0036.html>.
> >
> > The draft is available at
> > <http://datatracker.ietf.org/doc/draft-weber-iri-guidelines/>
> >
> > It's missing a lot and any feedback would be welcome.
> >
> > Best regards,
> > Chris
Received on Tuesday, 5 July 2011 22:30:29 UTC
