- From: Mark Davis ☕ <mark@macchiato.com>
- Date: Tue, 5 Jul 2011 15:29:51 -0700
- To: "Phillips, Addison" <addison@lab126.com>
- Cc: Chris Weber <chris@lookout.net>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
- Message-ID: <CAJ2xs_FA8zDH=cMkof+5jg4c+6uP=PtMTpZn2JDSp0itqz=sfA@mail.gmail.com>
A couple of quick comments.

NB: for a cleaner copy: http://tools.ietf.org/html/draft-weber-iri-guidelines

NFC is useful for more than just matching, but I agree that it shouldn't be
applied; some servers will be looking for non-NFC fields and fail.

Also:

4. Replace each entity references with its corresponding character.

This can't be done until *after* the fields of an IRI are parsed out.
Example: in a path, you don't want an escaped / or # or ? to be transformed
until after you've parsed out the path.

Mark

*— Il meglio è l’inimico del bene —*

On Tue, Jul 5, 2011 at 13:33, Phillips, Addison <addison@lab126.com> wrote:

> Hi Chris,
>
> Thank you for this document. I have a few comments, which follow:
>
> 1. Section 4, item 1. Unicode whitespace includes additional characters
> other than the ones listed here (or in draft-3987bis). I think the choice of
> characters here is deliberate, but it might be wise to say something about
> it. Perhaps a note that says: "Remove leading and trailing instances of
> ASCII whitespace..." and followed by "Note that other Unicode whitespace and
> control characters are not affected by this rule."
>
> 2. Section 4, item 2. Replacing blocks of contiguous whitespace with a
> single %20 is imprecise (for the same reason as my first comment).
> Presumably multiple unquoted non-terminal whitespace characters in an IRI
> represent an error of some sort. But would this be a valid IRI:
> "http://example.com?value=%20%20foo%20%20bar"? (I have %20'd multiple
> whitespace items for visibility).
>
> 3. Section 4, item 3. Why UTF-8? Wouldn't a sequence of Unicode code points
> be preferable at this stage? UTF-8 is only necessary when converting to a
> URI.
>
> 4. Section 4, item 4. "entity references" -> "entity reference".
>
> 5. Section 4, item 4. What does "entity reference" mean here? I can't find
> it as a formally defined term in any of the IRI documents. I know what it
> means in e.g. an HTML context.
> Should I assume that it means "local transfer
> encodings", such as HTML entities in an HTML document? Or should I assume it
> means IRI's own percent-encoding?
>
> 6. Note that not every entity reference (assuming for a moment that we mean
> percent-encoding) can be so replaced? Perhaps: "Replace each entity that
> references a Unicode character with its corresponding character. Any
> remaining entities encode octets."
>
> 7. Section 4, item 5. Is NFC desirable here? Do we need to consider path
> elements separately? Applying normalization blindly to the entire string
> risks altering information that may be desirable later. For example, it
> prevents including a denormalized query string, which may be generated by a
> user on purpose. The use of Unicode normalization might be better limited
> to:
>
> - IRI elements, such as authority, that require it inherently (but then we
>   don't need to specify it here?)
> - comparison of path elements or IRIs for identity
>
> There is considerable discussion at W3C right now about Unicode
> Normalization in document formats. My sense is that NFC will *not* be a
> requirement elsewhere in the Web ecosystem. Perhaps requiring it for IRI
> pre-processing is inconsistent? The real question is whether any later
> processing is harmed by not performing the normalization. None of the
> remaining IRI processing steps appear to be affected by applying (or not)
> NFC---in fact I think that denormalized strings should parse in a manner
> identical to normalized ones if possible.
>
> NFC really only helps with identity/matching processing, as far as I can
> tell. I'm not saying it's not important. Only that it might be wise to limit
> its application.
>
> Thanks,
>
> Addison
>
> Addison Phillips
> Globalization Architect (Lab126)
> Chair (W3C I18N WG)
>
> Internationalization is not a feature.
> It is an architecture.
>
> > -----Original Message-----
> > From: public-iri-request@w3.org [mailto:public-iri-request@w3.org] On Behalf Of Chris Weber
> > Sent: Tuesday, July 05, 2011 12:37 PM
> > To: PUBLIC-IRI@W3.ORG
> > Subject: reviewing draft-weber-iri-guidelines-00
> >
> > Hello all, I put out an early draft as an effort to address some of the topics
> > mentioned in my message from
> > <http://lists.w3.org/Archives/Public/public-iri/2011May/0036.html>.
> >
> > The draft is available at
> > <http://datatracker.ietf.org/doc/draft-weber-iri-guidelines/>
> >
> > It's missing a lot and any feedback would be welcome.
> >
> > Best regards,
> > Chris
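
The two points at the top of this message (unescape only after the fields are parsed out, and NFC changing what a server receives) can be illustrated with a minimal Python sketch. It is not taken from the draft. The example IRI and the function names are hypothetical, and Python's urllib.parse is used as a stand-in for whatever IRI parser an implementation has; it is only assumed to behave like a generic URI splitter.

```python
import unicodedata
from urllib.parse import urlsplit, unquote

# Hypothetical IRI: the first path segment is the literal text "a/b?c",
# with its "/" and "?" percent-escaped so they are not delimiters.
IRI = "http://example.com/a%2Fb%3Fc/d?q=1"

def decode_then_parse(iri):
    """Problematic order: unescape the whole string, then split into fields."""
    parts = urlsplit(unquote(iri))
    return parts.path, parts.query

def parse_then_decode(iri):
    """Order argued for above: split into fields first, then unescape each one."""
    parts = urlsplit(iri)
    segments = [unquote(segment) for segment in parts.path.split("/")]
    return segments, unquote(parts.query)

def nfc_changes(text):
    """True if applying NFC would alter the string that gets sent on."""
    return unicodedata.normalize("NFC", text) != text

if __name__ == "__main__":
    # The escaped "?" becomes a real delimiter and the path is cut short.
    print(decode_then_parse(IRI))
    # The escaped characters stay inside their segment.
    print(parse_then_decode(IRI))
    # "e" followed by U+0301 is not NFC, so normalizing would rewrite it.
    print(nfc_changes("cafe\u0301"))
```

With the whole string unescaped first, the escaped "?" starts the query early and the original path can no longer be recovered. Splitting first keeps "a/b?c" as a single segment. The last call shows why blanket NFC can break a server that expects the denormalized form.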
Received on Tuesday, 5 July 2011 22:30:29 UTC