Comments on draft-iab-identifier-comparison-07 from Bjoern Hoehrmann on 2013-02-19 (www-archive@w3.org from February 2013)

From: Bjoern Hoehrmann <derhoermi@gmx.net>
Date: Tue, 19 Feb 2013 20:15:38 +0100
To: iab@iab.org
Message-ID: <i1i7i81v94ais33g0lehf91e6g5l9rofsi@hive.bjoern.hoehrmann.de>

Hi,

  Re <http://tools.ietf.org/html/draft-iab-identifier-comparison-07>, in
section 3.3 there is

   Also, when a URI is embedded in plain text (e.g., an email message),
   there is an additional concern because there is no termination
   criterion for a URI.  For example, consider
   http://unicode.org/cldr/utility/list-unicodeset.jsp?a=a&amp;g=gc.
   Some applications that detect URIs will stop before the first '.' in
   the path, while others go to last '.', and yet others may stop at the
   ';'.  As another point of comparison, Section 2.37 of [EE] (a
   standard for history citations) specifies the use of a space after a
   URI and before the punctuation.

It's unclear to me whether the `&amp;` in there is intentional or an
encoding error. If it is intentional, that should be made very explicit.
I also find the claim a bit dubious: STD 66 quite clearly recommends
using <> around URIs, and you could use white space as well. More
generally, this seems a bit far-fetched as an issue in "comparison"; it
is really about applying heuristics to extract data from ambiguous text.
Perhaps the document can do without this paragraph.
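
To illustrate the point about delimiters, here is a minimal Python
sketch of extracting URIs that are wrapped in <>, along the lines of
RFC 3986 Appendix C. The helper name and the regular expression are my
own assumptions for illustration, not something the draft or STD 66
specifies verbatim.

```python
import re

# Hypothetical pattern: "<", an optional "URL:" prefix, then something
# that starts with a scheme and a colon, up to the closing ">".
_DELIMITED = re.compile(r"<(?:URL:)?([A-Za-z][A-Za-z0-9+.-]*:[^<>]*)>")


def extract_delimited_uris(text):
    """Return the URIs found between angle brackets in text."""
    # Whitespace inside the delimiters (e.g. from line wrapping) is dropped.
    return [re.sub(r"\s+", "", match) for match in _DELIMITED.findall(text)]


sample = "See <http://unicode.org/cldr/utility/list-unicodeset.jsp?a=a&g=gc>."
print(extract_delimited_uris(sample))
```

With the delimiters in place, the trailing full stop of the sentence is
not part of the match, so no guessing about the end of the URI is needed.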

Section 3.1 on hostnames seems to be missing the issue of "example.com"
versus "example.com." with a trailing full stop; it might be useful to
mention it there.
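
As a rough sketch of why this matters for comparison: a plain string
comparison treats the two forms as different, while a comparison that
first strips one trailing dot does not. The function below is a
hypothetical helper, and whether an application should treat the two
forms as equivalent is a policy decision this sketch simply assumes.

```python
def normalize_hostname(host):
    """Lowercase a hostname and drop a single trailing full stop.

    Assumption for illustration: the application wants "example.com"
    and "example.com." to compare equal.
    """
    host = host.lower()
    return host[:-1] if host.endswith(".") else host


print("example.com" == "example.com.")
print(normalize_hostname("example.com") == normalize_hostname("example.com."))
```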

In section 3.3.2.3.,

   [RFC3986] defines the userinfo production that allows arbitrary data
   about the user of the URI to be placed before '@' signs in URIs.  For
   example: "http://alice:bob:chuck@example.com/bar" has the value
   "alice:bob:chuck" as its userinfo. [...]

This is somewhat misleading as it fails to mention that while the
generic syntax allows this, individual schemes like the HTTP scheme, as
currently defined in RFC 2616, do not allow this. It might be better to
pick a scheme that actually allows this form.
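
For what it is worth, a generic parser will happily accept the example
regardless of scheme, which is part of why the text can mislead. A small
Python sketch using the standard urllib.parse module follows; the
raw_userinfo helper is my own, hypothetical name.

```python
from urllib.parse import urlsplit


def raw_userinfo(uri):
    """Return everything before the last '@' in the authority, or None."""
    netloc = urlsplit(uri).netloc
    userinfo, at_sign, _host = netloc.rpartition("@")
    return userinfo if at_sign else None


uri = "http://alice:bob:chuck@example.com/bar"
print(raw_userinfo(uri))
print(urlsplit(uri).hostname)
```

Note that urlsplit applies only the generic syntax here; it does not
check what the individual scheme permits.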

Section 3.3.3,

   [RFC3986] supports the use of path segment values such as "./" or
   "../" for relative URIs.  Strictly speaking, including such path
   segment values in a fully qualified URI is syntactically illegal but
   [RFC3986] section 4.1 nevertheless defines an algorithm to remove
   them.

This should include a reference to STD 66 indicating where it defines
them as illegal (I could not find that myself, so the text might be
mistaken).
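
For reference, the removal algorithm I am aware of in STD 66 is
remove_dot_segments in RFC 3986 section 5.2.4. Below is a Python sketch
of it as I read that section; the code is my own transcription and not
taken from the draft.

```python
def remove_dot_segments(path):
    """Remove "." and ".." segments from a path, after RFC 3986 5.2.4."""
    output = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]          # "/./" becomes "/"
        elif path == "/.":
            path = "/"
        elif path.startswith("/../"):
            path = path[3:]          # "/../" becomes "/"
            if output:
                output.pop()         # drop the previous segment
        elif path == "/..":
            path = "/"
            if output:
                output.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any)
            # from the input to the output.
            start = 1 if path.startswith("/") else 0
            cut = path.find("/", start)
            if cut == -1:
                output.append(path)
                path = ""
            else:
                output.append(path[:cut])
                path = path[cut:]
    return "".join(output)


print(remove_dot_segments("/a/b/c/./../../g"))    # /a/g
print(remove_dot_segments("mid/content=5/../6"))  # mid/6
```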

The reference [TR36] should link to http://www.unicode.org/reports/tr36/
or some other suitable address (currently it does not link to anything).

regards,
-- 
Björn Höhrmann · mailto:bjoern@hoehrmann.de · http://bjoern.hoehrmann.de
Am Badedeich 7 · Telefon: +49(0)160/4415681 · http://www.bjoernsworld.de
25899 Dagebüll · PGP Pub. KeyID: 0xA4357E78 · http://www.websitedev.de/

Received on Tuesday, 19 February 2013 19:16:08 UTC