- From: Mykyta Yevstifeyev <evnikita2@gmail.com>
- Date: Wed, 06 Jul 2011 12:58:06 +0300
- To: public-iri@w3.org, chris@lookout.net
05.07.2011 22:37, Chris Weber wrote:
> Hello all, I put out an early draft as an effort to address some of
> the topics mentioned in my message from
> <http://lists.w3.org/Archives/Public/public-iri/2011May/0036.html>.
>
> The draft is available at
> <http://datatracker.ietf.org/doc/draft-weber-iri-guidelines/>
Chris,
Section 4. I support the remark from Addison regarding use of UTF-8. I
suppose SHOULD should be used with UTF-8 (sorry for the pun), but MAY
for UTF-16/other Unicode encodings, if needed. I also agree with the
comment regarding bullet 4, about "entity reference"s and replacing them
with characters while pre-processing, as identified by Addison and Mark.
Section 5.2.1 says:
> Split the userinfo on the first occurrence of ":" U+003A if it
> exists. The part before the ":" is the username and the part after
> is the password. The ":" being split on is not included in either
> result.
>
> <etc.>
However, as per RFC 3986, it isn't valid to identify a "user:pass"
part in URIs (and IRIs) of every scheme, nor a lone user name when, as
you assume, no ":" is present. This is only possible when handling an
IRI in a scheme-specific manner, e.g. for 'ftp' URIs/IRIs. Some schemes
may also define an authentication information part, such as 'pop'
URIs/IRIs (RFC 2384), which would be treated as part of the username
under your algorithm.
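To illustrate the point, here is a minimal sketch of the split from
Section 5.2.1 made conditional on the scheme. The function name
split_userinfo and the USER_PASS_SCHEMES set are hypothetical and only
for illustration; which schemes really give userinfo a "user:password"
meaning has to come from the scheme specifications, not from this list.

```python
# Hypothetical allow-list: schemes assumed, for this sketch only, to give
# userinfo a "user:password" structure. Not taken from any RFC.
USER_PASS_SCHEMES = {"ftp"}


def split_userinfo(scheme, userinfo):
    """Return (username, password), or None when the scheme is not known
    to define that structure and userinfo must stay opaque."""
    if scheme.lower() not in USER_PASS_SCHEMES:
        return None
    # partition() splits on the first ":" only; the ":" itself is dropped.
    username, sep, password = userinfo.partition(":")
    return username, (password if sep else None)


print(split_userinfo("ftp", "alice:s3:cret"))
print(split_userinfo("ftp", "alice"))
print(split_userinfo("pop", "user;AUTH=*"))
```

With this shape, the 'pop' userinfo is left untouched instead of
having its ";AUTH=" part swallowed into a username.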
The draft uses "relative reference" a number of times, yet it says
nothing about processing relative IRIs. If the rules of RFC 3986 are
used, this should be mentioned.
Section 6.4 is going to specify the scheme-specific processing of 'file'
URIs, which are not properly specified. I recall some discussions on
file URIs at the end of 2010 on URI@w3c.org, which are the most recent.
There had been a number of such discussions before. A number of
complexities were identified, which make specifying the scheme almost
impossible. Considering this, I recommend skipping this section.
Section 6.5 is almost in the same situation. I'm currently working on
'ftp' URI scheme specification
(https://datatracker.ietf.org/doc/draft-yevstifeyev-ftp-uri-scheme/).
So there will probably be a need to align these two drafts; however,
the ftp URI draft currently has no provisions regarding ftp IRIs or the
allowance of UCS chars in ftp *RIs. This may be addressed in further
versions of the draft; but until the ftp URI scheme is properly
specified by an RFC, I don't see sense in making up scheme-specific
IRI parsing for such *RIs.
From Section 7.1: I find the following sequence confusing: (1)
percent-encode -> (2) UTF-8 encode -> (3) percent-encode {for the 2nd
time!}. I suppose everything percent-encoded is already allowed in
URIs, so the 1st "percent-encode" should be skipped, giving the
following sequence: (1) UTF-8 encode -> (2) percent-encode, within each
URI part, the chars which are not allowed in that part.
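The two-step sequence I mean can be sketched roughly as follows. This
is only an illustration under my own assumptions: the name
percent_encode_part and the PATH_ALLOWED set are made up here, the set
approximates what RFC 3986 permits in a path (unreserved, sub-delims,
":", "@", "/"), and "%" is passed through on the assumption that any
"%" in the input already starts a valid escape.

```python
import string

# Assumption: characters permitted unescaped in a path, per RFC 3986
# (unreserved + sub-delims + ":" + "@" + "/"). "%" is included so that
# existing escapes are not encoded a second time; a stray "%" would
# slip through unchanged, which a real implementation must handle.
UNRESERVED = string.ascii_letters + string.digits + "-._~"
PATH_ALLOWED = set(UNRESERVED + "!$&'()*+,;=" + ":@/" + "%")


def percent_encode_part(text, allowed):
    """(1) UTF-8 encode, then (2) percent-encode each byte that is not
    an allowed ASCII character for this URI part."""
    out = []
    for byte in text.encode("utf-8"):
        ch = chr(byte)
        if byte < 0x80 and ch in allowed:
            out.append(ch)
        else:
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(percent_encode_part("/r\u00e9sum\u00e9%20a b", PATH_ALLOWED))
```

Note there is no initial percent-encoding pass: the existing "%20" is
kept as it is, and only the non-ASCII characters and the space are
escaped.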
Several minor/editorial/non-substantial comments. (1) All ABNF
productions should be enclosed in "<" and ">", as recommended by RFC
5234. (2) Should your draft update RFC 3987 (or RFC 3987bis)? (3)
References to the IDNA and punycode specifications are missing in
Section 5.2.2. (4) I suppose RFC 3987bis should be a normative reference
in the draft. (5a) There is no explanation of the U+HHHH notation used
in your document. Even though the reader is presumably familiar with
it, clarifying it won't hurt. (5b) Moreover, RFC 5137 (BCP 137)
officially recommends using \u'HHHH'. (6) The references are not in the
common format (even though we may leave this issue to the RFC Editor).
I hope my comments were useful.
Thanks,
Mykyta Yevstifeyev
>
> It's missing a lot and any feedback would be welcome.
>
> Best regards,
> Chris
>
>
Received on Wednesday, 6 July 2011 09:58:04 UTC