Re: reviewing draft-weber-iri-guidelines-00

05.07.2011 22:37, Chris Weber wrote:
> Hello all, I put out an early draft as an effort to address some of 
> the topics mentioned in my message from 
> <http://lists.w3.org/Archives/Public/public-iri/2011May/0036.html>.
>
> The draft is available at
> <http://datatracker.ietf.org/doc/draft-weber-iri-guidelines/>
Chris,

Section 4.  I support Addison's remark regarding the use of UTF-8.  I
suppose SHOULD should be used for UTF-8 (sorry for the pun), but MAY for
UTF-16 and other Unicode encodings, if needed.  I also agree with the
comment on bullet 4, raised by Addison and Mark, about "entity
references" and replacing them with characters during pre-processing.
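As an illustration of bullet 4, assuming the IRI was taken from an HTML
attribute value, Python's standard html.unescape() performs exactly this
replacement of character and entity references.  The example IRI is
hypothetical; UTF-8 conversion and percent-encoding would only happen
after this step.

```python
import html

# An IRI as it might appear in an HTML href attribute (hypothetical example).
raw = "http://example.com/r&eacute;sum&#233;?a=1&amp;b=2"

# Pre-processing: replace entity and character references with the
# characters they stand for, before any IRI parsing or IRI-to-URI mapping.
iri = html.unescape(raw)
print(iri)  # -> http://example.com/résumé?a=1&b=2
```

Note that html.unescape() follows the HTML5 rules for text content, which
also decode some legacy entity names lacking a trailing semicolon (e.g.
"&copy"), so a real pre-processor would have to follow the rules of the
host format more closely.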

Section 5.2.1 says:

>     Split the userinfo on the first occurrence of ":" U+003A if it
>     exists.  The part before the ":" is the username and the part after
>     is the password.  The ":" being split on is not included in either
>     result.
>
>     <etc.>
However, per RFC 3986, the userinfo cannot be split into a "user:pass"
pair for every scheme, nor can the whole of it be taken as the user
name when, as you assume, no ":" is present.  That is only possible when
an IRI is handled in a scheme-specific manner, e.g. 'ftp' URIs/IRIs.
Some schemes also define their own authentication information within the
userinfo, such as 'pop' URIs/IRIs (RFC 2384), and your algorithm would
treat that information as part of the user name.
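To illustrate, here is a minimal Python sketch of the splitting rule
quoted above, applied to an 'ftp' IRI and to a 'pop' URL in the style of
RFC 2384.  The helper names are hypothetical, and the authority
extraction is deliberately naive (it is not a full RFC 3986 parser).

```python
from typing import Optional, Tuple


def split_userinfo(userinfo: str) -> Tuple[str, Optional[str]]:
    # The rule quoted from Section 5.2.1: split on the first ":" only;
    # the ":" itself belongs to neither part.
    user, sep, password = userinfo.partition(":")
    return user, (password if sep else None)


def userinfo_of(iri: str) -> Optional[str]:
    # Naive extraction for these examples only: take the authority between
    # "//" and the next "/", "?" or "#", then everything before the last "@".
    rest = iri.split("//", 1)[1]
    for delim in "/?#":
        rest = rest.split(delim, 1)[0]
    userinfo, sep, _host = rest.rpartition("@")
    return userinfo if sep else None


# 'ftp': the user:password reading happens to match the scheme.
print(split_userinfo(userinfo_of("ftp://anonymous:guest@ftp.example.com/pub")))
# -> ('anonymous', 'guest')

# 'pop': ";AUTH=+APOP" is an authentication parameter, yet the rule
# folds it into the user name.
print(split_userinfo(userinfo_of("pop://rg;AUTH=+APOP@mail.eudora.com:8110")))
# -> ('rg;AUTH=+APOP', None)
```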

The draft uses the term "relative reference" in a number of places, yet
says nothing about how relative IRIs are processed.  If the rules of
RFC 3986 (Section 5, "Reference Resolution") are used, this should be
stated.
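For reference, here is a compact Python sketch of the RFC 3986
Section 5.2 algorithm: the Appendix B parsing regex, the strict
transformation of 5.2.2, merge per 5.2.3 and remove_dot_segments per
5.2.4.  It is my own illustration, not text from either document; it
works on characters, so an IRI reference passes through without any
percent-encoding, which is my assumption about what the draft would want.

```python
import re
from typing import Optional

# RFC 3986, Appendix B: split a reference into its five components.
_REF = re.compile(r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?")


def _split(ref: str):
    m = _REF.match(ref)
    # scheme, authority, path, query, fragment (None when undefined)
    return m.group(2), m.group(4), m.group(5), m.group(7), m.group(9)


def remove_dot_segments(path: str) -> str:
    # RFC 3986, Section 5.2.4.
    out = []
    while path:
        if path.startswith("../"):
            path = path[3:]
        elif path.startswith("./"):
            path = path[2:]
        elif path.startswith("/./"):
            path = path[2:]
        elif path == "/.":
            path = "/"
        elif path.startswith("/../") or path == "/..":
            path = "/" + path[4:]
            if out:
                out.pop()
        elif path in (".", ".."):
            path = ""
        else:
            # Move the first segment (with its leading "/", if any) to out.
            end = path.find("/", 1 if path.startswith("/") else 0)
            end = len(path) if end == -1 else end
            out.append(path[:end])
            path = path[end:]
    return "".join(out)


def _merge(base_auth: Optional[str], base_path: str, ref_path: str) -> str:
    # RFC 3986, Section 5.2.3.
    if base_auth is not None and base_path == "":
        return "/" + ref_path
    return base_path[: base_path.rfind("/") + 1] + ref_path


def resolve(base: str, ref: str) -> str:
    # RFC 3986, Section 5.2.2 (strict parser), recomposed per Section 5.3.
    b_scheme, b_auth, b_path, b_query, _ = _split(base)
    scheme, auth, path, query, frag = _split(ref)
    if scheme is not None:
        path = remove_dot_segments(path)
    elif auth is not None:
        path = remove_dot_segments(path)
        scheme = b_scheme
    else:
        if path == "":
            path = b_path
            if query is None:
                query = b_query
        elif path.startswith("/"):
            path = remove_dot_segments(path)
        else:
            path = remove_dot_segments(_merge(b_auth, b_path, path))
        scheme, auth = b_scheme, b_auth
    result = (("" if scheme is None else scheme + ":")
              + ("" if auth is None else "//" + auth) + path)
    if query is not None:
        result += "?" + query
    if frag is not None:
        result += "#" + frag
    return result


base = "http://a/b/c/d;p?q"  # base URI used in RFC 3986, Section 5.4
for ref in ["g", "../g", "?y", "#s", "../../../g", "//g"]:
    print(ref, "->", resolve(base, ref))
```

The outputs can be checked against the "normal" and "abnormal" examples
listed in RFC 3986, Section 5.4.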

Section 6.4 is intended to specify scheme-specific processing of 'file'
URIs, but the 'file' scheme itself is not properly specified.  I recall
discussions of file URIs on URI@w3c.org at the end of 2010, which are the
most recent; there had been a number of such discussions before.  They
identified a number of complexities that make specifying the scheme
almost impossible.  Given this, I recommend dropping this section.

Section 6.5 is in almost the same situation.  I am currently working on
an 'ftp' URI scheme specification
(https://datatracker.ietf.org/doc/draft-yevstifeyev-ftp-uri-scheme/).
So the two drafts will probably need to be aligned; however, the ftp URI
draft currently has no provisions regarding ftp IRIs, i.e. the allowance
of UCS characters in ftp *RIs.  This may be addressed in future versions
of that draft, but until the ftp URI scheme is properly specified in an
RFC, I see no sense in defining scheme-specific IRI parsing for such
*RIs.

In Section 7.1, I find the following sequence confusing: (1)
percent-encode -> (2) UTF-8 encode -> (3) percent-encode (for the second
time!).  Since percent-encoded octets are already allowed in URIs, the
first "percent-encode" step should be skipped, giving the sequence:
(1) UTF-8 encode -> (2) percent-encode the octets that are not allowed
in the particular URI component.
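A minimal Python sketch of that two-step sequence follows.  The function
name and the per-component character sets are my own illustration, taken
from the RFC 3986 ABNF; keeping "%" unchanged assumes that every "%" in
the input already starts a valid pct-encoded triplet.

```python
import string

UNRESERVED = set(string.ascii_letters + string.digits + "-._~")
SUB_DELIMS = set("!$&'()*+,;=")
PCHAR = UNRESERVED | SUB_DELIMS | {":", "@"}

# ASCII characters that may appear unencoded in each component,
# following the RFC 3986 ABNF for <path>, <query> and <fragment>.
ALLOWED = {
    "path": PCHAR | {"/"},
    "query": PCHAR | {"/", "?"},
    "fragment": PCHAR | {"/", "?"},
}


def to_uri_component(text: str, component: str) -> str:
    # "%" is kept as-is: existing pct-encoded triplets are already valid URI text.
    allowed = ALLOWED[component] | {"%"}
    out = []
    for byte in text.encode("utf-8"):      # step 1: UTF-8 encode
        if byte < 0x80 and chr(byte) in allowed:
            out.append(chr(byte))
        else:                              # step 2: percent-encode the rest
            out.append("%{:02X}".format(byte))
    return "".join(out)


print(to_uri_component("/résumé", "path"))   # -> /r%C3%A9sum%C3%A9
print(to_uri_component("q=日本", "query"))    # -> q=%E6%97%A5%E6%9C%AC
```

Because nothing is percent-encoded before the UTF-8 step, an input that
already contains "%41" comes out unchanged rather than as "%2541".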

Several minor, editorial comments:

(1) ABNF rule names referenced in prose should be enclosed in "<" and
    ">", as RFC 5234 suggests.
(2) Should your draft update RFC 3987 (or RFC 3987bis)?
(3) References to the IDNA and Punycode specifications are missing in
    Section 5.2.2.
(4) I suppose RFC 3987bis should be a normative reference in the draft.
(5a) The U+HHHH notation used in your document is not explained.  Even
    if the reader can be assumed to be familiar with it, a clarification
    would not hurt.
(5b) Moreover, RFC 5137 (BCP 137) officially recommends using \u'HHHH'.
(6) The references are not in the common format (although we may leave
    this issue to the RFC Editor).
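On item (3): assuming Section 5.2.2 converts a non-ASCII host name to
its ASCII form, the draft relies on both IDNA and Punycode (RFC 3492).
Python's built-in codecs show the two layers; note that the "idna" codec
implements IDNA2003 (RFC 3490), not IDNA2008, so this only illustrates
which specifications are involved.

```python
# Punycode alone (RFC 3492) encodes one label's code points into ASCII.
print("bücher".encode("punycode"))        # -> b'bcher-kva'

# IDNA ToASCII (IDNA2003 via Python's codec) applies nameprep,
# Punycode and the "xn--" prefix, label by label.
print("bücher.example".encode("idna"))    # -> b'xn--bcher-kva.example'
```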

I hope my comments were useful.

Thanks,
Mykyta Yevstifeyev
>
> It's missing a lot and any feedback would be welcome.
>
> Best regards,
> Chris
>
>

Received on Wednesday, 6 July 2011 09:58:04 UTC