- From: Mykyta Yevstifeyev <evnikita2@gmail.com>
- Date: Wed, 06 Jul 2011 12:58:06 +0300
- To: public-iri@w3.org, chris@lookout.net
05.07.2011 22:37, Chris Weber wrote:
> Hello all, I put out an early draft as an effort to address some of
> the topics mentioned in my message from
> <http://lists.w3.org/Archives/Public/public-iri/2011May/0036.html>.
>
> The draft is available at
> <http://datatracker.ietf.org/doc/draft-weber-iri-guidelines/>

Chris,

Section 4. I support the remark from Addison regarding the use of UTF-8. I suppose SHOULD should be used with UTF-8 (sorry for the pun), but MAY for UTF-16 and other Unicode encodings, if needed. I also agree with the comment regarding bullet 4, about "entity references" and replacing them with characters during pre-processing, as identified by Addison and Mark.

Section 5.2.1 says:

> Split the userinfo on the first occurrence of ":" U+003A if it
> exists. The part before the ":" is the username and the part after
> is the password. The ":" being split on is not included in either
> result.
>
> <etc.>

However, as per RFC 3986, it isn't valid to identify the "user:pass" part in URIs (and IRIs) for all schemes, nor a bare user name if, as you assume, no ":" is present. This is only possible when handling an IRI in a scheme-specific manner, e.g. for 'ftp' URIs/IRIs. Some schemes may also define an authentication information part, such as 'pop' URIs/IRIs (RFC 2384), which would be assumed to be a username under your algorithm.

There are a number of occurrences of "relative reference", whereas the draft says nothing about processing relative IRIs. If the rules of RFC 3986 are used, this should be mentioned.

Section 6.4 is going to specify the scheme-specific processing of 'file' URIs, which are not properly specified. I recall some discussions on file URIs at the end of 2010 on URI@w3c.org, which are the most recent. There had been a number of such discussions before. A number of complexities were identified, which make specifying the scheme almost impossible. Considering this, I recommend skipping this section.

Section 6.5 is almost in the same situation.
I'm currently working on the 'ftp' URI scheme specification (https://datatracker.ietf.org/doc/draft-yevstifeyev-ftp-uri-scheme/). So there will probably be a need to align these two drafts; however, the ftp URI draft currently has no provisions regarding ftp IRIs, i.e. the allowance of UCS characters in ftp *RIs. This may be addressed in further versions of the draft; but until the ftp URI scheme is properly specified by an RFC, I don't see the sense in making up scheme-specific IRI parsing for such *RIs.

From Section 7.1: I find the following sequence confusing: (1) percent-encode -> (2) UTF-8 encode -> (3) percent-encode {for the 2nd time!}. I suppose everything percent-encoded is already allowed in URIs, so the 1st "percent-encode" should be skipped and the following sequence should be formed: (1) UTF-8 encode -> (2) percent-encode the characters which are not allowed in the particular URI part.

Several minor/editorial/non-substantial comments.

(1) All ABNF productions should be enclosed in "<" and ">", as recommended by RFC 5234.

(2) Should your draft update RFC 3987 (or RFC 3987bis)?

(3) References to the IDNA and punycode specifications are missing in Section 5.2.2.

(4) I suppose RFC 3987bis should be a normative reference in the draft.

(5a) There is no explanation of the U+HHHH notation used in your document. Even if the reader is assumed to be familiar with it, a clarification wouldn't hurt.

(5b) Moreover, RFC 5137 (BCP 137) officially recommends using \u'HHHH'.

(6) The references are not in the common format (even though we may leave this issue to the RFC Editor).

I hope my comments were useful.

Thanks,
Mykyta Yevstifeyev

> It's missing a lot and any feedback would be welcome.
>
> Best regards,
> Chris
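
P.S. The two-step sequence suggested for Section 7.1 above (UTF-8 encode, then percent-encode only what is not allowed in the given part) could be sketched roughly as follows. This is only an illustration under my own assumptions, not text from the draft: the function name iri_component_to_uri and the ALLOWED table are hypothetical, and the allowed sets are a simplified reading of the RFC 3986 path and query rules.

```python
# Hypothetical sketch; names and allowed-character sets are assumptions,
# loosely based on RFC 3986 "pchar" and "query", not taken from the draft.
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
SUB_DELIMS = set("!$&'()*+,;=")
HEX = set("0123456789ABCDEFabcdef")

ALLOWED = {
    "path": UNRESERVED | SUB_DELIMS | set(":@/"),
    "query": UNRESERVED | SUB_DELIMS | set(":@/?"),
}


def iri_component_to_uri(text, component):
    """Map one IRI component to its URI form in a single pass."""
    allowed = ALLOWED[component]
    out = []
    i = 0
    while i < len(text):
        ch = text[i]
        triplet = text[i + 1:i + 3]
        if ch == "%" and len(triplet) == 2 and all(c in HEX for c in triplet):
            # Already percent-encoded: pass through, never encode twice.
            out.append(text[i:i + 3])
            i += 3
            continue
        if ch in allowed:
            out.append(ch)
        else:
            # Step 1: UTF-8 encode; step 2: percent-encode each octet.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        i += 1
    return "".join(out)


print(iri_component_to_uri("/r\u00e9sum\u00e9", "path"))
```

Note that an existing "%20" is left untouched, so there is no initial percent-encoding step and no second pass over data that is already encoded.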
Received on Wednesday, 6 July 2011 09:58:04 UTC