- From: Marcin Hanclik <Marcin.Hanclik@access-company.com>
- Date: Mon, 27 Jul 2009 18:01:01 +0200
- To: "Martin J. Dürst" <duerst@it.aoyama.ac.jp>, Larry Masinter <masinter@adobe.com>
- CC: "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>
Hi Martin,

Thanks for all the information. I have one more comment (a) and one question (b):

a) Terminology
b) IRI/URI classification

a) I have reviewed the IRI/URI/URL RFC chain, searching for an explanation for my confusion around scheme and scheme name. I came to the conclusion that:

1. URI/IRI scheme = URI/IRI syntax
2. the grammar in the URI/IRI specifications could be updated.

It seems it started in RFC1738:

   <scheme>:<scheme-specific-part>

where <scheme> actually meant something like <scheme-name>. Similarly, RFC3986 and RFC3987 have the rule:

   scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

Could this be changed to

   scheme-name = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

? Then the rules URI and IRI, respectively, would have to be updated. I am not sure whether this is feasible or how many other specs it could break, but I assume it could help avoid some misreadings of the specs. I know it is a detail, and I can live with it as it is anyway.

b) I now assume that "IRI scheme" exists as a term. I understand that there is a dualism between URIs and IRIs, i.e. one can be expressed in the syntax of the other based on the rules from RFC3987.

Let's imagine that I define some new scheme, but I do not yet know whether it will be primarily a URI scheme or an IRI scheme. Is my understanding correct that if the syntax assumes only US-ASCII characters, then my new definition will be about a URI scheme, and if the syntax includes more characters, e.g. 8-bit UTF-8 characters, or generally operates on Unicode characters and not octets, then it will be about an IRI scheme?

My question is motivated by the fact that each symbol/octet of an RFC3986 URI needs no more than 7 bits, whereas the symbols of RFC3987, when encoded e.g. in UTF-8, would require 8 bits in some octets.

Thanks.

Kind regards,
Marcin
Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452 | Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

-----Original Message-----
From: "Martin J. Dürst" [mailto:duerst@it.aoyama.ac.jp]
Sent: Monday, July 27, 2009 8:41 AM
To: Larry Masinter
Cc: Marcin Hanclik; PUBLIC-IRI@W3.ORG
Subject: Re: [Widget URI] Internationalization, widget IRI?

To be more explicit, there are two things here: URI/IRI scheme names and URI/IRI schemes.

As for scheme names (which Larry discusses below), all scheme names are ASCII only. Over the long years of working on the IRI spec, the idea of allowing non-ASCII scheme names has occasionally come up, but has been readily rejected. Proposals were mostly for scheme name aliases in non-Latin scripts, not for genuinely different, new schemes that would have non-ASCII names. Keeping these non-Latin aliases updated correctly would be mostly a maintenance/update nightmare.

As for schemes, there is inherently no difference between URI schemes (schemes that can be used with URIs) and IRI schemes (schemes that can be used with IRIs). Every URI scheme is an IRI scheme, and every IRI scheme is a URI scheme.

For scheme definitions, most of them are written in terms of URIs, and if and where they allow %HH syntax with UTF-8 encoding, they can also be used with genuine (i.e. non-URI) IRIs. On the other hand, it's also possible to define the scheme-specific syntax on the IRI level, and rely on the IRI spec to define the corresponding syntax on the URI level. I think http://www.ietf.org/rfc/rfc4622.txt (XMPP URIs/IRIs) is essentially written that way. Depending on the overall syntax and functionality of the scheme, one way or the other of defining it may be clearer and easier to write and read.

Regards, Martin.

On 2009/07/27 0:11, Larry Masinter wrote:
> I'm sorry, I was really unclear. I was only talking about
> the "scheme" component.
> And all I'm thinking about adding
> is to point out, in the introductory material, that while
> non-roman scripts (Chinese, Japanese, Korean) are supported
> by IRIs in many of the components, the scheme itself
> remains restricted to ASCII letters a-z, digits, and three
> punctuation marks (+ . -).
>
> Larry
> --
> http://larry.masinter.net
>
>
> -----Original Message-----
> From: Marcin Hanclik [mailto:Marcin.Hanclik@access-company.com]
> Sent: Sunday, July 26, 2009 4:29 AM
> To: Larry Masinter
> Cc: PUBLIC-IRI@W3.ORG; public-webapps@w3.org
> Subject: RE: [Widget URI] Internationalization, widget IRI?
>
> Hi Larry,
>
> Thanks for your prompt comments.
>
>>> I'm not sure that the draft makes
>>> this clear
> It seems not.
>
> When reading the IRI grammar, I derive that "%" is valid there:
>
> http://tools.ietf.org/html/rfc3987#section-2.2
>
>    pct-encoded    = "%" HEXDIG HEXDIG
>    ipchar         = iunreserved / pct-encoded / sub-delims / ":"
>                   / "@"
>    isegment       = *ipchar
>    ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
>    ihier-part     = "//" iauthority ipath-abempty
>                   / ipath-absolute
>                   / ipath-rootless
>                   / ipath-empty
>    IRI            = scheme ":" ihier-part [ "?" iquery ]
>                     [ "#" ifragment ]
> (this is just one of a few possibilities to have % in IRI).
>
> Also this is the text from RFC3987:
> "Example: Trying to validate the Web page at
> http://résumé.example.org would lead to an IRI of
> http://validator.w3.org/check?uri=http%3A%2F%2Frésumé.
> example.org"
> as found here: http://tools.ietf.org/html/rfc3987#section-3.1.
> This example includes % in iquery part of IRI.
> I found the same parts (ABNF and example) in the latest draft:
> http://tools.ietf.org/html/draft-duerst-iri-bis-06#section-2.2
>
> Thanks.
>
> Kind regards,
> Marcin
>
> ________________________________________
> From: Larry Masinter [masinter@adobe.com]
> Sent: Sunday, July 26, 2009 1:44 AM
> To: Marcin Hanclik
> Cc: PUBLIC-IRI@W3.ORG
> Subject: RE: [Widget URI] Internationalization, widget IRI?
>
> (BCC original mailing lists, directing traffic to "Public-iri@w3.org"
> for IRI issues.
>
> There are no "IRI schemes". I'm not sure that the draft makes
> this clear, or makes clear that although most other parts of
> the IRI syntax extend URI syntax, the scheme is the same, cannot
> contain any %xx encoded characters because it cannot contain %,
> etc.
>
> -----Original Message-----
> From: public-pkg-uri-scheme-request@w3.org On Behalf Of Marcin Hanclik
> Sent: Friday, July 24, 2009 9:37 AM
> To: public-webapps@w3.org
> Cc: public-pkg-uri-scheme@w3.org
> Subject: [Widget URI] Internationalization, widget IRI?
>
> Hi Robin, All,
>
> Why is the Widgets 1.0: URI Scheme about URI and not IRI?
>
> Widgets 1.0 P&C is using only the term/type IRI (URI cannot be found there), e.g. for id, href and name attributes.
> In Widgets 1.0: URI Scheme (WUS?) document you refer in [1] to zip-rel-path.
> It resembles IRI per design, since conversion of the file name field [2], that may be specified in UTF-8, to URI would entail "percent-encoding" [3].
> Thus having IRI instead of URI could save processing time/power.
> It seems [4] already touches upon the internationalization.
>
> Specifically the ABNF:
>
> widget-URI = "widget:" "//" [ authority ] "/" zip-rel-path [ "?" query ] [ "#" fragment ]
>
> is incorrect (depending on whether you are on byte or character level), because zip-rel-path includes non-percent-encoded characters, thus widget-URI is actually an IRI.
>
> What then about naming the specification as "Widgets 1.0: IRI Scheme" and referring to IRIs?
>
> Thanks.
>
> Kind regards,
> Marcin
>
> [1] http://dev.w3.org/2006/waf/widgets-uri/#syntax
> [2] http://dev.w3.org/2006/waf/widgets/#file-name-field0
> [3] http://tools.ietf.org/html/rfc3986#section-2.1
> [4] http://lists.w3.org/Archives/Public/public-pkg-uri-scheme/2009Jan/0000.html
>
> Marcin Hanclik
> ACCESS Systems Germany GmbH
> Tel: +49-208-8290-6452 | Fax: +49-208-8290-6465
> Mobile: +49-163-8290-646
> E-Mail: marcin.hanclik@access-company.com
>
>
> ________________________________________
>
> Access Systems Germany GmbH
> Essener Strasse 5 | D-46047 Oberhausen
> HRB 13548 Amtsgericht Duisburg
> Geschaeftsfuehrer: Michel Piquemal, Tomonori Watanabe, Yusuke Kanda
>
> www.access-company.com
>
> CONFIDENTIALITY NOTICE
> This e-mail and any attachments hereto may contain information that is privileged or confidential, and is intended for use only by the
> individual or entity to which it is addressed. Any disclosure, copying or distribution of the information by anyone else is strictly prohibited.
> If you have received this document in error, please notify us promptly by responding to this e-mail. Thank you.
>
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp mailto:duerst@it.aoyama.ac.jp
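
As a rough illustration of two points made in this thread (the scheme rule is ASCII-only and cannot contain "%", while other IRI components may carry non-ASCII characters that correspond to %HH escapes of UTF-8 octets on the URI level), here is a minimal Python sketch. The function names is_scheme_name and percent_encode_non_ascii are hypothetical and not taken from any of the specifications. The mapping is a simplification of RFC3987 section 3.1: it escapes every non-ASCII character and ignores any special handling of host names.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# (the rule shared by RFC3986 and RFC3987; ASCII only, no "%")
SCHEME_NAME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_scheme_name(text: str) -> bool:
    """Return True if the whole string matches the 'scheme' rule."""
    return SCHEME_NAME.fullmatch(text) is not None


def percent_encode_non_ascii(iri: str) -> str:
    """Replace each non-ASCII character by %HH escapes of its UTF-8 octets.

    Assumption: a simplified IRI-to-URI mapping. ASCII characters
    (including existing "%" escapes) are copied through unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            out.append(ch)  # fits in 7 bits, already valid on the URI level
        else:
            # one escape per UTF-8 octet, e.g. U+00E9 -> %C3%A9
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    print(is_scheme_name("widget"))
    print(is_scheme_name("%68ttp"))
    print(percent_encode_non_ascii("widget:///résumé.txt"))
```

Under these assumptions, "widget" is accepted as a scheme name, a name containing "%" or a non-ASCII letter is rejected, and the path of widget:///résumé.txt becomes r%C3%A9sum%C3%A9.txt, so every resulting octet fits in 7 bits.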
Received on Monday, 27 July 2009 16:14:31 UTC