
Re: [Widget URI] Internationalization, widget IRI?

From: Martin J. Dürst <duerst@it.aoyama.ac.jp>
Date: Mon, 27 Jul 2009 15:40:44 +0900
Message-ID: <4A6D4BEC.7090304@it.aoyama.ac.jp>
To: Larry Masinter <masinter@adobe.com>
CC: Marcin Hanclik <Marcin.Hanclik@access-company.com>, "PUBLIC-IRI@W3.ORG" <PUBLIC-IRI@w3.org>

To be more explicit, there are two things here:
URI/IRI scheme names and URI/IRI schemes.

As for scheme names (which Larry discusses below), all scheme names are 
ASCII only. Over the long years of working on the IRI spec, the idea of 
allowing non-ASCII scheme names has occasionally come up, but it has 
always been readily rejected. Proposals were mostly for scheme name 
aliases in non-Latin scripts, not for genuinely different, new schemes 
that would have non-ASCII names. Keeping these non-Latin aliases updated 
correctly would mostly be a maintenance nightmare.
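
To make the scheme-name restriction concrete, here is a minimal Python
sketch of the check, assuming the scheme production from RFC 3986,
section 3.1 (a letter followed by letters, digits, "+", "-" or "."). The
names SCHEME_RE and is_valid_scheme are made up for this illustration and
do not come from any spec or library.

```python
import re

# scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )   (RFC 3986, section 3.1)
# The character classes are spelled out so that only ASCII letters and digits match.
SCHEME_RE = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme(name: str) -> bool:
    """Return True if `name` matches the RFC 3986 scheme production."""
    return SCHEME_RE.fullmatch(name) is not None
```

Under this check "widget" and "svn+ssh" are accepted, while a name
containing a non-ASCII letter or a "%" is rejected.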

As for schemes, there is inherently no difference between URI schemes 
(schemes that can be used with URIs) and IRI schemes (schemes that can 
be used with IRIs). Every URI scheme is an IRI scheme, and every IRI 
scheme is a URI scheme. Most scheme definitions are written in terms of 
URIs, and if and where they allow %HH syntax with UTF-8 encoding, they 
can also be used with genuine (i.e. non-URI) IRIs. 
On the other hand, it's also possible to define the scheme-specific 
syntax on the IRI level, and rely on the IRI spec to define the 
corresponding syntax on the URI level. I think 
http://www.ietf.org/rfc/rfc4622.txt (XMPP URIs/IRIs) is essentially 
written that way.
Depending on the overall syntax and functionality of the scheme, one way 
or the other of defining it may be clearer and easier to write and read.
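
As a rough illustration of why a scheme that allows %HH with UTF-8 can
also be used with IRIs, the following Python sketch performs only the
percent-encoding step of the IRI-to-URI mapping in RFC 3987, section 3.1:
every non-ASCII character is replaced by the %HH form of its UTF-8 bytes.
It is a simplification. It does not convert the host name (no IDNA
handling) and does not check which characters are allowed in which
component. The function name iri_to_uri and the widget: address in the
usage note are hypothetical.

```python
def iri_to_uri(iri: str) -> str:
    """Percent-encode every non-ASCII character of `iri` as UTF-8 (%HH per byte).

    ASCII characters, including any existing %HH escapes, are copied unchanged.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # ASCII stays as it is; the scheme name is therefore never altered.
            out.append(ch)
        else:
            # One %HH triplet per UTF-8 byte, with uppercase hex digits.
            out.extend(f"%{byte:02X}" for byte in ch.encode("utf-8"))
    return "".join(out)
```

For example, iri_to_uri("widget://example/r\u00e9sum\u00e9.html") gives
"widget://example/r%C3%A9sum%C3%A9.html", and an address that is already
a URI comes back unchanged.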

Regards,    Martin.

On 2009/07/27 0:11, Larry Masinter wrote:
> I'm sorry, I was really unclear. I was only talking about
> the "scheme" component. And all I'm thinking about adding
> is to point out, in the introductory material, that while
> non-roman scripts (Chinese, Japanese, Korean) are supported
> by IRIs in many of the components, the scheme itself
> remains restricted to ASCII letters a-z, digits, and three
> punctuation marks (+ . -).
>
> Larry
> --
> http://larry.masinter.net
>
>
> -----Original Message-----
> From: Marcin Hanclik [mailto:Marcin.Hanclik@access-company.com]
> Sent: Sunday, July 26, 2009 4:29 AM
> To: Larry Masinter
> Cc: PUBLIC-IRI@W3.ORG; public-webapps@w3.org
> Subject: RE: [Widget URI] Internationalization, widget IRI?
>
> Hi Larry,
>
> Thanks for your prompt comments.
>
>>> I'm not sure that the draft makes
>>> this clear
> It seems not.
>
> When reading the IRI grammar, I derive that "%" is valid there:
>
> http://tools.ietf.org/html/rfc3987#section-2.2
>
> pct-encoded    = "%" HEXDIG HEXDIG
> ipchar         = iunreserved / pct-encoded / sub-delims / ":"
>                    / "@"
> isegment       = *ipchar
> ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
> ihier-part     = "//" iauthority ipath-abempty
>                    / ipath-absolute
>                    / ipath-rootless
>                    / ipath-empty
> IRI            = scheme ":" ihier-part [ "?" iquery ]
>                           [ "#" ifragment ]
> (this is just one of a few possibilities to have % in IRI).
>
> Also this is the text from RFC3987:
> "Example: Trying to validate the Web page at
>        http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of
>        http://validator.w3.org/check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;.
>        example.org"
> as found here: http://tools.ietf.org/html/rfc3987#section-3.1.
> This example includes % in iquery part of IRI.
> I found the same parts (ABNF and example) in the latest draft:
> http://tools.ietf.org/html/draft-duerst-iri-bis-06#section-2.2
>
> Thanks.
>
> Kind regards,
> Marcin
>
> ________________________________________
> From: Larry Masinter [masinter@adobe.com]
> Sent: Sunday, July 26, 2009 1:44 AM
> To: Marcin Hanclik
> Cc: PUBLIC-IRI@W3.ORG
> Subject: RE: [Widget URI] Internationalization, widget IRI?
>
> (BCC original mailing lists, directing traffic to "Public-iri@w3.org"
> for IRI issues.)
>
> There are no "IRI schemes". I'm not sure that the draft makes
> this clear, or makes clear that although most other parts of
> the IRI syntax extend URI syntax, the scheme is the same, cannot
> contain any %xx encoded characters because it cannot contain %,
> etc.
>
> -----Original Message-----
> From: public-pkg-uri-scheme-request@w3.org On Behalf Of Marcin Hanclik
> Sent: Friday, July 24, 2009 9:37 AM
> To: public-webapps@w3.org
> Cc: public-pkg-uri-scheme@w3.org
> Subject: [Widget URI] Internationalization, widget IRI?
>
> Hi Robin, All,
>
> Why is the Widgets 1.0: URI Scheme about URI and not IRI?
>
> Widgets 1.0 P&C is using only the term/type IRI (URI cannot be found there), e.g. for id, href and name attributes.
> In Widgets 1.0: URI Scheme (WUS?) document you refer in [1] to zip-rel-path.
> It resembles IRI per design, since conversion of the file name field [2], that may be specified in UTF-8, to URI would entail "percent-encoding" [3].
> Thus having IRI instead of URI could save processing time/power.
> It seems [4] already touches upon the internationalization.
>
> Specifically the ABNF:
>
> widget-URI  = "widget:" "//" [ authority ] "/" zip-rel-path [ "?" query ] [ "#" fragment ]
>
> is incorrect (depending on whether you are on byte or character level), because zip-rel-path includes non-percent-encoded characters, thus widget-URI is actually an IRI.
>
> What then about naming the specification as "Widgets 1.0: IRI Scheme" and referring to IRIs?
>
> Thanks.
>
> Kind regards,
> Marcin
>
> [1] http://dev.w3.org/2006/waf/widgets-uri/#syntax
> [2] http://dev.w3.org/2006/waf/widgets/#file-name-field0
> [3] http://tools.ietf.org/html/rfc3986#section-2.1
> [4] http://lists.w3.org/Archives/Public/public-pkg-uri-scheme/2009Jan/0000.html
> Marcin Hanclik
> ACCESS Systems Germany GmbH
> Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
> Mobile: +49-163-8290-646
> E-Mail: marcin.hanclik@access-company.com
>
>
> ________________________________________
>
> Access Systems Germany GmbH
> Essener Strasse 5  |  D-46047 Oberhausen
> HRB 13548 Amtsgericht Duisburg
> Geschaeftsfuehrer: Michel Piquemal, Tomonori Watanabe, Yusuke Kanda
>
> www.access-company.com
>
> CONFIDENTIALITY NOTICE
> This e-mail and any attachments hereto may contain information that is privileged or confidential, and is intended for use only by the
> individual or entity to which it is addressed. Any disclosure, copying or distribution of the information by anyone else is strictly prohibited.
> If you have received this document in error, please notify us promptly by responding to this e-mail. Thank you.
>
>

-- 
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp
Received on Monday, 27 July 2009 06:41:54 GMT
