Terminology, IRI/URI classification, was: RE: [Widget URI] Internationalization, widget IRI?

Hi Martin,

Thanks for all the information.
I have one more comment (a) and one question (b):
a) Terminology
b) IRI/URI classification

a)
I have reviewed the IRI/URI/URL RFC chain, looking for the source of my confusion around "scheme" and "scheme name".
I came to the conclusion that:
1. URI/IRI scheme = URI/IRI syntax
2. the grammar in the URI/IRI specifications could be updated.
The confusion seems to have started in RFC1738 with
<scheme>:<scheme-specific-part>
where <scheme> actually meant something like <scheme-name>.

Similarly RFC3986 and RFC3987 have a rule:
scheme      = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )

Could this be changed to
scheme-name = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
?
Then, respectively, the rules URI and IRI would have to be updated.
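
To make the distinction concrete, here is a minimal Python sketch of the scheme rule quoted above, read as a rule about scheme names only. The names SCHEME_NAME and is_valid_scheme_name are my own and do not come from any spec or library; the regular expression is simply a transcription of the ABNF.

```python
import re

# Transcription of: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." )
# ALPHA and DIGIT are ASCII-only in ABNF, so the character classes are
# spelled out explicitly instead of using \w or str.isalpha().
SCHEME_NAME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*")


def is_valid_scheme_name(name: str) -> bool:
    """Return True if `name` matches the scheme (name) production."""
    return SCHEME_NAME.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ["widget", "xmpp", "a+b-c.d", "1http", "r\u00e9sum\u00e9", "http%3A"]:
        print(candidate, is_valid_scheme_name(candidate))
```

The check says nothing about the scheme itself, i.e. the syntax and semantics that a scheme definition attaches to the name.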

I am not sure whether this is feasible or how many other specs it could break, but I assume it would help avoid some misreadings of the specs.
I know it is a detail and I can live with it as it is anyway.

b)
I now assume that "IRI scheme" exists as a term.
I understand that there is a duality between URIs and IRIs, i.e. each can be expressed in the syntax of the other using the rules from RFC3987.

Let's imagine that I define a new scheme, but do not yet know whether it will primarily be a URI scheme or an IRI scheme.
Is my understanding correct that, if the syntax allows only US-ASCII characters, my new definition describes a URI scheme, whereas if the syntax includes more characters (e.g. characters whose UTF-8 encoding needs 8-bit octets) or generally operates on Unicode characters rather than octets, it describes an IRI scheme?

My question is motivated by the fact that every character of an RFC3986 URI fits in 7 bits, whereas the characters of an RFC3987 IRI, when encoded e.g. in UTF-8, may require octets that use all 8 bits.
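
The 7-bit versus 8-bit point can be illustrated with a small Python sketch of the core IRI-to-URI step from RFC3987: each non-ASCII character is encoded in UTF-8 and every resulting octet is percent-encoded. The function name iri_to_uri_sketch is hypothetical, and the sketch deliberately leaves out the other parts of the mapping (normalization and any special handling of host names), so it is an illustration rather than a conforming implementation.

```python
def iri_to_uri_sketch(iri: str) -> str:
    """Percent-encode the UTF-8 octets of every non-ASCII character.

    ASCII characters, including existing "%HH" escapes, are copied as-is.
    """
    out = []
    for ch in iri:
        if ord(ch) < 0x80:
            # 7-bit character: already allowed at the URI level.
            out.append(ch)
        else:
            # Non-ASCII character: its UTF-8 octets all have the high bit
            # set, so each one is written as %HH.
            out.extend(f"%{octet:02X}" for octet in ch.encode("utf-8"))
    return "".join(out)


if __name__ == "__main__":
    iri = "http://example.org/r\u00e9sum\u00e9"
    uri = iri_to_uri_sketch(iri)
    print(uri)  # http://example.org/r%C3%A9sum%C3%A9
    # Every octet of the result fits in 7 bits.
    print(all(octet < 0x80 for octet in uri.encode("ascii")))
```

In this reading, the same identifier exists on both levels; only its written form differs.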

Thanks.

Kind regards,
Marcin

Marcin Hanclik
ACCESS Systems Germany GmbH
Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
Mobile: +49-163-8290-646
E-Mail: marcin.hanclik@access-company.com

-----Original Message-----
From: "Martin J. Dürst" [mailto:duerst@it.aoyama.ac.jp]
Sent: Monday, July 27, 2009 8:41 AM
To: Larry Masinter
Cc: Marcin Hanclik; PUBLIC-IRI@W3.ORG
Subject: Re: [Widget URI] Internationalization, widget IRI?

To be more explicit, there are two things here:
URI/IRI scheme names and URI/IRI schemes.

As for scheme names (which Larry discusses below), all scheme names are
ASCII only. Over the long years of working on the IRI spec, the idea of
allowing non-ASCII scheme names has occasionally come up, but readily
been rejected. Proposals were mostly for scheme name aliases in
non-Latin scripts, not for genuinely different, new, schemes that would
have non-ASCII names. Keeping these non-Latin aliases updated correctly
would be mostly a maintenance/update nightmare.

As for schemes, there is inherently no difference between URI schemes
(schemes that can be used with URIs) and IRI schemes (schemes that can
be used with IRIs). Every URI scheme is an IRI scheme, and every IRI
scheme is a URI scheme. For scheme definitions, most of them are
written in terms of URIs, and if and where they allow %HH syntax with
UTF-8 encoding, they can also be used with genuine (i.e. non-URI) IRIs.
On the other hand, it's also possible to define the scheme-specific
syntax on the IRI level, and rely on the IRI spec to define the
corresponding syntax on the URI level. I think
http://www.ietf.org/rfc/rfc4622.txt (XMPP URIs/IRIs) is essentially
written that way.
Depending on the overall syntax and functionality of the scheme, one way
or the other of defining it may be clearer and easier to write and read.
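
As an illustration of the first approach (a scheme defined in terms of URIs that allows %HH syntax with UTF-8 encoding), the following Python sketch turns such a URI into its IRI form. The name uri_to_iri_sketch is hypothetical. The sketch only decodes runs of percent-encoded octets with the high bit set, and only if the whole run is valid UTF-8; it ignores the additional exclusions that the IRI spec makes for certain characters, so it is a simplification under those assumptions.

```python
import re

# One or more consecutive %HH escapes whose octet value is >= 0x80.
# Escapes of ASCII octets (e.g. %2F) are never matched and so stay encoded.
_NON_ASCII_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def uri_to_iri_sketch(uri: str) -> str:
    """Replace %HH runs that form valid UTF-8 with the characters they encode."""

    def decode_run(match: "re.Match[str]") -> str:
        run = match.group(0)
        octets = bytes.fromhex(run.replace("%", ""))
        try:
            return octets.decode("utf-8")
        except UnicodeDecodeError:
            # Not UTF-8 (e.g. a legacy encoding): leave the run untouched.
            return run

    return _NON_ASCII_RUN.sub(decode_run, uri)


if __name__ == "__main__":
    print(uri_to_iri_sketch("http://example.org/r%C3%A9sum%C3%A9"))
    print(uri_to_iri_sketch("http://example.org/a%2Fb"))
    print(uri_to_iri_sketch("http://example.org/%E9"))
```

Where a scheme does not guarantee UTF-8 behind its %HH escapes, the escapes have to stay as they are, which is why the sketch falls back to the original text on a decoding error.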

Regards,    Martin.

On 2009/07/27 0:11, Larry Masinter wrote:
> I'm sorry, I was really unclear. I was only talking about
> the "scheme" component. And all I'm thinking about adding
> is to point out, in the introductory material, that while
> non-roman scripts (Chinese, Japanese, Korean) are supported
> by IRIs in many of the components, the scheme itself
> remains restricted to ASCII letters a-z, digits, and three
> punctuation marks (+ . -).
>
> Larry
> --
> http://larry.masinter.net
>
>
> -----Original Message-----
> From: Marcin Hanclik [mailto:Marcin.Hanclik@access-company.com]
> Sent: Sunday, July 26, 2009 4:29 AM
> To: Larry Masinter
> Cc: PUBLIC-IRI@W3.ORG; public-webapps@w3.org
> Subject: RE: [Widget URI] Internationalization, widget IRI?
>
> Hi Larry,
>
> Thanks for your prompt comments.
>
>>> I'm not sure that the draft makes
>>> this clear
> It seems not.
>
> When reading the IRI grammar, I derive that "%" is valid there:
>
> http://tools.ietf.org/html/rfc3987#section-2.2
>
> pct-encoded    = "%" HEXDIG HEXDIG
> ipchar         = iunreserved / pct-encoded / sub-delims / ":"
>                    / "@"
> isegment       = *ipchar
> ipath-absolute = "/" [ isegment-nz *( "/" isegment ) ]
> ihier-part     = "//" iauthority ipath-abempty
>                    / ipath-absolute
>                    / ipath-rootless
>                    / ipath-empty
> IRI            = scheme ":" ihier-part [ "?" iquery ]
>                           [ "#" ifragment ]
> (this is just one of a few possibilities to have % in IRI).
>
> Also this is the text from RFC3987:
> "Example: Trying to validate the Web page at
>        http://r&#xE9;sum&#xE9;.example.org would lead to an IRI of
>        http://validator.w3.org/check?uri=http%3A%2F%2Fr&#xE9;sum&#xE9;.
>        example.org"
> as found here: http://tools.ietf.org/html/rfc3987#section-3.1.
> This example includes % in iquery part of IRI.
> I found the same parts (ABNF and example) in the latest draft:
> http://tools.ietf.org/html/draft-duerst-iri-bis-06#section-2.2
>
> Thanks.
>
> Kind regards,
> Marcin
>
> ________________________________________
> From: Larry Masinter [masinter@adobe.com]
> Sent: Sunday, July 26, 2009 1:44 AM
> To: Marcin Hanclik
> Cc: PUBLIC-IRI@W3.ORG
> Subject: RE: [Widget URI] Internationalization, widget IRI?
>
> (BCC original mailing lists, directing traffic to "Public-iri@w3.org"
> for IRI issues.)
>
> There are no "IRI schemes". I'm not sure that the draft makes
> this clear, or makes clear that although most other parts of
> the IRI syntax extend URI syntax, the scheme is the same, cannot
> contain any %xx encoded characters because it cannot contain %,
> etc.
>
> -----Original Message-----
> From: public-pkg-uri-scheme-request@w3.org On Behalf Of Marcin Hanclik
> Sent: Friday, July 24, 2009 9:37 AM
> To: public-webapps@w3.org
> Cc: public-pkg-uri-scheme@w3.org
> Subject: [Widget URI] Internationalization, widget IRI?
>
> Hi Robin, All,
>
> Why is the Widgets 1.0: URI Scheme about URI and not IRI?
>
> Widgets 1.0 P&C is using only the term/type IRI (URI cannot be found there), e.g. for id, href and name attributes.
> In Widgets 1.0: URI Scheme (WUS?) document you refer in [1] to zip-rel-path.
> It resembles IRI per design, since conversion of the file name field [2], that may be specified in UTF-8, to URI would entail "percent-encoding" [3].
> Thus having IRI instead of URI could save processing time/power.
> It seems [4] already touches upon the internationalization.
>
> Specifically the ABNF:
>
> widget-URI  = "widget:" "//" [ authority ] "/" zip-rel-path [ "?" query ] [ "#" fragment ]
>
> is incorrect (depending on whether you are on byte or character level), because zip-rel-path includes non-percent-encoded characters, thus widget-URI is actually an IRI.
>
> What then about naming the specification as "Widgets 1.0: IRI Scheme" and referring to IRIs?
>
> Thanks.
>
> Kind regards,
> Marcin
>
> [1] http://dev.w3.org/2006/waf/widgets-uri/#syntax
> [2] http://dev.w3.org/2006/waf/widgets/#file-name-field0
> [3] http://tools.ietf.org/html/rfc3986#section-2.1
> [4] http://lists.w3.org/Archives/Public/public-pkg-uri-scheme/2009Jan/0000.html
> Marcin Hanclik
> ACCESS Systems Germany GmbH
> Tel: +49-208-8290-6452  |  Fax: +49-208-8290-6465
> Mobile: +49-163-8290-646
> E-Mail: marcin.hanclik@access-company.com
>
>
> ________________________________________
>
> Access Systems Germany GmbH
> Essener Strasse 5  |  D-46047 Oberhausen
> HRB 13548 Amtsgericht Duisburg
> Geschaeftsfuehrer: Michel Piquemal, Tomonori Watanabe, Yusuke Kanda
>
> www.access-company.com
>
> CONFIDENTIALITY NOTICE
> This e-mail and any attachments hereto may contain information that is privileged or confidential, and is intended for use only by the
> individual or entity to which it is addressed. Any disclosure, copying or distribution of the information by anyone else is strictly prohibited.
> If you have received this document in error, please notify us promptly by responding to this e-mail. Thank you.
>
>

--
#-# Martin J. Dürst, Professor, Aoyama Gakuin University
#-# http://www.sw.it.aoyama.ac.jp   mailto:duerst@it.aoyama.ac.jp

Received on Monday, 27 July 2009 16:14:31 UTC