RE: IRI

From: Richard Ishida <ishida@w3.org>
Date: Thu, 11 Dec 2008 17:54:53 -0000
To: "'Martin Duerst'" <duerst@it.aoyama.ac.jp>, "'Mark Davis'" <mark@macchiato.com>
Cc: <public-i18n-core@w3.org>
Message-ID: <00a701c95bb9$927db8c0$b7792a40$@org>

Ok.  I took into account both suggestions, and came up with the following
proposal:

[[
We mentioned that URI syntax severely limits the characters you can use in a
web address.  Web addresses that use characters from outside this range (e.g.
characters from scripts such as Chinese, Cyrillic, Arabic, Devanagari,
etc.) are called IRIs (Internationalized Resource Identifiers).  Strictly
speaking, there are still some restrictions: the characters used must be in
the Unicode character set (which is very unlikely to cause any restriction
in normal usage, although, to date, something like Klingon would be
excluded), and a few special Unicode characters that would otherwise cause
problems are not allowed.  Note, however, that the term IRI simply refers to
a sequence of characters, whether on paper or in an electronic document or
application.  In a digital representation the actual encoding (the sequence
of bytes) is not relevant to the definition.
]]
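
To illustrate the last point, here is a minimal Python sketch (the
example.org address is a hypothetical one, not taken from the article).
It shows that the same IRI, as a sequence of characters, can be stored as
different byte sequences without becoming a different IRI:

```python
# Hypothetical IRI with a Cyrillic path segment (illustrative only).
iri = "http://example.org/ресурс"

# Two different digital representations of the same characters.
utf8_bytes = iri.encode("utf-8")
utf16_bytes = iri.encode("utf-16-le")

# The byte sequences differ in content and in length...
print(len(iri), len(utf8_bytes), len(utf16_bytes))
print(utf8_bytes == utf16_bytes)

# ...but decoding either one gives back the same sequence of characters,
# which is what the definition of an IRI is concerned with.
same_iri = utf8_bytes.decode("utf-8") == utf16_bytes.decode("utf-16-le") == iri
print(same_iri)
```

The choice of UTF-8 and UTF-16 here is arbitrary; any encoding that can
represent the characters would make the same point.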

RI

============
Richard Ishida
Internationalization Lead
W3C (World Wide Web Consortium)

http://www.w3.org/International/
http://rishida.net/



> -----Original Message-----
> From: Martin Duerst [mailto:duerst@it.aoyama.ac.jp]
> Sent: 09 December 2008 02:43
> To: Richard Ishida; 'Mark Davis'
> Cc: public-i18n-core@w3.org
> Subject: RE: IRI
> 
> I think one good solution would be to do something similar to what's
> done for URIs. There, we have the following text:
> 
> >>>>
> Currently Web addresses are typically expressed using Uniform Resource
> Identifiers or URIs. The URI syntax defined in RFC 3986 STD 66 (Uniform
> Resource Identifier (URI): Generic Syntax) essentially restricts Web
> addresses to a small number of characters: basically, just upper and lower
> case letters of the English alphabet, European numerals and a small number
> of symbols.
> >>>>
> 
> At the start of "basic concepts", we could then say something similar,
> e.g. something along the lines of (borrowing some text from Mark):
> 
> >>>>
> To allow Web addresses to use characters from a wide range of scripts,
> you have to use Internationalized Resource Identifiers or IRIs.
> IRIs are defined in RFC 3987, and allow the use of characters from the
> Universal Character Set (Unicode/ISO 10646); that lets them use Chinese
> characters, Russian (Cyrillic) characters, Arabic characters, and so on.
> For IRIs to work, there are four main requirements:
> >>>>
> 
> I have recently given this article to a student as part of the material
> to prepare for a talk about URIs and IRIs. He also had difficulties
> understanding, at each place in the article, what was being talked
> about, or why.
> 
> I think having the "four main requirements for IRIs to work" in the
> document is very good, but having it very early, and in a section
> entitled "basic concepts", is quite confusing. I would suggest moving
> that discussion a bit (or even quite a bit) farther down, and move
> some more of the really basic explanations higher up. I think the
> document currently tries to use the "four main requirements" as
> a starting point for explaining details such as punycode and %-encoding,
> but I think there are easier ways to introduce these.
> 
> Regards,    Martin.
> 
> At 01:22 08/12/09, Richard Ishida wrote:
> >Hmm.  That's a definition I came to as a result of discussion with Martin.
> >The definition in the IRI spec is "An IRI is a sequence of characters from
> >the Universal Character Set (Unicode/ISO 10646)."
> >
> >What did you have in mind (bearing in mind the audience of this document
> >is "content authors, Web project managers, and general users who want to
> >get a basic overview, without getting bogged down in gory technical
> >details, of what happens behind the scenes when they use non-ASCII
> >characters in web addresses")?
> >
> >Cheers,
> >RI
> >
> >============
> >Richard Ishida
> >Internationalization Lead
> >W3C (World Wide Web Consortium)
> >
> >http://www.w3.org/International/
> >http://rishida.net/
> >
> >
> >From: mark.edward.davis@gmail.com [mailto:mark.edward.davis@gmail.com] On Behalf Of Mark Davis
> >Sent: 04 December 2008 06:28
> >To: Phillips, Addison
> >Cc: ishida@w3.org; Felix Sasaki; public-i18n-core@w3.org
> >Subject: Re: IRI
> >
> >I think I put it a bit too forcefully, but I find that the definitional
> >sentence:
> >
> >We will refer to Web addresses that allow the use of characters from a
> >wide range of scripts as Internationalized Resource Identifiers or IRIs
> >
> >
> >only gives a vague notion of what an IRI is. Then it plunges into what
> >applications and protocols need to do to support it.
> >
> >Mark
> >
> >On Wed, Dec 3, 2008 at 21:37, Phillips, Addison <addison@amazon.com> wrote:
> >
> >Do you mean in the intended audience section? The first occurrence of IRI
> >in the article proper is just after the full spell-out. Still, the audience
> >section does use some undefined TLAs.
> >
> >Addison
> >
> >Addison Phillips
> >Globalization Architect -- Lab126
> >
> >Internationalization is not a feature.
> >It is an architecture.
> >
> >From: public-i18n-core-request@w3.org [mailto:public-i18n-core-request@w3.org] On Behalf Of Mark Davis
> >Sent: Wednesday, December 03, 2008 3:31 PM
> >To: ishida@w3.org; Felix Sasaki
> >Cc: public-i18n-core@w3.org
> >Subject: IRI
> >
> >http://www.w3.org/International/articles/idn-and-iri/
> >
> >I noticed that IRI is used before it is defined.
> >
> >Mark
> >
> 
> 
> #-#-#  Martin J. Du"rst, Assoc. Professor, Aoyama Gakuin University
> #-#-#  http://www.sw.it.aoyama.ac.jp       mailto:duerst@it.aoyama.ac.jp
Received on Thursday, 11 December 2008 17:54:57 GMT
