Re: html, http, urls and internationalisation

Larry Masinter (masinter@parc.xerox.com)
Wed, 31 Jan 1996 00:43:53 PST


Message-Id: <199601310811.AA22635@pehtra.e5.ijs.si>
To: keld@dkuug.dk (Keld Jørn Simonsen)
Cc: Larry Masinter <masinter@parc.xerox.com>, yergeau@alis.ca,
Subject: Re: html, http, urls and internationalisation 
In-Reply-To: Your message of "Tue, 30 Jan 1996 23:14:43 +0100."
             <199601302214.XAA23056@dkuug.dk> 
Date: Wed, 31 Jan 1996 09:11:03 +0100
From: Borka Jerman-Blazic <borka@e5.ijs.si>


What Keld said is sound and could be worked on further. The major restriction
is the DNS part, and this should be kept as it is (characters < 127). The same
applies to the syntax characters.


Borka
 
To: borka@e5.ijs.si
Cc: keld@dkuug.dk, yergeau@alis.ca, Dan.Oscarsson@malmo.trab.se,
In-Reply-To: Borka Jerman-Blazic's message of Wed, 31 Jan 1996 00:11:03 -0800 <199601310811.AA22635@pehtra.e5.ijs.si>
Subject: Re: html, http, urls and internationalisation 
From: Larry Masinter <masinter@parc.xerox.com>
Message-Id: <96Jan31.004354pst.2733@golden.parc.xerox.com>
Date: Wed, 31 Jan 1996 00:43:53 PST

> What Keld said is sound and could be worked on further. The major
> restriction is the DNS part, and this should be kept as it is
> (characters < 127). The same applies to the syntax characters.

No, "what Keld said" isn't "sound"; it just "sounds nice".

Keld said, for example,

> 1. URLs themselves.

> These are at an abstract character level. As Larry and François
> correctly point out, you cannot see what the charset is
> when you look at a business card or a URL in the newspaper.

> I propose that any character here be allowed, except for the
> URL syntax characters (things like < / :), in the non-DNS
> part of the URL. Remember these are abstract characters, and
> there is no binding to, for example, ISO 10646 in the sense
> of a character repertoire, or to any encoding (charset).

However, this nice-sounding proposal contained no solution to the
following questions:

1) how do these abstract characters subsequently get turned
  into octets that are employed in real protocols in general
  and http and ftp in particular?
  (The current URL specification gives an algorithm.)

2) how does one translate a URL that uses a large character
  repertoire so that it might be written in a context with
  a small repertoire? E.g., a URL with Chinese characters
  in an ASCII email message.
  (The current URL specification manages this by limiting
  the repertoire.)
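To make both questions concrete, here is a minimal Python sketch, assuming that abstract characters are first mapped to octets through an explicitly chosen charset and that those octets are then escaped as %HH in the style of the current URL specification (RFC 1738). The helper names, and the choice of "/" as the only reserved character left unescaped, are illustrative assumptions, not part of any proposal in this thread.

```python
import string

# Assumption: leave RFC 1738-style "unreserved" octets alone, plus "/" as a
# path separator; every other octet is written as %HH.
UNRESERVED = set((string.ascii_letters + string.digits + "$-_.+!*'(),/").encode("ascii"))


def escape_octets(data: bytes) -> str:
    """Percent-escape every octet that is not in the UNRESERVED set."""
    return "".join(chr(b) if b in UNRESERVED else f"%{b:02X}" for b in data)


def url_path_from_text(text: str, charset: str) -> str:
    """Hypothetical helper: abstract characters -> octets (via charset) -> ASCII URL text."""
    return escape_octets(text.encode(charset))


# The same abstract path gives different octets, and so different URLs,
# depending on which charset the writer happened to pick.
print(url_path_from_text("/søk", "utf-8"))       # /s%C3%B8k
print(url_path_from_text("/søk", "iso-8859-1"))  # /s%F8k
print(url_path_from_text("/中文", "utf-8"))       # /%E4%B8%AD%E6%96%87
```

The escaped output is plain ASCII, so it can be carried in an ASCII email message (question 2), but the choice of charset in question 1 is baked into the octets and is not visible to someone reading the characters off a business card.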

I don't think these problems are unsolvable, but I think in the course
of making a "sound" proposal you'll find that it starts "sounding"
less and less like something that you'd want to implement.

So, I'll ask again, PLEASE stop cross-posting this discussion to three
separate mailing lists.