- From: Keld Jørn Simonsen <keld@dkuug.dk>
- Date: Wed, 31 Jan 1996 12:15:39 +0100
- To: Larry Masinter <masinter@parc.xerox.com>, borka@e5.ijs.si
- Cc: yergeau@alis.ca, Dan.Oscarsson@malmo.trab.se, maits@dkuug.dk, uri@bunyip.com
Larry Masinter writes:

> > What Keld said is sound and could be worked further. The major
> > restriction is the DNS part and this should be kept as it is
> > (character < 127). The same applies to the syntax characters.
>
> No, "what Keld said" isn't "sound" it is just "sounds nice".

Glad you like the sound effects, Larry!

> Keld said, for example,
>
> > 1. URLs themselves.
> >
> > These are at an abstract character level, as Larry and François
> > correctly points out, you cannot see what is the charset
> > when you look at a business card or an URL in the newspaper.
> >
> > I propose that any character here be allowed, except for the
> > URL syntax characters, (things like < / : ) - in the non-DNS
> > part of the URL. Remember these are abstract characters, and
> > there is no binding to for example ISO 10646 in the sense
> > of a character repertoire, or to any encoding (charset).
>
> However, this nice-sounding proposal contained no solution to the
> following questions:
>
> 1) how do these abstract characters subsequently get turned
>    into octets that are employed in real protocols in general
>    and http and ftp in particular?
>    (The current URL specification gives an algorithm.)

From glyphs on paper to a computer system, e.g. a browser: by having
the human recognise (aka "read") the characters and enter them, as is
normally done.

From an HTML doc into an HTTP request: the HTML doc has a charset, and
the URL in the HTTP request is represented in a charset. So the HTML
string with the URL is converted into the HTTP charset, and then the
URL is sent with the high-bit octets encoded according to the URL
specifications (in %xx notation). I found no way of specifying a
charset in the current RFCs on URLs.

I did specify the transformations and encodings in earlier mail.

> 2) how does one translate a URL that uses a large character
>    repertoire so that it might be written in a context with
>    a small repertoire? E.g., a URL with chinese characters
>    in an ASCII email message.
>    (The current URL specification manages this by limiting
>    the repertoire.)

That was also described in the previous mailing. About HTML I said:

> >Here it should be possible to write a HTML document in a given
> >charset, and then reference the (abstract) characters in the URL, just
> >like it is possible to write characters in the rest of the HTML document.
> >That is, the normal characters of the document charset can be used,
> >like full iso-8859-1 in normal HTML docs, and full Unicode in
> >Unicode docs. Also the way of generating out-of-band characters
> >should be allowed in HTML URL strings, like &a-ring and &#xxxx;

> I don't think these problems are unsolvable, but I think in the course
> of making a "sound" proposal you'll find that it starts "sounding"
> less and less like something that you'd want to implement.

I think most of the concerns have been addressed in what I wrote, but
there may be finer details in it that need to be sharpened, and it
needs to be cast in concrete specs. I think most of the specs are
already there and ready to be employed in an implementation.

> So, I'll ask again, PLEASE stop cross-posting this discussion to three
> separate mailing lists.

OK, taken ad notam.

Keld
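To make the HTML-to-HTTP step above concrete, here is a minimal sketch in Python of one way it could work. It is an illustration under assumptions, not text from any URL or HTML spec. The function name `url_from_html_attr`, the `SAFE` set, the default charset and the example.org URL are all hypothetical. The sketch resolves character references in the HTML string, converts the abstract characters to octets in the chosen charset, and writes every octet outside the assumed safe set in %xx notation.

```python
import html

# Assumption: octets left literal are ASCII letters, digits, and the URL
# syntax characters. "%" is kept so existing %xx escapes pass through.
SAFE = frozenset(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789"
    b"-_.~/:?#@!$&'()*+,;=%"
)


def url_from_html_attr(raw, charset="iso-8859-1"):
    """Turn a URL string found in an HTML document into a %xx-encoded URL.

    raw     -- the URL as written in the HTML source, possibly containing
               character references such as &aring; or &#xE6;
    charset -- the charset used on the wire (hypothetical parameter; the
               mail notes the URL RFCs give no way to state it)
    """
    # Step 1: resolve character references to abstract characters.
    text = html.unescape(raw)
    # Step 2: bind the abstract characters to octets in the wire charset.
    # Raises UnicodeEncodeError if a character is outside that charset.
    octets = text.encode(charset)
    # Step 3: write every octet outside the safe set as %xx.
    return "".join(
        chr(b) if b in SAFE else "%{:02X}".format(b) for b in octets
    )


if __name__ == "__main__":
    link = "http://example.org/bl&aring;b&#xE6;r"
    print(url_from_html_attr(link))            # ISO 8859-1 octets
    print(url_from_html_attr(link, "utf-8"))   # UTF-8 octets
```

The same abstract URL gives different octets under different charsets, which is why the receiving end would need to know which charset was used.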
Received on Wednesday, 31 January 1996 06:18:20 UTC