- From: Chris Lilley <chris@w3.org>
- Date: Wed, 17 Jul 1996 12:52:06 +0200
- To: Stephanos Piperoglou <stephanos@hol.gr>
- CC: www-html@w3.org
Stephanos Piperoglou wrote:

> Now NORMALLY, and sticking to standards, I can't even write HTML in Greek.

I agree that this has been a problem in the past, because HTML 1 standardised on the Latin-1 (ISO 8859-1) character set. (But hey, this was an improvement on ASCII.) But even with HTML 2.0 a clear direction towards internationalisation was shown by making the document character set Unicode (restricted in that version to the first 256 code positions, i.e. the same as 8859-1 in practice). The IETF HTML WG Internationalisation draft, which is in last call, removes this restriction; the full Basic Multilingual Plane of Unicode is available.

Since HTML makes a distinction between the document character set (i.e. the logical computational space for character manipulation, such as resolving numeric entity references) and the character encoding used to transmit the document (as indicated by the charset parameter), you can create Greek HTML pages which are correctly labelled and conform to specifications. Just send them out with the MIME type:

    Content-Type: text/html; charset=iso-8859-7

For an example of this, see:

    http://www.alis.com:8085/demo/grec/ntua.html

Which is correctly labelled, look:

    bash$ telnet www.alis.com 8085
    Trying 207.81.28.7...
    Connected to www.alis.com.
    Escape character is '^]'.
    HEAD /demo/grec/ntua.html HTTP/1.0

    HTTP/1.0 200 Le document suit
    Date: Wed, 17 Jul 1996 10:36:07 GMT
    Server: NCSA/1.4
    Content-type: text/html; charset=iso-8859-7
    Content-Language: el
    Last-modified: Mon, 30 Nov 1987 01:19:08 GMT
    Content-length: 6316

    Connection closed by foreign host.

> Official Athens College site: http://www.gsc.net/hosted/athens_college/

Compare this with your server:

    bash$ telnet www.gsc.net 80
    Trying 204.57.142.57...
    Connected to www.gsc.net.
    Escape character is '^]'.
    GET /hosted/athens_college/ HTTP/1.0

    HTTP/1.0 200 OK
    Server: Netscape-Communications/1.1
    Date: Wednesday, 17-Jul-96 10:42:20 GMT
    Last-modified: Tuesday, 25-Jun-96 05:42:00 GMT
    Content-length: 1787
    Content-type: text/html        <== oops!

    <HEAD>
    [ ... stuff omitted ...]
    <P>Welcome to Athens College! These pages have been set up by the Athens College Computer Society in order to bring our fine institution to the Internet, but also to bring the Internet to the College. These pages are unfortunately not yet fully bilingual, so please click on the gate to proceed. If you can't display Greek characters with your browser, you might want to have a look <A HREF="http://users.hol.gr/~stephanos/greek.html">here</A>.</P>
    <TD WIDTH=50% VALIGN=top>
    <P>Σας καλωσορίζουμε στο Κολλέγιο Αθηνών! Οι σελίδες αυτές έχουν εγκατασταθεί από τον Όμιλο Υπολογιστών του Κολλεγίου Αθηνών για να φέρουν το ίδρυμα αυτό στο Internet, αλλά και για να φέρουν το Internet στο Κολλέγιο. Οι σελίδες δεν έχουν μεταφρασθεί ακόμα σε δύο γλώσσες, οπότε παρακαλούμε πατήστε στην πύλη για να συνεχίσετε:</P>

For information on the different 8859 character sets, see:

    http://www.cs.tu-berlin.de/~czyborra/charsets/

Further details of I18N work:

    http://www.w3.org/pub/WWW/International/
    http://www.alis.com:8085/ietf/html/

> However if you have the correct font installed on your browser

Urgh. A font is an ordered collection of glyphs, the order being given by the font's encoding vector. A character set is an ordered collection of characters. Please do not create HTML pages containing garbage characters by assuming a one-to-one ordered mapping between the two.

On the other hand, if you label your HTML, then the appropriate character set and font can be automatically selected by compliant browsers, even if the font encoding vector does not match the character encoding used to transmit the document. Fonts are one component of an I18N solution, not the whole answer.
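To make the difference between the document character set and the transmission encoding concrete, here is a minimal Python sketch under stated assumptions. The helper name `charset_of` is hypothetical, and the Latin-1 fallback for an unlabelled document is an assumption about what a receiver does when no charset parameter is present; real browsers may behave differently.

```python
import html
from email.message import Message


def charset_of(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type value.

    Hypothetical helper: falls back to `default` (assumed here to be
    Latin-1) when the header carries no charset parameter.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


# The same Greek word, encoded for transmission as ISO 8859-7 bytes.
greek_bytes = "Αθήνα".encode("iso-8859-7")

# Labelled response: the receiver decodes with the declared encoding.
labelled = greek_bytes.decode(
    charset_of("text/html; charset=iso-8859-7"))

# Unlabelled response: the receiver falls back to the assumed default,
# so the very same bytes come out as unrelated Latin-1 characters.
unlabelled = greek_bytes.decode(charset_of("text/html"))

# A numeric character reference is resolved in the document character
# set (Unicode), independently of the encoding used on the wire.
alpha = html.unescape("&#913;")

print(repr(labelled), repr(unlabelled), repr(alpha))
```

The bytes on the wire are identical in both cases; only the label decides whether the reader sees Greek or accented Latin letters.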
> Netscape 3.0b4 and later
> supports iso-8859-7 (greek) character sets (so hurrah, though I have no
> idea how to make my pages recognizable as Greek by Netscape... unless
> every user does Options > Document Encoding > Greek once he meets my page).

Send your documents out labelled as I said above; then Netscape (and other browsers) will recognise them and switch character sets and fonts for you automatically.

> Even the newest versions of Netscape under Windows 95 won't let you enter
> non-english characters in forms!

Yes, adding an accept-charset attribute to form input fields was another thing that the Internationalisation draft did. Then you can create a form that accepts Greek, you type in Greek, and it gets sent to the server CGI script correctly labelled.

Have a look at the Tango browser, which implements the I18N specification. See:

    http://www.alis.com/

I have no connection with the Alis company; I am just pleased to see another step towards a World Wide Web.

-- 
Chris Lilley, W3C                [ http://www.w3.org/ ]
http://www.w3.org/people/chris/  INRIA/W3C
chris@w3.org                     2004 Rt des Lucioles / BP 93
+33 93 65 79 87                  06902 Sophia Antipolis Cedex, France
Received on Wednesday, 17 July 1996 06:55:32 UTC