Re: Determining if Unicode / appropriate glyphs installed on client

From: Chris Croome <chris@webarchitects.co.uk>
Date: Tue, 9 Jul 2002 14:25:52 +0100
To: www-international@w3.org
Message-ID: <20020709132552.GK23449@webarchitects.co.uk>

Hi

On Tue 09-Jul-2002 at 02:05:10 +0200, Chris Lilley wrote:
> 
> For the Gujarati page
> http://www.laptopchallenge.org.uk/gu/
> 
> some of the problems might be that the XHTML page is not well formed.

Oops, thanks for spotting that, it should be fixed now.

> I hypothesised that this would trigger IE to send it to the
> traditional HTML parser, which fails to realise that it is UTF-8 and
> displays it as Latin-1.

Really? I didn't realise that IE ignored the charset HTTP header for
invalid XHTML documents :-(

> On the other hand the Punjabi page is well formed and valid
> http://www.laptopchallenge.org.uk/pa/
> 
> and IE6 on WinXP still thinks it is Latin-1. Probably a meta element
> with a charset would tell IE what to use.
>
> Aha! Not serving the pages as latin-1 would also help:
> 
> [clilley@tux]$ telnet www.laptopchallenge.org.uk 80
> Trying 195.10.230.121...
> Connected to www.laptopchallenge.org.uk.
> Escape character is '^]'.
> HEAD /pa/ HTTP/1.0
> 
> HTTP/1.1 302 Found
> Date: Tue, 09 Jul 2002 12:02:58 GMT
> Server: Apache/1.3.26 (Unix) mod_perl/1.27 mod_gzip/1.3.19.1a
> Location: http://webarch.net/pa/
> Connection: close
> Content-Type: text/html; charset=iso-8859-1
> 
> Connection closed by foreign host.
> 
> [clilley@tux clilley]$ telnet webarch.net 80
> Trying 195.10.230.121...
> Connected to webarch.net.
> Escape character is '^]'.
> HEAD /pa/ HTTP/1.0
> 
> HTTP/1.1 302 Found
> Date: Tue, 09 Jul 2002 12:04:59 GMT
> Server: Apache/1.3.26 (Unix) mod_perl/1.27 mod_gzip/1.3.19.1a
> Location: http://webarch.net/pa/
> Connection: close
> Content-Type: text/html; charset=iso-8859-1
> 
> Connection closed by foreign host.

No, the page _is_ served as UTF-8. The problem you had above (I _think_)
is that you tried with HTTP 1.0, not 1.1, and so sent no Host header.
There is not one IP address per domain name on that web server, so
without a Host header Apache cannot tell which site you want; the
iso-8859-1 page is an Apache-generated 302 document.
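
To make the difference between the two kinds of request concrete, here is a minimal Python sketch that only builds the raw request text and sends nothing over the network. The helper name `build_head_request` is hypothetical and not part of any tool used in this thread; it assumes, as described above, that several domain names share one IP address, so the server relies on the Host header to pick the site.

```python
def build_head_request(path, host=None, version="1.0"):
    """Return the raw bytes of a HEAD request.

    A Host header is added only when `host` is given. HTTP/1.1 requires
    one, so that combination is rejected here.
    """
    if version == "1.1" and host is None:
        raise ValueError("HTTP/1.1 requests must carry a Host header")
    lines = [f"HEAD {path} HTTP/{version}"]
    if host is not None:
        lines.append(f"Host: {host}")
    # A blank line terminates the header block.
    return ("\r\n".join(lines) + "\r\n\r\n").encode("ascii")


# Without Host: a server hosting several sites on one IP address
# cannot tell which site is meant.
bare = build_head_request("/pa/")

# With Host: the server can select the right site.
named = build_head_request(
    "/pa/", host="www.laptopchallenge.org.uk", version="1.1"
)
```

The first request matches what was typed into telnet in the quoted transcript; the second carries the same Host line as the telnet session with HTTP 1.1.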

Try with lynx:

[chris@snowball chris]$ lynx -head -dump http://www.laptopchallenge.org.uk/pa/
HTTP/1.1 200 OK
Date: Tue, 09 Jul 2002 13:19:04 GMT
Server: Apache/1.3.26 (Unix) mod_perl/1.27 mod_gzip/1.3.19.1a
Content-Language: pa
Last-Modified: Tue, 09 Jul 2002 13:19:04 GMT
Content-Length: 16188
Connection: close
Content-Type: text/html; charset=UTF-8

Actually you get the _whole_ document rather than just the HEAD, but this
is due to a problem with mod_perl's Apache::Registry handler.

Or try with Apache benchmark:

[chris@snowball chris]$ /usr/local/apache/bin/ab -v 4 http://www.laptopchallenge.org.uk/pa/
This is ApacheBench, Version 1.3d <$Revision: 1.59 $> apache-1.3
Copyright (c) 1996 Adam Twiss, Zeus Technology Ltd,
http://www.zeustech.net/
Copyright (c) 1998-2002 The Apache Software Foundation,
http://www.apache.org/

Benchmarking www.laptopchallenge.org.uk (be patient)...INFO: POST header == 
---
GET /pa/ HTTP/1.0
User-Agent: ApacheBench/1.3d
Host: www.laptopchallenge.org.uk
Accept: */*


---
LOG: header received:
HTTP/1.1 200 OK
Date: Tue, 09 Jul 2002 13:21:44 GMT
Server: Apache/1.3.26 (Unix) mod_perl/1.27 mod_gzip/1.3.19.1a
Content-Language: pa
Last-Modified: Tue, 09 Jul 2002 13:19:04 GMT
Content-Length: 16188
Connection: close
Content-Type: text/html; charset=UTF-8

Or telnet with HTTP 1.1:

[chris@snowball chris]$ telnet www.laptopchallenge.org.uk 80
Trying 195.10.230.121...
Connected to www.laptopchallenge.org.uk.
Escape character is '^]'.
GET /pa/ HTTP/1.1
Host: www.laptopchallenge.org.uk

HTTP/1.1 200 OK
Date: Tue, 09 Jul 2002 13:24:03 GMT
Server: Apache/1.3.26 (Unix) mod_perl/1.27 mod_gzip/1.3.19.1a
Content-Language: pa
Last-Modified: Tue, 09 Jul 2002 13:19:04 GMT
Content-Length: 16188
Content-Type: text/html; charset=UTF-8
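
For anyone who wants to check the declared charset without reading the headers by eye, the following is a rough Python sketch, assuming the header block has already been captured as text (for example from one of the transcripts in this message). The function name `charset_from_headers` is hypothetical; it relies on the standard library's `email` parser, which accepts HTTP-style header lines.

```python
import email


def charset_from_headers(raw_headers):
    """Return (status_line, charset) from a raw HTTP response header block.

    The charset is lowercased, or None when Content-Type declares none.
    """
    status_line, _, rest = raw_headers.strip().partition("\n")
    # HTTP headers use the same "Name: value" layout as mail headers,
    # so the stdlib email parser can read them.
    msg = email.message_from_string(rest)
    return status_line.strip(), msg.get_content_charset()


# Header blocks abridged from the transcripts in this thread.
REDIRECT = """HTTP/1.1 302 Found
Location: http://webarch.net/pa/
Connection: close
Content-Type: text/html; charset=iso-8859-1
"""

PAGE = """HTTP/1.1 200 OK
Content-Language: pa
Content-Length: 16188
Content-Type: text/html; charset=UTF-8
"""

redirect_result = charset_from_headers(REDIRECT)
page_result = charset_from_headers(PAGE)
```

Returning the status line alongside the charset makes it harder to mistake the charset of a 302 redirect body for that of the page itself.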

Chris

-- 
Chris Croome                               <chris@webarchitects.co.uk>
web design                             http://www.webarchitects.co.uk/ 
web content management                               http://mkdoc.com/   
everything else                               http://chris.croome.net/  
Received on Tuesday, 9 July 2002 09:25:54 GMT