Re: Guessing "correct" character set (was: [CSS21] response to issue 115 (and 44))

> the content correctly. This should be the default behavior, some 
> user agents may allow opt-in to automagic guess mechanism which may 
> or may not work.

Only an entrenched monopoly supplier, or one catering to a certain sort
of technically aware market, can afford to take this paternalistic
approach.  Anyone else (operating in a broad and free market) might
provide this as a token warning, but the opt-out will be very prominent,
typically a checkbox early in the pop-up, and the expectation will be
that normal users will tick it on first encountering the message,
without reading any of the detail.
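
(For what it's worth, any such guess is inherently unreliable.  A
minimal sketch in Python, purely to illustrate the point; the file and
function names are hypothetical:

    def guess_text(raw):
        # Try UTF-8 first: it fails loudly on most non-UTF-8 byte
        # sequences, so a successful decode is a reasonable guess.
        try:
            return raw.decode("utf-8")
        except UnicodeDecodeError:
            # Fall back to ISO-8859-1, which never raises an error
            # but may silently misread the bytes: the guess "works"
            # whether or not it is right.
            return raw.decode("iso-8859-1")

That is exactly the "may or may not work" problem: the fallback always
succeeds, so a wrong guess is indistinguishable from a right one.)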

If, like cookies and scripting, there are benefits to commercial authors
(probably no real valid case here), the warning will be something to
which you have to opt in.  Look at the messages you get from commercial
sites when you have cookies or scripting off: they don't give you the
information to make an informed decision, they just give cookbook
procedures for reducing the browser's security.

> UTF-8 can represent every character anybody needs so that doesn't 

Tell that to the Klingons!  Even with major natural human languages,
it is going to take a long time to get UTF-8 into areas that weren't
traditionally good markets for Western software suppliers.  It looks to
me as though the whole of India has established the use of
misrepresented, glyph-based character sets as the norm[1].  Even looking
at Hindi, the only exceptions to the rule seem to be the BBC and Google.
UTF-8 is not going to reach that market until Windows XP dominates it,
or until free, Unicode-encoded fonts become available for earlier
systems and non-Microsoft systems.  Even free fonts will need a lot of
goodwill from the web sites, as they have probably got their current
hacks working well, and people have downloaded the glyph fonts.  The
sites are going to have to operate parallel versions and provide some
incentive to move to the UTF-8 one.  (Even XP is missing significant
Indian languages.)

> cause problems to you as a document author in case you cannot fix 
> the HTTP header. Just transcode from your current character set to 
> UTF-8. Shouldn't be a problem while authoring NEW documents.

I might agree if you had said authoring to new standards, but I think we
are talking about people authoring to de facto CSS2 implementations
whose pages are then viewed in CSS3-capable browsers.  The only thing
you could really do is defer the warning until a CSS3 feature is
invoked, but by then you've already done the hard work!
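
(The transcoding itself is the trivial part.  A minimal sketch in
Python, assuming a hypothetical page currently stored as ISO-8859-1;
the actual legacy encoding will vary from site to site:

    # Read the document in its legacy encoding and write it back
    # out as UTF-8.  "page.html" and ISO-8859-1 are placeholders.
    with open("page.html", encoding="iso-8859-1") as src:
        text = src.read()
    with open("page-utf8.html", "w", encoding="utf-8") as dst:
        dst.write(text)

What it does not do is fix the HTTP header the quoted text mentions, or
the expectations of readers who have already downloaded the glyph
fonts.)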

> can make the document author feel that he gets all the blame, the 
> faster he'll fix the document. If he doesn't care, it might be that 

Past experience in this area is that the browser developer gets the
blame for rejecting documents that are "clearly valid" because they
work "perfectly" in the competing product.

[1] This may not be true in other Indian data-processing areas, but,
even in the West, most people see HTML, etc., as a final-form,
display-description medium, so specifying precise selections of
ligatures and strict left-to-right rendering probably isn't seen as a
problem for the web.  (As was discussed some time ago, the result is
that the main users of @font-face are actually abusing it!)

Received on Saturday, 21 February 2004 11:41:20 UTC