Re: text encoding

At 22:26 98/07/15 +0000, Clive Bruton wrote:
> I'm trying to discover the ins and outs of various encoding schemes 
> identified in HTML thus:
> 
>      <META HTTP-EQUIV="Content-Type" Content="text/html; charset=ISO-nnnn">

Yes. Even better, make the server send out that information in
the HTTP header.
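As an illustration only, here is a minimal sketch using Python's standard http.server module (my choice; any server that lets you set response headers will do). The page text, the charset value, and the names CharsetHandler and build_server are hypothetical. The point is that the charset travels in the Content-Type header and the body bytes are encoded to match it.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical page and charset, chosen for illustration.
PAGE = "<html><body><p>Caf\u00e9, na\u00efve, M\u00fcller</p></body></html>"
CHARSET = "iso-8859-1"


class CharsetHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = PAGE.encode(CHARSET)  # bytes in the declared encoding
        self.send_response(200)
        # The charset parameter travels in the HTTP header itself,
        # so the document does not need a META tag to declare it.
        self.send_header("Content-Type", f"text/html; charset={CHARSET}")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def build_server(port=8000):
    """Return a server bound to localhost; call serve_forever() to run it."""
    return HTTPServer(("127.0.0.1", port), CharsetHandler)
```

To try it, run build_server().serve_forever() and fetch http://127.0.0.1:8000/. On a real server you would normally configure this once per file type rather than per script.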


> My main reason for this is to avoid encoding text in its source form,
> and let the browser decode it as necessary. Which pretty much seems to be 
> the point of this META tag.
> 
> So, rather than encode specific characters "&#nnn;" you just declare 
> their original encoding. Brilliant!

Exactly.


> How then is it that you declare Mac Roman encoding to work on other 
> platforms?
> 
>      <META HTTP-EQUIV="Content-Type" Content="text/html; charset=x-mac-roman">
> 
> As far as I can tell the above only works on a Mac, which seems rather 
> pointless.

No. Ideally, we would have only one encoding. It is much easier
to convert at the source, where you are sure that you know the
encoding, than at the target, where you are not guaranteed to
know the encoding.
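A small sketch of what "convert at the source" could mean for a Mac author, assuming Python's built-in mac_roman and latin-1 codecs (the function name mac_roman_to_latin1 is hypothetical). The text is decoded using the encoding you know it is in, then re-encoded as iso-8859-1; characters Latin-1 lacks, such as Mac curly quotes, fall back to the "&#nnn;" numeric references mentioned above.

```python
def mac_roman_to_latin1(raw: bytes) -> bytes:
    """Re-encode Mac Roman bytes as ISO-8859-1 for the web.

    Characters with no Latin-1 equivalent (e.g. curly quotes)
    become numeric character references such as &#8220;.
    """
    text = raw.decode("mac_roman")  # at the source, we know the encoding
    return text.encode("latin-1", errors="xmlcharrefreplace")
```

For example, "Caf\u00e9 \u201cquoted\u201d".encode("mac_roman") converts to b"Caf\xe9 &#8220;quoted&#8221;": the e-acute becomes a single Latin-1 byte, and only the quotes need escaping.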

The web started out with the (brilliant) idea of having just one encoding:
iso-8859-1 (Latin-1). Unfortunately, that covered only a limited part of
the world. The charset mechanisms above were added later, not so that the
same set of characters (e.g. those used in Western Europe) could be sent in
many different encodings, but so that other regions could use their own
encodings for their own characters (which they had started doing anyway)
and tell others about it, so that text appears correctly on screen directly.

From the start, web browsers on the Mac converted iso-8859-1 to MacRoman,
even in less obvious places such as copying and pasting from the browser
into another application via the clipboard.
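Roughly, that client-side conversion looks like the following sketch. It assumes Python's mac_roman codec table; actual browsers used their own tables and fallback rules, so details may differ. The function name latin1_to_mac_roman is hypothetical. The same character has different byte values in the two encodings (e-acute is 0xE9 in Latin-1 but 0x8E in MacRoman), and a few Latin-1 characters, such as the multiplication sign, have no MacRoman equivalent at all.

```python
def latin1_to_mac_roman(raw: bytes) -> bytes:
    # Decode with the charset the server declared, then re-encode
    # for the local platform. errors="replace" substitutes "?" for
    # Latin-1 characters that MacRoman cannot represent.
    return raw.decode("latin-1").encode("mac_roman", errors="replace")
```

This works only because the browser knows the incoming text is iso-8859-1, which is the reason for declaring the encoding in the first place.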

So for the web, please use as few encodings as possible. In the future,
this may be only UTF-8 or so, but for the moment, there are a few more.


Regards,   Martin.

Received on Wednesday, 15 July 1998 22:49:28 UTC