[whatwg] Default encoding to UTF-8?

On 12/5/11 9:55 PM, Leif Halvard Silli wrote:
> If that is all they tested, then I'd say they did not test enough.

That's normal for the web.

>> (For the record, reading a particular page in a language is a much
>> simpler task than reading the language; I can't "read German", but I can
>> certainly read a German subway map.)
>
> Or Polish subway map - which doesn't default to said encoding.

Indeed.  I don't think anyone thinks the existing situation is all fine 
or anything.

> I said I agreed with him that Faruk's solution was not good. However, I
> would not be against treating <!DOCTYPE html> as a 'default to UTF-8'
> declaration

This might work, if there hasn't been too much cargo-culting yet.  Data 
urgently needed!
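A minimal sketch of what treating the bare HTML5 doctype as an implicit UTF-8 declaration could look like as a fallback step, assuming the document has no BOM, no HTTP charset and no <meta charset>. The function name default_encoding and the windows-1252 fallback are hypothetical stand-ins for whatever locale-based default a browser actually uses. The cargo-culting risk shows up here as a page that copied <!DOCTYPE html> while still saving legacy bytes: this sketch would decode it as UTF-8 and garble its non-ASCII characters.

```python
import re

# Hypothetical legacy fallback, standing in for a locale-based default.
LEGACY_FALLBACK = "windows-1252"

# "<!DOCTYPE html>" at the start of the document, case-insensitive,
# allowing leading whitespace. Legacy doctypes with PUBLIC ids do not match.
HTML5_DOCTYPE = re.compile(rb"\s*<!doctype\s+html\s*>", re.IGNORECASE)

def default_encoding(head: bytes) -> str:
    """Pick a fallback encoding for a document with no other charset signal.

    Sketch of the proposal only: the bare HTML5 doctype is read as an
    implicit 'default to UTF-8' declaration; anything else keeps the
    legacy fallback.
    """
    if HTML5_DOCTYPE.match(head):
        return "utf-8"
    return LEGACY_FALLBACK

print(default_encoding(b"<!DOCTYPE html>\n<title>x</title>"))  # utf-8
print(default_encoding(b'<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">'))  # windows-1252
```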

>> Not unless we change the authoring tools.  Half the time these things
>> are just directly exported from a word processor.
>
> Please educate me. I'm perhaps 'handicapped' in that regard: I haven't
> used MS Word on a regular basis since MS Word 5.1 for Mac. Also, if
> "export" means "copy and paste"

It can mean that, or "save as HTML" followed by copy and paste.

> then on the Mac, everything gets
> converted via the clipboard

On Mac, the default OS encoding is UTF-8 last I checked.  That's 
decidedly not the case on Windows.
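To make the Windows concern concrete, here is a small Python sketch, assuming the exported file was saved as windows-1252 (a common Western European Windows default). The helper decode_both is hypothetical. It shows what happens to the same bytes under the legacy reading and under a UTF-8 default, and also the reverse mistake.

```python
def decode_both(data: bytes) -> tuple[str, str]:
    """Decode the same bytes as windows-1252 and as UTF-8.

    Bytes that are invalid UTF-8 become U+FFFD (the replacement character).
    """
    return data.decode("cp1252"), data.decode("utf-8", errors="replace")

# Curly quotes as a Windows word processor might save them in windows-1252:
# 0x93 and 0x94 are the left and right double quotes in that code page.
legacy = b"\x93Hello\x94"
as_1252, as_utf8 = decode_both(legacy)
print(as_1252)  # curly quotes survive
print(as_utf8)  # each quote byte is invalid UTF-8 and becomes U+FFFD

# The reverse mistake: UTF-8 bytes read as windows-1252 turn into mojibake.
as_1252, as_utf8 = decode_both("café".encode("utf-8"))
print(as_1252)  # cafÃ©
print(as_utf8)  # café
```

Either default garbles someone's pages; the question in this thread is which group is larger.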

>>> OK: Quotation marks. However, in 'old web pages' you also find
>>> much more use of HTML entities (such as?) than you find today.
>>> We should take advantage of that, no?
>>
>> I have no idea what you're trying to say,
>
> Sorry. What I meant was that character entities are encoding
> independent.

Yes.

> And that lots of people - and authoring tools - have
> inserted non-ASCII letters and characters as character entities,

Sure.  And lots have inserted them "directly".

> At any rate: A page which uses
> character entities for non-ASCII would render the same regardless of
> encoding, hence a switch to UTF-8 would not matter for those.

Sure.  We're not worried about such pages here.
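The entity point can be checked directly. A minimal Python sketch, assuming windows-1252 as the legacy encoding being compared against UTF-8 (the claim holds for ASCII-compatible encodings generally, not for something like UTF-16). The helper same_under_both is a hypothetical name for illustration. An entity-only page is pure ASCII on the wire, so both decodings agree; the same text inserted directly does not survive a switch to UTF-8.

```python
import html

def same_under_both(page: bytes) -> bool:
    """True if the bytes decode to identical text as windows-1252 and as UTF-8."""
    return page.decode("cp1252") == page.decode("utf-8", errors="replace")

# Non-ASCII written as character entities: the bytes themselves are pure ASCII.
entity_page = b"<p>caf&eacute; &ldquo;quoted&rdquo;</p>"
# The same text with the characters inserted directly, saved as windows-1252.
direct_page = "<p>café \u201cquoted\u201d</p>".encode("cp1252")

print(same_under_both(entity_page))  # True
print(same_under_both(direct_page))  # False
# Entities are expanded after decoding, so either reading yields the same characters.
print(html.unescape(entity_page.decode("utf-8")))
```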

-Boris

Received on Monday, 5 December 2011 19:18:10 UTC