Re: character encoding from Philip Jägenstedt on 2010-02-22 (public-html-a11y@w3.org from February 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Tue, 23 Feb 2010 00:45:34 +0800
To: "Geoff Freed" <geoff_freed@wgbh.org>, "HTML Accessibility Task Force" <public-html-a11y@w3.org>
Cc: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>, "Dick Bulterman" <Dick.Bulterman@cwi.nl>
Message-ID: <op.u8jfh8exatwj1d@philip-pc.oslo.opera.com>

On Mon, 22 Feb 2010 23:04:31 +0800, Geoff Freed <geoff_freed@wgbh.org> wrote:

>
> Regarding our SRT discussion, here's a new question.  How does SRT
> handle different character encodings (Windows, UTF-8, UTF-16,
> Shift-JIS, Big5, etc.) without using byte order marks (BOMs), which are
> *not* required by Unicode?  In other words, how is a UA going to intuit
> character encodings if they are not explicitly declared (which they
> *would* be in an XML document)?

I think the only obvious point is that HTTP headers should be
authoritative, so if they are present they should simply be obeyed. If
necessary, we might also have an attribute on <track> that gives the
encoding; current thinking is to put it in type, e.g.
type="text/srt; charset=UTF-8". If neither the HTTP headers nor the
markup gives a character encoding, I do not think we should do any
sniffing whatsoever, but instead assume UTF-8.
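
As a rough illustration of that precedence (HTTP header first, then the type attribute on <track>, then UTF-8 with no sniffing), here is a minimal sketch. The function names are hypothetical and the parameter parsing is deliberately simplified; it is not taken from any spec or implementation.

```python
from typing import Optional


def charset_from_media_type(value: Optional[str]) -> Optional[str]:
    """Return the charset parameter of a media type string, or None.

    Simplified parsing (assumption): parameters are split on ';' and
    optional double quotes around the value are stripped.
    """
    if not value:
        return None
    for param in value.split(";")[1:]:
        name, _, val = param.partition("=")
        if name.strip().lower() == "charset":
            val = val.strip().strip('"')
            return val or None
    return None


def resolve_track_encoding(http_content_type: Optional[str],
                           track_type_attr: Optional[str]) -> str:
    """Pick an encoding for a text track without inspecting its bytes."""
    return (
        charset_from_media_type(http_content_type)   # 1. HTTP is authoritative
        or charset_from_media_type(track_type_attr)  # 2. type on <track>
        or "UTF-8"                                   # 3. default, no sniffing
    )
```

For example, resolve_track_encoding("text/srt", 'text/srt; charset=Big5') falls through to the attribute and yields "Big5", while passing None for both yields "UTF-8".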

This topic hasn't been discussed in very much depth before, so I assume  
not everyone agrees.

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Monday, 22 February 2010 16:46:22 UTC