Re: character encoding

On Tue, Feb 23, 2010 at 3:45 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Mon, 22 Feb 2010 23:04:31 +0800, Geoff Freed <geoff_freed@wgbh.org>
> wrote:
>
>>
>> Regarding our SRT discussion, here's a new question.  How does SRT handle
>> different character encodings-- Windows, UTF-8, UTF-16, Shift-JIS, Big5,
>> etc.-- without using byte order marks (BOM), which are *not* required by
>> Unicode?  In other words, how is a UA going to intuit character encodings if
>> they are not explicitly declared (which they *would* be in an XML document)?
>
> I think the only obvious point is that HTTP headers should be authoritative,
> so if they are present they should simply be obeyed. If necessary we might
> also have an attribute on <track> that gives the encoding, current thinking
> is in type, e.g. type="text/srt; charset=UTF-8". I do not think we should
> have any sniffing whatsoever if neither HTTP headers nor markup give a
> character encoding but instead assume UTF-8.
>
> This topic hasn't been discussed in very much depth before, so I assume not
> everyone agrees.

I am with you on this. In my experience, sniffing failed on about
10-15% of the SRT files that I used for testing, so I'd also prefer
to have the encoding declared explicitly in the markup or in the HTTP
headers.
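
To make the proposed order concrete, here is a rough sketch of the
lookup: HTTP Content-Type charset first, then a charset parameter in
the type attribute on <track>, then UTF-8, with no sniffing. The
function names and the idea of passing both values in as strings are
only assumptions for illustration, not anything a UA or spec defines.

```python
def charset_param(value):
    """Return the charset parameter of a Content-Type-style string, or None.

    Hypothetical helper: handles simple forms such as
    'text/srt; charset=UTF-8' and nothing more elaborate.
    """
    if not value:
        return None
    # Skip the media type itself and look at each ';'-separated parameter.
    for part in value.split(";")[1:]:
        name, _, val = part.partition("=")
        if name.strip().lower() == "charset" and val.strip():
            return val.strip().strip('"')
    return None


def resolve_track_encoding(http_content_type=None, track_type=None):
    """Pick the encoding for a text track without sniffing the bytes.

    Assumed order, following the proposal in this thread:
    1. charset from the HTTP Content-Type header (authoritative),
    2. charset from the type attribute on <track>,
    3. UTF-8 as the default.
    """
    return (charset_param(http_content_type)
            or charset_param(track_type)
            or "UTF-8")


def decode_track(data, http_content_type=None, track_type=None):
    """Decode raw track bytes using the resolved encoding."""
    encoding = resolve_track_encoding(http_content_type, track_type)
    # Undecodable bytes are replaced rather than triggering a guess.
    return data.decode(encoding, errors="replace")
```

For example, a file served with "text/srt; charset=Shift_JIS" would be
decoded as Shift_JIS even if the markup says UTF-8, and a file with no
declaration at all would be decoded as UTF-8.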

Cheers,
Silvia.

Received on Monday, 22 February 2010 21:40:24 UTC