Re: character encoding from Silvia Pfeiffer on 2010-02-22 (public-html-a11y@w3.org from February 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 23 Feb 2010 08:39:32 +1100
To: Philip Jägenstedt <philipj@opera.com>
Cc: Geoff Freed <geoff_freed@wgbh.org>, HTML Accessibility Task Force <public-html-a11y@w3.org>, Dick Bulterman <Dick.Bulterman@cwi.nl>
Message-ID: <2c0e02831002221339j456b7eefr7111030579bae294@mail.gmail.com>

On Tue, Feb 23, 2010 at 3:45 AM, Philip Jägenstedt <philipj@opera.com> wrote:
> On Mon, 22 Feb 2010 23:04:31 +0800, Geoff Freed <geoff_freed@wgbh.org>
> wrote:
>
>>
>> Regarding our SRT discussion, here's a new question.  How does SRT handle
>> different character encodings (Windows, UTF-8, UTF-16, Shift-JIS, Big5,
>> etc.) without using byte order marks (BOM), which are *not* required by
>> Unicode?  In other words, how is a UA going to intuit character encodings if
>> they are not explicitly declared (which they *would* be in an XML document)?
>
> I think the only obvious point is that HTTP headers should be authoritative,
> so if they are present they should simply be obeyed. If necessary we might
> also have an attribute on <track> that gives the encoding; current thinking
> is to put it in type, e.g. type="text/srt; charset=UTF-8". If neither HTTP
> headers nor markup give a character encoding, I do not think we should do
> any sniffing whatsoever, but instead assume UTF-8.
>
> This topic hasn't been discussed in very much depth before, so I assume not
> everyone agrees.

I am with you on this. In my experience, sniffing on SRT files failed
for about 10-15% of the files that I used for testing, so I'd also
prefer to have the encoding stated explicitly in the markup or in the
HTTP headers.
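
To make the proposed precedence concrete, here is a rough sketch in
Python of how a UA might pick the encoding: HTTP Content-Type first,
then the type attribute on <track>, then UTF-8, with no sniffing. The
function names and parameters are hypothetical and only illustrate the
idea; nothing here is specified anywhere yet.

```python
from email.message import Message


def charset_from_content_type(value):
    """Return the lowercased charset parameter of a MIME type string, or None."""
    if not value:
        return None
    # Reuse the standard library's MIME header parser instead of
    # splitting on ";" by hand.
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def resolve_track_encoding(http_content_type=None, track_type=None):
    """Hypothetical helper: choose the encoding for an SRT track.

    Precedence (as proposed in this thread, not a settled rule):
      1. charset in the HTTP Content-Type header (authoritative)
      2. charset in the type attribute of <track>
      3. UTF-8 as the default, without any content sniffing
    """
    return (
        charset_from_content_type(http_content_type)
        or charset_from_content_type(track_type)
        or "utf-8"
    )
```

For example, resolve_track_encoding("text/srt; charset=Shift_JIS",
"text/srt; charset=UTF-8") would follow the HTTP header, while a call
with no charset anywhere falls back to UTF-8.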

Cheers,
Silvia.

Received on Monday, 22 February 2010 21:40:24 UTC