Re: timing model of the media resource in HTML5 from Silvia Pfeiffer on 2010-02-01 (public-html-a11y@w3.org from February 2010)

From: Silvia Pfeiffer <silviapfeiffer1@gmail.com>
Date: Tue, 2 Feb 2010 09:30:19 +1100
To: Eric Carlson <eric.carlson@apple.com>
Cc: Philip Jägenstedt <philipj@opera.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>, Ken Harrenstien <klh@google.com>
Message-ID: <2c0e02831002011430o2bdb9190n84c3f24f56f0af0a@mail.gmail.com>

On Tue, Feb 2, 2010 at 3:59 AM, Eric Carlson <eric.carlson@apple.com> wrote:
> On Feb 1, 2010, at 4:19 AM, Silvia Pfeiffer wrote:
>
> On Fri, Jan 29, 2010 at 12:39 AM, Philip Jägenstedt <philipj@opera.com>
> wrote:
>
> On Wed, 27 Jan 2010 12:57:51 +0100, Silvia Pfeiffer
>
> If we buried the track information in a javascript API, we would
> introduce an additional dependency and we would remove the ability to
> simply parse the Web page to get at such information. For example, a
> crawler would not be able to find out that there is a resource with
> captions and would probably not bother requesting the resource for its
> captions (or other text tracks).
>
> Surely, robots would just index the resources themselves?
>
> Why download binary data of indeterminate length when you can already
> get it out of the text of the Web page? Surely, robots would prefer to
> get that information directly out of the Webpage and not have to go
> and download gazillions of binary media files that they have to decode
> to get information about them.
>
> Right now, everybody who sees a video element in an HTML5 page simply
> assumes that it consists of a video and an audio track and has no other
> information in it. This is fine in the default case and in the default
> case no extra resource description is probably necessary. But when we
> actually do have a richer source, we need to expose that.
>
>   This argument leads down a very slippery slope. If it is crucial to
> include caption information in markup for spiders, what about other media
> file metadata that a crawler might want to read - intrinsic width and
> height, duration, encoding format, file size, bit rate, frame rate, etc,
> etc, etc? Robots may prefer to have all of this in the page so they don't
> have to load and parse the file, but I don't think it is necessary or
> appropriate.

Not quite.

It makes a difference whether you are a web crawler that wants to
collect captions or one that wants to collect such file metadata. For
file metadata, you are bound to always be successful when parsing the
header of a binary file. So I agree with you there.

But if you are only interested in captions, you will often end up
parsing useless information if you have to download the media file
header first. A hint inside the markup that there are
captions/subtitles there, and that it is therefore worth fetching the
file and parsing it fully, is very relevant.
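
To make the crawler's side of this concrete, here is a minimal sketch of how a caption-collecting robot could use such a hint. It assumes, purely for illustration, that the hint takes the form of a `<track kind="captions">` or `<track kind="subtitles">` child of the `<video>` element; the exact markup is not settled in this thread, and the function name `videos_worth_fetching` is hypothetical.

```python
from html.parser import HTMLParser


class CaptionHintParser(HTMLParser):
    """Collects the src of every <video> that advertises text tracks.

    Assumption: the hint is a <track> child element whose kind attribute
    is "captions" or "subtitles". This markup is illustrative only.
    """

    def __init__(self):
        super().__init__()
        self._in_video = False
        self._current_src = None
        self.captioned = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "video":
            self._in_video = True
            self._current_src = attrs.get("src")
        elif tag == "track" and self._in_video:
            if attrs.get("kind") in ("captions", "subtitles"):
                src = self._current_src
                if src is not None and src not in self.captioned:
                    self.captioned.append(src)

    def handle_endtag(self, tag):
        if tag == "video":
            self._in_video = False
            self._current_src = None


def videos_worth_fetching(html):
    """Return the media URLs a caption crawler should bother downloading."""
    parser = CaptionHintParser()
    parser.feed(html)
    return parser.captioned


page = """
<video src="plain.ogv"></video>
<video src="talk.ogv">
  <track kind="captions" src="talk.en.srt">
</video>
"""
print(videos_worth_fetching(page))
```

With a hint like this, the crawler requests only `talk.ogv` and skips `plain.ogv` without ever touching its binary header.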

Cheers,
Silvia.

Received on Monday, 1 February 2010 22:31:11 UTC