RE: timing model of the media resource in HTML5 from Frank Olivier on 2010-02-01 (public-html-a11y@w3.org from February 2010)

From: Frank Olivier <franko@microsoft.com>
Date: Mon, 1 Feb 2010 16:49:08 +0000
To: Silvia Pfeiffer <silviapfeiffer1@gmail.com>, Philip Jägenstedt <philipj@opera.com>
CC: Eric Carlson <eric.carlson@apple.com>, HTML Accessibility Task Force <public-html-a11y@w3.org>, Ken Harrenstien <klh@google.com>
Message-ID: <DBB7A800D05F0C44A5EBA0231C567C392076D509@TK5EX14MBXW652.wingroup.windeploy.ntde>

>>> If we buried the track information in a javascript API, we would
>>> introduce an additional dependency and we would remove the ability to
>>> simply parse the Web page to get at such information. For example, a
>>> crawler would not be able to find out that there is a resource with
>>> captions and would probably not bother requesting the resource for its
>>> captions (or other text tracks).
>>
>> Surely, robots would just index the resources themselves?

>Why download binary data of indeterminate length when you can already
>get it out of the text of the Web page? Surely, robots would prefer to
>get that information directly out of the Web page and not have to go
>and download gazillions of binary media files that they have to decode
>to get information about them.

>Right now, everybody who sees a video element in an HTML5 page simply
>assumes that it consists of a video and an audio track and has no other
>information in it. This is fine in the default case and in the default
>case no extra resource description is probably necessary. But when we
>actually do have a richer source, we need to expose that.

But robots will not be downloading binary data of indeterminate length: in the vast majority of cases they can fetch and parse just a small header using an HTTP/1.1 byte-range request. Images and PDFs are already indexed this way.
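As a rough illustration of the header-only approach, here is a minimal Python sketch, assuming a crawler that asks for the first few kilobytes with an HTTP/1.1 `Range` header and then inspects the leading bytes. The function names and the 4096-byte default are hypothetical choices for this example. The sniffing step only identifies the container (Ogg, Matroska/WebM, MP4); actually enumerating caption or other text tracks would need a container-specific parser, which is not shown.

```python
import urllib.request


def build_header_request(url: str, n_bytes: int = 4096) -> urllib.request.Request:
    """Build a GET request that asks only for the first n_bytes (HTTP/1.1 Range)."""
    if n_bytes <= 0:
        raise ValueError("n_bytes must be positive")
    # Byte ranges are inclusive, so bytes=0-4095 requests exactly 4096 bytes.
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{n_bytes - 1}"})


def fetch_head(url: str, n_bytes: int = 4096, timeout: float = 10.0) -> bytes:
    """Download at most n_bytes from the start of a media resource."""
    req = build_header_request(url, n_bytes)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # 206 Partial Content means the server honoured the range; a 200 means it
        # ignored it and is sending the whole file, so cap the read either way.
        return resp.read(n_bytes)


def sniff_container(head: bytes) -> str:
    """Guess the container format from its leading magic bytes."""
    if head.startswith(b"OggS"):
        return "ogg"
    if head.startswith(b"\x1a\x45\xdf\xa3"):  # EBML header (Matroska/WebM)
        return "matroska/webm"
    if len(head) >= 8 and head[4:8] == b"ftyp":  # ISO BMFF: size, then 'ftyp' box
        return "mp4"
    return "unknown"
```

A crawler would call `sniff_container(fetch_head(url))` and only then hand the bytes to a format-specific track parser, so it never needs to download the full media file.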

Received on Monday, 1 February 2010 16:49:42 UTC