RE: timing model of the media resource in HTML5

>>> If we buried the track information in a JavaScript API, we would
>>> introduce an additional dependency and we would remove the ability to
>>> simply parse the Web page to get at such information. For example, a
>>> crawler would not be able to find out that there is a resource with
>>> captions and would probably not bother requesting the resource for its
>>> captions (or other text tracks).
>>
>> Surely, robots would just index the resources themselves?

>Why download binary data of indeterminate length when you can already
>get it out of the text of the Web page? Surely, robots would prefer to
>get that information directly out of the Web page and not have to go
>and download gazillions of binary media files that they have to decode
>to get information about them.

>Right now, everybody who sees a video element in an HTML5 page simply
>assumes that it consists of a video and an audio track and has no other
>information in it. This is fine in the default case and in the default
>case no extra resource description is probably necessary. But when we
>actually do have a richer source, we need to expose that.
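
The quoted point is that a crawler could read track information straight from the page text. The following is a minimal Python sketch of that idea that uses only the standard-library `html.parser`. It assumes tracks are declared as `<track>` child elements of `<video>` or `<audio>` (the element HTML later adopted; the markup was still under discussion in this thread). The class, function, and file names are hypothetical.

```python
from html.parser import HTMLParser


class TrackFinder(HTMLParser):
    """Collect text-track declarations from <video>/<audio> markup.

    Assumption: tracks are declared as <track> child elements; this is an
    illustration, not the markup proposal being debated in this thread.
    """

    def __init__(self):
        super().__init__()
        self.media_depth = 0  # > 0 while inside a <video> or <audio> element
        self.tracks = []      # one attribute dict per <track> found

    def handle_starttag(self, tag, attrs):
        if tag in ("video", "audio"):
            self.media_depth += 1
        elif tag == "track" and self.media_depth > 0:
            # Only count tracks that belong to a media element.
            self.tracks.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag in ("video", "audio") and self.media_depth > 0:
            self.media_depth -= 1


def find_text_tracks(html_text):
    """Return the attributes of every <track> nested in <video>/<audio>."""
    parser = TrackFinder()
    parser.feed(html_text)
    parser.close()
    return parser.tracks


# Hypothetical page: a crawler learns about captions without
# fetching a single byte of the media file itself.
page = """
<video src="talk.ogv" controls>
  <track kind="captions" srclang="en" src="talk.en.srt">
  <track kind="subtitles" srclang="de" src="talk.de.srt">
</video>
"""

tracks = find_text_tracks(page)
for t in tracks:
    print(t.get("kind"), t.get("srclang"), t.get("src"))
```

The parser tracks nesting depth rather than a boolean so that stray `<track>` tags outside any media element are ignored.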

But robots would not need to download binary data of indeterminate length: in the vast majority of cases they can fetch and parse just a small header using an HTTP/1.1 byte-range request. Images and PDFs are already indexed this way.
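
A minimal sketch of the byte-range approach, assuming Python's standard `urllib.request`. `HEADER_BYTES` is a hypothetical budget, not a figure from this thread, and the function names are illustrative. A `206 Partial Content` status means the server honoured the range; a server that ignores `Range` replies `200` with the full body, so the sketch caps how much it reads either way.

```python
import urllib.request

# Hypothetical budget: large enough for many container headers.
HEADER_BYTES = 64 * 1024


def build_range_request(url, n_bytes=HEADER_BYTES):
    """Build an HTTP/1.1 GET asking only for the first n_bytes of a resource."""
    # Byte ranges are inclusive at both ends: bytes=0-(n-1) is the first n bytes.
    return urllib.request.Request(url, headers={"Range": f"bytes=0-{n_bytes - 1}"})


def fetch_media_header(url, n_bytes=HEADER_BYTES, timeout=10):
    """Return (HTTP status, up to n_bytes from the start of the resource)."""
    req = build_range_request(url, n_bytes)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        # 206: server sent only the requested range.
        # 200: server ignored Range; read at most n_bytes and close anyway.
        return resp.status, resp.read(n_bytes)
```

Usage would look like `status, data = fetch_media_header("http://example.com/talk.ogv")` (hypothetical URL), after which the crawler hands `data` to a container-format parser. One caveat: some files store their index at the end rather than the start (for example, MP4 files whose `moov` box was not moved to the front), so a crawler may need a second range request for those.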

Received on Monday, 1 February 2010 16:49:42 UTC