Re: timing model of the media resource in HTML5 from Philip Jägenstedt on 2010-02-02 (public-html-a11y@w3.org from February 2010)

From: Philip Jägenstedt <philipj@opera.com>
Date: Tue, 02 Feb 2010 09:19:35 +0100
To: "Silvia Pfeiffer" <silviapfeiffer1@gmail.com>, "Eric Carlson" <eric.carlson@apple.com>
Cc: "HTML Accessibility Task Force" <public-html-a11y@w3.org>, "Ken Harrenstien" <klh@google.com>
Message-ID: <op.u7hqqpeksr6mfa@worf>

On Mon, 01 Feb 2010 23:30:19 +0100, Silvia Pfeiffer  
<silviapfeiffer1@gmail.com> wrote:

> On Tue, Feb 2, 2010 at 3:59 AM, Eric Carlson <eric.carlson@apple.com>  
> wrote:
>> On Feb 1, 2010, at 4:19 AM, Silvia Pfeiffer wrote:
>>
>> On Fri, Jan 29, 2010 at 12:39 AM, Philip Jägenstedt <philipj@opera.com>
>> wrote:
>>
>> On Wed, 27 Jan 2010 12:57:51 +0100, Silvia Pfeiffer
>>
>> If we buried the track information in a JavaScript API, we would
>> introduce an additional dependency and we would remove the ability to
>> simply parse the Web page to get at such information. For example, a
>> crawler would not be able to find out that there is a resource with
>> captions and would probably not bother requesting the resource for its
>> captions (or other text tracks).
>>
>> Surely, robots would just index the resources themselves?
>>
>> Why download binary data of indeterminate length when you can already
>> get it out of the text of the Web page? Surely, robots would prefer to
>> get that information directly out of the Webpage and not have to go
>> and download gazillions of binary media files that they have to decode
>> to get information about them.
>>
>> Right now, everybody who sees a video element in an HTML5 page simply
>> assumes that it consists of a video and an audio track and has no other
>> information in it. This is fine in the default case, and in the default
>> case probably no extra resource description is necessary. But when we
>> actually do have a richer source, we need to expose that.
>>
>>   This argument leads down a very slippery slope. If it is crucial to
>> include caption information in markup for spiders, what about other
>> media file metadata that a crawler might want to read - intrinsic width
>> and height, duration, encoding format, file size, bit rate, frame rate,
>> etc, etc, etc? Robots may prefer to have all of this in the page so they
>> don't have to load and parse the file, but I don't think it is necessary
>> or appropriate.
>
> Not quite.
>
> It makes a difference whether you are a web crawler that wants to
> collect captions or one that wants to collect such file metadata. For
> file metadata, you are bound to always be successful when parsing the
> header of a binary file. So I agree with you there.
>
> But if you are only keen on captions, you are bound to often parse
> useless information if you have to download the media file header. A
> hint inside the markup that there are captions/subtitles there and
> that it is useful to parse the file - and then parse it fully - is
> very relevant.

Even if all browser vendors agreed that this is useful and implemented
the suggested track markup, it would only be used by authors in very rare
situations -- when they want to populate the browser's context menu before
HAVE_METADATA. As most videos that have multiple audio/video/text tracks
won't be marked up as such in HTML, robots will still have to download the
headers of all videos to see if they have captions. If they want to index
the captions (not just the fact that they exist), they'll also have to
download the whole file.
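
To make the robot's position concrete, here is a rough sketch of the
decision it faces. This is only an illustration under assumptions: the
names VideoRef, Fetch, and plan_fetch are hypothetical, and the
markup_declares_captions flag stands in for whatever track hint the
markup might carry.

```python
from dataclasses import dataclass
from enum import Enum


class Fetch(Enum):
    NONE = "nothing"      # the markup already answers the question
    HEADER = "header"     # enough to learn whether text tracks exist
    FULL = "whole file"   # needed to read the caption text itself


@dataclass
class VideoRef:
    url: str
    # Hypothetical: True when the page markup declares a caption track.
    markup_declares_captions: bool = False


def plan_fetch(video: VideoRef, want_caption_text: bool) -> Fetch:
    """Return the least a caption-seeking robot must download first."""
    if video.markup_declares_captions:
        # The hint only saves work when knowing that captions exist is enough.
        return Fetch.FULL if want_caption_text else Fetch.NONE
    # No hint does not mean no captions, so the header must be probed anyway.
    # If the header then reveals captions and the robot wants their text,
    # the whole file follows.
    return Fetch.HEADER
```

Since most videos would fall into the last branch, the hint changes little
for a robot in practice.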

-- 
Philip Jägenstedt
Core Developer
Opera Software

Received on Tuesday, 2 February 2010 08:20:50 UTC