RE: Web and TV Accessibility from GAUSMAN, PAUL on 2011-09-16 (public-web-and-tv@w3.org from September 2011)

From: GAUSMAN, PAUL <pg2483@att.com>
Date: Fri, 16 Sep 2011 20:53:16 +0000
To: Thomas Stockhammer <stockhammer@nomor.de>
CC: Robert Pearson <robert.pearson@ami.ca>, Silvia Pfeiffer <silviapfeiffer1@gmail.com>, "public-web-and-tv@w3.org" <public-web-and-tv@w3.org>
Message-ID: <F403326A8484704DAAAD9BA5687DDE87A29C95@MISOUT7MSGUSR9I.ITServices.sbc.com>
I need to read up on DASH to understand what it provides. Thanks for the response.

Here are a few examples/scenarios to illustrate what I'm thinking about:

1.      [multiple devices, multiple viewing angles, interactive media] I'm on a voice call with my spouse. I have a tablet device, also, and my spouse has a TV, all of which are connected to and registered with "the network". She tells me about a commercial she sees on TV for a new car. The commercial appears on my tablet. Our comments steer the path of the commercial: "I wonder how it would look in blue?" The cameras on our larger screen devices put our faces on the screen with the commercial, so we can see each other's expressions. (Lip-sync desirable.) We each inspect the car separately but the image of the car reacts to both of our gestures on both screens, from each of our perspectives (I'm looking under the hood; my wife is looking at the interior of the cabin.) "How do you like the upholstery?" I ask. Etc.

2.      [virtual reality living room, synthetic sporting event, viewed from multiple locations] Steve, Bill and I are in 3 separate cities but play a dream league football game together in real-time. Bill's dream team is playing Steve's dream team, and I'll be the referee. All three of us are represented with avatars in the virtual football stadium environment. Each of us can manipulate our respective virtual components and it all comes together in sync in the VR space. Our individual views are POV correct for our position on the field.

3.      [personalized augmented experience, natural interaction] I'm in a physical showroom in the Mall. I pick up a tablet device at the front door of the store. I call a number on my smart phone to engage the personal sales agent and register my phone with the sales system. I am connected to a live salesman in a remote location who is synchronized to a realistic avatar rendering of an idealized sales person, selected for me based on my personal profile data. As I walk through the store, the salesman's avatar is on the tablet device. I can hear the salesman's voice using my BT headset on my phone or using a headset connected to the tablet, provided by the store. The salesman's presentation is appropriate for the display I'm looking at. He is prompted by an intelligent agent as to details and sales techniques pertinent to my interests. (The live person is there to add a more natural effect to the sales experience. His voice quality is translated in real-time to the voice selected for my ears - a female voice.) I like the item and the deal, so a contract appears on my smart phone for identity verification, and I electronically sign it, buying <whatever>.

4.      [augmented experiences in new settings] I'm driving down the highway and I go past a series of billboards. In a Burma Shave manner, the successive displays build on each other to provide an experience that depicts famine in East Africa. A narrator appears on the screen on my dashboard, and tells me the details. Eventually, I'm asked to donate to the Hunger Relief Fund. I hit "5" on my dashboard to signify a donation of $50.

5.      [complex experiences involving multiple ad-hoc users] I'm walking around Paris, France. I've never been here before. My cell phone SIM has ID'd me to the augmented reality programs which I'm running on my personal display device (e.g. tablet, smart phone, iglasses.) My personal agent is giving me a tour. As I stop in front of intelligent signage in various windows, the signs know me and know what I might be interested in. I have a natural interaction with my agent and the sign(s). I come to a famous spot where Napoleon once stood. I hesitate, taking it in, and a 3D re-enactment of an historical event is superimposed over my view of the spot. For fun, the enactment involves me in the action. I learn firsthand what it was like to talk to Napoleon as a commoner. Meanwhile, all the smart signs around the spot are reacting to my experience, trying to lure me into their shops.

The things that are important to me about these examples (which I hope are adequate) are:

·        Code running in different apps, on the same or different devices, needs to be synchronized with my overall experience

·        The content being synchronized for an experience doesn't have to be video or audio in format; it could include anything

·        No type of content should be tied to any one format, e.g. video as MPEGx

·        The synchronized content can be live or prerecorded

·        The clock can be local to a master device, the network, or relative to a succession of events

·        Events which can impact the timing of the experience can include those generated by interaction with a live subject

·        Provisions are made to allow for delay of content and network impacts, e.g. to request pre-caching

·        Some way for all components to be aware of the state of any other component, as needed for the operation of the experience

·        Overall, the effect should be to:

o   Create a virtual environment over any number of windows, applications or devices

o   Tie the experience environment together using any connectivity available

o   Create a natural experience for the user
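
To make the clock and state points above concrete, here is a minimal sketch in Python of a shared timeline that each component compares itself against. This is only an illustration under stated assumptions, not something HTML5 or any existing specification provides: `SharedTimeline`, `Component`, and all of their fields are hypothetical names, and the master clock readings are plain numbers supplied by the caller.

```python
from dataclasses import dataclass


@dataclass
class SharedTimeline:
    """Hypothetical shared clock mapping a master clock reading to experience time."""
    started_at: float   # master clock reading when the experience started
    rate: float = 1.0   # 1.0 = real time, 0.0 = paused

    def position(self, master_now: float) -> float:
        """Seconds of experience time elapsed at master clock reading `master_now`."""
        return (master_now - self.started_at) * self.rate


@dataclass
class Component:
    """Hypothetical participant in the experience (a video, an app, a device)."""
    name: str
    offset: float = 0.0          # where this component's content starts on the shared timeline
    local_position: float = 0.0  # where the component currently is in its own content

    def drift(self, timeline: SharedTimeline, master_now: float) -> float:
        """Seconds this component is ahead (+) of or behind (-) the shared timeline."""
        target = timeline.position(master_now) - self.offset
        return self.local_position - target

    def resync(self, timeline: SharedTimeline, master_now: float,
               tolerance: float = 0.05) -> bool:
        """Seek to the shared position if drift exceeds `tolerance`; report whether we did."""
        if abs(self.drift(timeline, master_now)) > tolerance:
            self.local_position = timeline.position(master_now) - self.offset
            return True
        return False


# Two devices in the same experience; the tablet has fallen behind.
timeline = SharedTimeline(started_at=100.0)
tv = Component("tv", local_position=12.0)
tablet = Component("tablet", local_position=11.2)

tv_changed = tv.resync(timeline, master_now=112.0)
tablet_changed = tablet.resync(timeline, master_now=112.0)
```

In this sketch the master clock could be a device, the network, or a counter advanced by events, since `position` only needs a number. How that reading is distributed to each device, and how network delay is compensated, is deliberately left out.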

I think it's great if HTML5 can converge the existing TV and Web experiences. I think it's also important to support emerging content experiences so that HTML5 isn't antiquated immediately.

Thanks!
-Paul



From: Thomas Stockhammer [mailto:stockhammer@nomor.de]
Sent: Friday, September 16, 2011 11:56 AM
To: GAUSMAN, PAUL
Cc: Robert Pearson; Silvia Pfeiffer; public-web-and-tv@w3.org
Subject: Re: Web and TV Accessibility

Paul,

this is exactly what MPEG DASH provides.

The MPD collects and describes different media tracks (called Representations) that are all time-aligned in presentation time. The time alignment serves both seamless switching and synchronized presentation. What may be missing, and could be added easily, is a way to refer to individual Representations (or better, Adaptation Sets, which are collections of seamlessly switchable Representations): provide some anchors in the MPD, then refer to the MPD with an HTTP URL plus a fragment identifier pointing to the anchor.
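
As a rough illustration of the anchor idea, the Python sketch below resolves a URL fragment to one Adaptation Set in a toy MPD. The fragment syntax `#adaptationSet=<id>` is purely an assumption made up for this example (the message above notes that such anchors may not exist yet), and `representations_for`, the example URL, and the trimmed MPD are likewise hypothetical. Only the element names and the MPD namespace are taken from the DASH schema.

```python
import xml.etree.ElementTree as ET
from typing import List
from urllib.parse import urldefrag

DASH_NS = {"mpd": "urn:mpeg:dash:schema:mpd:2011"}

# Toy MPD, heavily trimmed; a real one carries segment and timing information.
MPD_XML = """<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static">
  <Period>
    <AdaptationSet id="1" contentType="video">
      <Representation id="v-low" bandwidth="500000"/>
      <Representation id="v-high" bandwidth="2000000"/>
    </AdaptationSet>
    <AdaptationSet id="2" contentType="audio" lang="en">
      <Representation id="a-en" bandwidth="64000"/>
    </AdaptationSet>
  </Period>
</MPD>"""


def representations_for(url: str, mpd_xml: str) -> List[str]:
    """Return the Representation ids of the Adaptation Set named by the URL fragment.

    Assumes the hypothetical fragment form '#adaptationSet=<id>'.
    """
    _, fragment = urldefrag(url)
    key, _, value = fragment.partition("=")
    if key != "adaptationSet":
        raise ValueError(f"unsupported fragment: {fragment!r}")
    root = ET.fromstring(mpd_xml)
    for aset in root.iterfind("mpd:Period/mpd:AdaptationSet", DASH_NS):
        if aset.get("id") == value:
            return [rep.get("id") for rep in aset.iterfind("mpd:Representation", DASH_NS)]
    raise KeyError(value)


audio_reps = representations_for("http://example.com/show.mpd#adaptationSet=2", MPD_XML)
video_reps = representations_for("http://example.com/show.mpd#adaptationSet=1", MPD_XML)
```

Because all Representations in the MPD share one presentation timeline, a second device could in principle fetch just the audio set while the first plays the video set, and both would remain time-aligned.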

Thomas



On Sep 15, 2011, at 8:30 PM, GAUSMAN, PAUL wrote:


Everyone,

Excuse me if I'm not using this email correctly. I'll keep my comment short.

It seems that something which HTML5 should provide, now that it includes temporal media e.g. audio and video, is temporal controls, like mechanisms to synchronize media across a page, across apps or across devices. I don't see a way to do this within HTML5 proper.

Does anyone have any suggestions how this would be done using only HTML5?

If I should submit this another way, please let me know and I'll do that.

Thanks!
-Paul

Paul Gausman, Multimedia Service Architect
Ecosystem & Innovation
AT&T Applications and Services Infrastructure
908-848-5435
"Don't text and drive!"

From: public-web-and-tv-request@w3.org [mailto:public-web-and-tv-request@w3.org] On Behalf Of Robert Pearson
Sent: Thursday, September 15, 2011 7:30 AM
To: Silvia Pfeiffer
Cc: public-web-and-tv@w3.org
Subject: Re: Web and TV Accessibility

Hi Silvia,

Certainly, I think all of the required structures have been considered, and in several cases the accessibility of television will be enhanced with HTML 5 over standard TV, with things like extended audio description and sign language tracks.

Two questions come to mind.

- Were there considerations for the protection and security of copyrighted media content when displayed using HTML 5?
- Quality standards. This may have been beyond the realm of consideration for the group, but while the structures are there, what standards would indicate the quality of the audio description or closed captioning, and would they be different for TV on the web than for regular TV? For example, how would 3D content be described or captioned for the web or another device if it was originally created to be viewed on a 3D TV screen?

I look forward to hearing your thoughts.

Regards,
Robert Pearson


On 2011-09-15, at 1:32 AM, "Silvia Pfeiffer" <silviapfeiffer1@gmail.com> wrote:


Are you aware of the <track> element in HTML5 and all the different kinds of timed text tracks it can provide to video? These include captions, subtitles, descriptions, and chapters (as in: navigation), as well as multitrack audio and video support (for audio descriptions and sign language). Also, there was a media group in the accessibility task force of HTML5 which specified a requirements list, see http://www.w3.org/WAI/PF/HTML/wiki/Media_Accessibility_Requirements . Is there any requirement you have that is not yet considered?

Regards,
Silvia.


---
Dr. Thomas Stockhammer (CEO) || stockhammer@nomor.de || phone +49 89 978980 02 || cell +491725702667 || http://www.nomor-research.com
Nomor Research GmbH  -  Sitz der Gesellschaft: München - Registergericht: München, HRB 165856 - Umsatzsteuer-ID: DE238047637 - Geschäftsführer: Dr. Thomas Stockhammer, Dr. Ingo Viering.
Received on Friday, 16 September 2011 20:53:51 UTC