Re: Draft of Second Screen Presentation Working Group Charter available (was: Heads-Up: Plan for Working Group on Second Screen Presentation)

Hey Dominik (and others),

I think there are three key sub-topics here.

1.  Content-type negotiation.

This seems necessary even ignoring app-specific content.  As you note, this
is messy on the web today, which makes it particularly messy in a two-UA
case.  If we don't address this, we may happily load a web page on a remote
UA, only to later discover that it can't actually play the content it was
invoked to handle.

I agree that we need to be cautious about how much of this we take on, but
even at the most basic level we need to be able to differentiate devices
that might be audio-only (e.g. speakers), image-only (e.g. picture frames),
audio/video, etc.
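
To make this concrete, here is a rough page-side sketch (requestSession()
and navigator.presentation are just placeholder names, not a proposal for
the exact API shape) of why some notion of content type has to travel with
the request for a screen:

    // The page can at least ask the *local* UA what media it understands,
    // via the existing canPlayType():
    var probe = document.createElement("video");
    console.log(probe.canPlayType("video/mp4"));  // "", "maybe" or "probably"

    // But for a remote screen the same question has to be answered on the
    // other side: an audio-only speaker shouldn't be offered for an HTML
    // page, and a picture frame shouldn't be offered for an MP3.
    var contentUrl = "http://example.com/player.html";   // HTML content
    // var contentUrl = "http://example.com/track.mp3";  // audio-only content
    navigator.presentation.requestSession(contentUrl);   // placeholder API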

2.  App-specific content.

The simplest concrete example of this is allowing Netflix or YouTube to
play a video on a Smart TV that supports a protocol like DIAL.  However,
the user flow - and perhaps even the web page itself - are identical to
what we've already discussed.  If a UA supports multiple approaches, when
it's asked to present "http://youtube.com/watch?v=1234", on device A it
might use a general-purpose HTML rendering engine to load that specific
URL.  On device B that uses DIAL, it might launch the YouTube app (known to
handle http://youtube.com/* rendering), and pass "v=1234" as a parameter on
that launch.  In both cases, the outcome is the same - a session,
accessible through the Presentation API, that supports YouTube-defined
postMessage/onMessage semantics between the controlling page and the
established session.
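
To make that symmetry concrete, from the controlling page's point of view
the flow could look something like the sketch below in both cases
(requestSession(), the session's state/message events, and the JSON
payloads are all placeholders; the message format itself would be entirely
YouTube-defined):

    // Page-side sketch; identical whether the receiver renders the URL in
    // an HTML UA (device A) or maps it to a pre-installed app via DIAL
    // (device B).  API names are placeholders, not a concrete proposal.
    var session = navigator.presentation.requestSession(
        "http://youtube.com/watch?v=1234");

    session.onstatechange = function () {
      if (session.state === "connected") {
        // Payload is entirely YouTube-defined; the Presentation API just
        // carries it to whatever is handling the session on the screen.
        session.postMessage(JSON.stringify({ command: "pause" }));
      }
    };

    session.onmessage = function (event) {
      var status = JSON.parse(event.data);  // e.g. { position: 42.0 }
    };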

An option proposed in MarkF's E-mail was to support an alternative URL
scheme (e.g. "dial://youtube/...").  This requires less mapping logic to be
built into the UA, and seems worth considering.  I don't think we assumed
that the Presentation API would define or mandate support for any
particular scheme, just that it doesn't prevent UAs from implementing such
schemes.
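
For illustration only (the scheme and path below are invented), the
page-side difference with such a scheme would be limited to the URL handed
to the API:

    // UA-specific scheme: the content-to-app mapping lives in the URL
    // rather than in mapping logic inside the UA.  Everything after the
    // session is established (postMessage/onmessage) stays the same.
    var session = navigator.presentation.requestSession("dial://youtube/?v=1234");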

3.  Device reach.

From a practical perspective, of the devices that developers will want to
target, a relatively small percentage support receiving a real-time video
stream, and an even smaller percentage today support an arbitrary UA.
Specifically, Miracast/WiDi receivers, Apple TV, and Chromecast support
video streaming - and *no existing device* supports the two-UA model with
arbitrary URLs.

By contrast, the number of devices that support constrained URLs (i.e.
those handled by apps that are already "installed") is vastly larger.
MarkW said millions, but forecasts for Smart TVs run into the hundreds of
millions; game consoles with basic DIAL support are also above 100 million.

Considering that developers care significantly about the range of devices
they can address, ensuring that we don't rule out the ability for a UA to
enable communication with legacy devices seems important.



On Wed, May 21, 2014 at 4:28 AM, Rottsches, Dominik <
dominik.rottsches@intel.com> wrote:

> Hi MarkS, MarkW,
>
> On 21 May 2014, at 11:43, Mark Scott <markdavidscott@google.com> wrote:
>
> Dominik, I agree that exposure of specific protocols via the Presentation
> API isn't a goal, and I don't think that's necessarily what MarkW was
> looking for.
>
> Rather, if we generalize the three use cases that MarkF raised (and I
> think there are strong valid arguments for all three), I think the high
> level goal is to handle a wide range of content types - HTML content, web
> media content, or app-specific content (e.g. a piece of content on
> Netflix).  The role of the Presentation API at its core is to find screens
> that support a particular content type, and to establish (or terminate) a
> presentation session on an appropriate screen for that content.
>
> I don’t think the generalisation to “any content type” is useful. At its
> core, the Presentation API should serve to find screens that can show the
> content that a web application developer can generate; this is all content
> that is understood by web browsers. And yes, that includes video (whether
> wrapped in a .html page or as a URL to say an MP4 file directly).
>
> Could you repeat or explain the use case for app-specific content? I may
> not have fully understood it. Ideally with a concrete user experience flow,
> and distinguishing it from something that couldn’t be done by modelling it
> as web content.
>
> The problem I see: content-type based compatibility negotiation is messy
> at best - it failed for video elements, and it doesn’t work reliably in
> HTTP headers even for distinguishing between text and html. The community
> even came up with a spec that, in the end, distinguishes files by sniffing
> content: http://mimesniff.spec.whatwg.org/
> - canPlayType() for video elements is hard to make sense of, if not
> completely broken. How could we assume we will do a better job at such
> content-type based compatibility detection?
>
> Where to draw the line? How to identify a compatible app on the receiving
> side? What kind of protocol part of the URI to use? Etc.
>
> We gain speed of implementation, and save ourselves from a lot of spec
> arguing by starting with: The remote side should understand web content and
> speak postMessage/onMessage with JS(ON) objects. If the remote side does
> not understand web content, and does not speak JavaScript, the UA simulates
> it by generating compatible output formats for the remote side, and takes
> care of “understanding” web content locally. In most cases, that would mean
> rendering to an offscreen tab and then sending out a video stream. I would
> claim this is the common denominator among most such nearby screen
> “receivers”, and we would reach a wide range of devices with such an
> approach.
>
> I think we would win a lot by making this an initial goal. Once this works
> and is deployed, I wouldn’t object to considering app-specific extensions,
> still keeping in mind the above compatibility detection nightmares.
>
> A view that underlies both MarkF's and MarkW's views, which I share (must
> be a first-name thing), is that messages/control in the context of an
> established presentation session are specific to the content being
> presented.  Whether you load Netflix-defined HTML content on a remote HTML
> UA, or a pre-installed Netflix app on a DIAL device, messages within the
> session are entirely Netflix-defined regardless.
>
> The semantics and higher-level format of the messages are at least specific
> to a particular web app, I would agree - but thinking about a technical
> realisation: Would you want to make the communication between the UA and
> the remote end a buffered binary protocol? Using, for example, an ArrayBuffer
> and starting to send binary, possibly proprietary, protocols? I have doubts
> that this helps adoption of the API as something easy to use, as opposed to
> making at least the basic protocol postMessage / onMessage based with JS
> data objects passed over the line - something that is already conceptually
> established in the context of Web Workers.
>
> Dominik
>

Received on Wednesday, 21 May 2014 19:15:19 UTC