Re: Draft of Second Screen Presentation Working Group Charter available (was: Heads-Up: Plan for Working Group on Second Screen Presentation)

Hi Dominik, responses inline.

On Wed, May 21, 2014 at 4:28 AM, Rottsches, Dominik <
dominik.rottsches@intel.com> wrote:

> Hi MarkS, MarkW,
>
> On 21 May 2014, at 11:43, Mark Scott <markdavidscott@google.com> wrote:
>
> Dominik, I agree that exposure of specific protocols via the Presentation
> API isn't a goal, and I don't think that's necessarily what MarkW was
> looking for.
>
> Rather, if we generalize the three use cases that MarkF raised (and I
> think there are strong valid arguments for all three), I think the high
> level goal is to handle a wide range of content types - HTML content, web
> media content, or app-specific content (e.g. a piece of content on
> Netflix).  The role of the Presentation API, at its core, is to find screens
> that support a particular content type, and to establish (or terminate) a
> presentation session on an appropriate screen for that content.
>
> I don’t think the generalisation to “any content type” is useful. At its
> core, the Presentation API should serve to find screens that can show the
> content that a web application developer can generate, that is, all content
> that is understood by web browsers. And yes, that includes video (whether
> wrapped in a .html page or as a URL to, say, an MP4 file directly).
>
> Could you repeat or explain the use case for app-specific content? I may
> not have fully understood it. Ideally with a concrete user experience flow,
> and distinguishing it from something that couldn’t be done by modelling it
> as web content.
>

The use case for app-specific content is an identifier for a piece of media
(one that, e.g., YouTube controls) that may be rendered by a Web document or
by another agent attached to the remote screen.  In some cases the remote
agent would download and render a Web document the same way a browser would.
In some cases the remote agent would start an installed application and
provide the content identifier to it.  In both cases the remote agent would
conform to the Presentation API spec for messaging to and from the
controlling page, and presumably understand the same control protocol (as
the page would rather not speak different protocols to different types of
remote agents).

What this enables is the availability to the Web of a large number (I would
say the majority) of Internet-connected screens for supported content, and
a better user experience than local rendering and transcoding of the
content.

This could be handled in a few different ways through the API, such as
extending it to support application-specific URNs along with a way to map
these URNs to schemes for Web documents and installed apps, or by allowing
the browser to map HTML documents sent through the API to alternative
namespaces.  I would rather not debate the merits of specific approaches
yet.
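
To make this concrete, here is a rough sketch only: the requestSession call
mirrors the current draft, while the URN scheme and the content identifier
are invented for illustration and are not a proposal.

    // Hypothetical: ask for a screen that can present a piece of
    // content identified by an application-specific URN rather than
    // by an HTML document URL.
    var session = navigator.presentation.requestSession(
        'urn:x-example-app:com.example.video?contentId=abc123');

    session.onstatechange = function () {
      if (session.state === 'connected') {
        // The messaging channel looks the same whether the remote
        // agent loaded a Web document or started an installed app.
        session.postMessage(JSON.stringify({command: 'play'}));
      }
    };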


>
> The problem I see: content-type based compatibility negotiation is messy
> at best - it failed for video elements, and it doesn’t work reliably in
> HTTP headers even for distinguishing between text and html. The community
> even came up with a spec that, in the end, distinguishes files by sniffing
> their content: http://mimesniff.spec.whatwg.org/
> - canPlayType() for video elements is hard to make sense of, if not
> completely broken. How could we assume we will do a better job at such
> content-type based compatibility detection?
>
> Where to draw the line? How to identify a compatible app on the receiving
> side? What protocol or URI scheme to use? Etc.
>

I think content type negotiation is actually much simpler for case #2.
Either the remote agent is capable of starting the application or it is not,
and I would expect it to be able to answer that question in advance of a
presentation request.
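
To illustrate: the event below mirrors the current draft's availability
change event, but treating availability as being answered per content
identifier is an assumption on my part, and castButton is just a
hypothetical page element.

    // Hypothetical: the UA answers, ahead of any presentation request,
    // whether some known remote agent can start the application that
    // handles this kind of content.
    navigator.presentation.onavailablechange = function (event) {
      // event.available is a simple yes/no; no codec-level negotiation
      // is needed for the installed-app case.
      castButton.disabled = !event.available;
    };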

I don't think we can easily avoid the complexities you bring up.  "Web
documents" incorporate all of the media codec complexity noted above, along
with other Web platform features that are not universally adopted or
require hardware support (WebGL, EME, MSE, WebRTC, WebP, ...).  Either the
CG will be tasked with defining a subset of the Web platform that must be
supported for presentations (potentially crippling the spec into something
less than useful), or with specifying how these capabilities are negotiated
with the remote agent.


> We gain speed of implementation, and save ourselves from a lot of spec
> arguing by starting with: The remote side should understand web content and
> speak postMessage/onMessage with JS(ON) objects. If the remote side does
> not understand web content, and does not speak JavaScript, the UA simulates
> it by generating compatible output formats for the remote side, and takes
> care of “understanding” web content locally. In most cases, that would mean
> rendering to an offscreen tab and then sending out a video stream. I would
> claim this is the common denominator among most of such nearby screen
> “receivers” and we would reach a wide range of devices with such an
> approach.
>

I would be curious to find out the market penetration of screens that
support this approach with a mechanism that is likely to be supported by
multiple vendors and interoperable.  Chromecast with WebRTC support comes to
mind; are there others?

> I think we would win a lot by making this an initial goal. Once this works
> and is deployed, I wouldn’t object to considering app-specific extensions,
> still keeping in mind the above compatibility detection nightmares.
>

Presentation of HTML documents is certainly a goal that should remain in
scope.  I am not sure what we gain time-wise by restricting the API to this
case; in fact, based on our implementation experience, both local capture
and rendering, and remote rendering of arbitrary documents, are
significantly more complex than the "flinging" case of a well-defined subset
of content.  For the latter, all that is required is support for the
application control protocol (i.e., DIAL) and the messaging channel.
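
For illustration (assuming an established session object with
postMessage/onmessage as in the current draft; the message contents are
invented), the "flinging" case looks the same to the page regardless of what
the remote agent actually launched - discovery and launch via DIAL stay
inside the UA:

    // Content-specific control messages; the page neither knows nor
    // cares whether the receiver is a browser or an installed app.
    session.postMessage(JSON.stringify({
      command: 'load',
      contentId: 'abc123',   // hypothetical app-specific identifier
      autoplay: true
    }));

    session.onmessage = function (event) {
      var status = JSON.parse(event.data);
      // e.g. {playerState: 'PLAYING', currentTime: 42}
      updateLocalControls(status);  // hypothetical page function
    };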


>
> A view that underlies both MarkF's and MarkW's positions, and which I
> share (must be a first-name thing), is that messages/control in the context
> of an established presentation session are specific to the content being
> presented.  Whether you load Netflix-defined HTML content on a remote HTML
> UA, or a pre-installed Netflix app on a DIAL device, messages within the
> session are entirely Netflix-defined regardless.
>
> The semantics and higher-level format of the messages are at least
> specific to a particular web app, I would agree - but thinking about a
> technical realisation: Would you want to make the communication between the
> UA and the remote end a buffered binary protocol? Using, for example, an
> ArrayBuffer and starting to send binary, possibly proprietary, protocols? I
> have doubts that this helps adoption of the API as something easy to use,
> as opposed to making at least the basic protocol postMessage / onMessage
> based, with JS data objects passed over the line - something that is
> already conceptually established in the context of Web Workers.
>

From my point of view, support for binary formats over the messaging
channel would be nice to have but optional.  It would certainly help cases
where the controlling page is generating media and wants to send it,
efficiently encoded, to the presenting app/page.  We also have
RTCDataChannel for the case where a low-latency, high-bandwidth channel is
desired.
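
A sketch of the distinction: the control message mirrors the draft's
messaging channel, while the RTCDataChannel part assumes a separately
negotiated RTCPeerConnection to the remote agent (itself an open question),
and encodedChunk stands in for media data the page has produced.

    // JSON control messages over the Presentation API channel ...
    session.postMessage(JSON.stringify({command: 'seek', time: 120}));

    // ... and binary data over an RTCDataChannel where a low-latency,
    // high-bandwidth path is needed.
    var channel = peerConnection.createDataChannel('media');
    channel.binaryType = 'arraybuffer';
    channel.onopen = function () {
      channel.send(encodedChunk);  // an ArrayBuffer produced by the page
    };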


>
> Dominik
>

Received on Thursday, 29 May 2014 00:43:34 UTC