[Widgets] WARP LC comments from Marcos Caceres on 2009-08-06 (public-webapps@w3.org from July to September 2009)

From: Marcos Caceres <marcosc@opera.com>
Date: Fri, 7 Aug 2009 01:24:07 +0200
To: public-webapps <public-webapps@w3.org>, Robin Berjon <robin@berjon.com>
Message-ID: <b21a10670908061624v56b3c5a7nf3bdacc670bebb64@mail.gmail.com>
Hi Robin,
Thanks for putting this spec together... I copy/pasted the spec below
and added inline comments.

Enjoy! :)

On Thu, Aug 6, 2009 at 9:53 PM, Marcos Caceres<marcosc@opera.com> wrote:
> 1 Introduction
>
> User agents running widgets are expected to provide access to
> potentially sensitive APIs (phone book, calendar, file system, etc.)
> that expose data which should not be leaked to arbitrary network
> locations without the user's consent.
>

The above is all true. But this specification does not protect
against "API leakage", so I find the above a bit of an overstretch. I
think the scope of this specification with regard to <feature> is
underspecified. Either we say nothing about feature, or we need to
talk about this some more in the WG.

> The purpose of this specification is precisely to define the security

Maybe "is to precisely define", though I think "precisely" is redundant.

> model for network interactions from within a widget that has access to
> sensitive information, and to provide means for a widget to declare
> the need to access specific network resources so that a policy may
> control it.

The above paragraph is a single sentence that conveys multiple ideas.
Please break it up into multiple sentences.

> 1.1 Definitions
>
> An access request is a request made by an author in the configuration
> file

configuration file > configuration document (check globally)

> for the ability to retrieve one or more network resources
> identified via the access element's uri and subdomains attributes.
>

I would rewrite as:

An access request is a request made by an author to the user agent for
the ability to retrieve one or more network resources. The network
resources to which the author requests access are identified via the
access element's uri and subdomains attributes, within the widget's
configuration document.

> To grant access means that the user agent authorises widget execution
> scopes to retrieve one or more network resources via the user agent.

"Widget execution scope" is <dfn>'d but not actually defined in this
definition. I can only assume this covers style sheets, script
elements, etc. However, implementers who have not been taking part in
the discussion may incorrectly assume that the HTML5 "inline content"
rules still apply (i.e., that cross-origin loading of images, iframes,
scripts, etc. is allowed in widgets as it is on the Web... which is
not the case for widgets).


> Note that some schemes (e.g. mailto:) may be handled by third-party
> applications and are therefore not controlled by the access mechanism
> defined in this specification.

This note uses RFC2119 terminology (please remove "may"). Also, it is
not marked up as a "Note:".

> To deny access is to refuse to grant access.

Please rewrite as: To deny access means that the user agent rejects an
author's request to grant access.

> A network resource is a retrievable resource of any type that is

Maybe say "of any content type" (or MIME type)?

> identified by a URI that has a DNS or IP as its authority.
>
> A feature-enabled API is an API that is for one reason or another
> considered to be sensitive (typically because it has access to the

missing "is" ("is considered")

> user's private data). It may be that this API can also be activated in
> a broader web context (e.g. through user interaction, prompting,
> etc.);

> but here we are considering the case where it was activated
> based on processing the <feature> element in the widget's
> configuration file as per the Widgets 1.0: Packaging and Configuration
> specification [Widgets-PC]

"We" is weird above. Also, configuration file > configuration document.

I kinda get the above, but not really. The text needs to be tightened
up a little bit.

> The widget execution scope is the scope (or set of scopes, seen as a
> single one for simplicity's sake) being the execution context for code
> running from documents that are part of the widget package.

Oh, ok. The first dfn of widget execution scope should be a link to
this one. Still, this definition does not cover style sheets and
inline content. Is that correct?

> The web execution scope is the scope (or set thereof) being the
> execution context for code running from documents that have been
> loaded off the web.

Maybe leech some HTML5 terminology here?

> 1.2 The Widget Family of Specifications
>
> This section is non-normative.
>
> This specification is part of the Widgets 1.0 family of
> specifications, which together standardise widgets as a whole. The
> Widgets 1.0: APIs and Events [Widgets-APIs] specification defines APIs
> to store preferences and capture events. The Widgets 1.0: Digital
> Signature [Widgets-DigSig] specification defines a means for widgets
> to be digitally signed using a custom profile of the XML-Signature
> Syntax and Processing Specification. The Widgets: 1.0: Automatic
> Updates [Widgets-Updates] specification defines a version control
> model that allows widgets to be kept up-to-date over [HTTP].

I rewrote this boilerplate for the P&C spec. Now uses a nice list,
etc. Please use that one.

> 1.3 Design Goals and Requirements
>
> This section is non-normative.
>
> The design goals and requirements for this specification are captured
> in the Widgets 1.0 Requirements [Widgets-Reqs] document. This document
> addresses the following requirements:

During the teleconf with the Director to progress P&C to CR, Ralf
asked me to tighten up the above text, indicating precisely which
dated version of the requirements this draft meets. Please copy the
text from P&C for the above.

>    * Default Security Policy: see Security model.
>    * Security Declarations: see the access element.
>
> Additionally, the following requirements are taken into account:
> Restricted access to remote web resources
>
> Motivation:
>    Security, Current development practice or industry best- practice,
> Interoperability.
> Rationale:
>
>    A Widget may need to make use of external web services in order to
> function, for example using AJAX to obtain information.
>
>    A User Agent may wish to restrict access to external web services
> from Widgets based on white lists or black lists, for example using a
> proxy server or firewall.
>
>    This raises the possibility for users installing Widgets that are
> unable to function due to access restrictions on remote web services.
>
>    By providing a mechanism for declaring a Widget's access
> requirements, the usability and interoperability of Widgets can be
> improved.

This statement is unfounded/speculation without a citation.

>    For example, when a user attempts to install a Widget in a User
> Agent, and the Widget Configuration Document declares that it requires
> access to currently blocked services in order to function, the User
> Agent may prompt the user to choose to:
>
>       1. enable access to the service (for example, adding the
> service to a proxy server or firewall white list),
>       2. cancel installing the Widget, or
>       3. proceed with installation, with the user now aware that
> there may be problems with the Widget due to restricted access to
> services.
>
> Additional considerations guiding this specification are maximal
> compatibility with existing web technology (including not breaking
> linking to JS libraries, embedded media, ads, etc.); and not
> restricting the platform in such a way that would make it less
> powerful than the web platform.

All this should be moved out of this spec and into the requirements
document. The requirements document can then be republished at the
same time as this spec.

> 2 Policy
>
> A widget runs in its own widget execution scope. Communication between
> that execution scope and the network is prohibited by default, but may
> be turned on selectively using the access element.

The above needs to be in active voice. Also, the user agent is the one
enforcing this at runtime, so I would recommend adding a MUST in there:

A user agent must prohibit communication between the widget execution
scope and the network by default. A user agent may selectively grant
access requests made via the access element.

The word "selectively" bothers me, as it implies some kind of
artificial intelligence.

> This prohibition
> must apply equally to access through APIs (e.g. XMLHttpRequest) or
> through inlined content (e.g. iframe, script, img).

"This prohibition" doesn't make sense there: which prohibition? Make
the above either a statement of fact (drop the "must"), or convert it
to a testable assertion ("A user agent must...").

> Scripts executing in that widget execution scope have access to
> feature-enabled APIs. Note that other mechanisms may provide access to
> the same APIs in other contexts, but that that is outside the scope of
> this specification.

Again, this is an untestable assertion, and you use the word "Note"
where the text is not marked up as a note. This note also uses RFC2119
terminology yet does not apply to a product. I suggest dropping "may"
(use "can" instead) and making this into an actual note.

Whoa! I just realized that this spec lacks a Conformance section.
Wooopsy! :)  We don't have any products defined that can conform to
this spec?

> When permission is selectively turned on to access a given set of
> network resources, it must be granted equally to APIs and inlined
> content.

How is a permission selectively turned on? Via an authorized grant request?

The above is in passive voice, please rewrite in active voice.

Inline content should be formally defined (or reference HTML5).

> The exact rules defining which execution scope applies to network
> resources loaded into a document running in the widget execution scope
> depend on the language that is being used inside the the widget.

Break here. Mark up the following as an example:

> For
> instance, in HTML 5 [HTML5] a script loaded off the network into a
> document running in the widget execution scope is itself in the same
> scope,

Full stop.

> whereas a document loaded off the network in an iframe will be
> in the web execution scope.

I probably already said this, but web execution scope should just be
defined in terms of HTML5.

> 3 The access Element
>
> The access element allows authors to request permission from the user
> agent to retrieve a set of network resources.

You should mention the configuration document here.

> A user agent must prevent the widget execution scope from retrieving
> network resources, using any method (API, linking, etc.) and for any
> operation, unless the user agent has granted access to an explicitly
> declared access request.

The above is nice, but it is out of context here.

> However, a user agent may grant access to certain URI schemes (e.g.,
> mailto:) without the need of an access request if its security policy
> considers those schemes benign.

Ok.

> A user agent may deny access requests
> made via the access element (e.g. based on a security policy, user
> prompting, etc.).

In "deny access requests", only "access requests" is linked; please
also link "deny" to "deny access".

> The access element is in the http://www.w3.org/ns/widgets namespace.

"... as defined in the P&C spec".

> Context in which this element may be used:
>    As a child of the widget element.
> Content model:
>    Empty.
> Occurrences:
>    Zero or more.
> Expected children:
>    none.
> Localizable via xml:lang:
>    No.

All good.

> Attributes
>
> uri
>    Required.

Required what? :)  Change to authoring guideline, as per P&C please.

> A URI attribute that defines the specifics of the access

What's a "URI attribute"? (Missing link to P&C.)

> request that is requested. Additionally, the special value of U+002A
> ASTERISK (*) may be used.

Passive voice and use of RFC2119 for a non-testable assertion. Change
to a statement of fact and add an authoring guideline about "*".
Actually, just remove the last sentence and change the following.

> This special value provides a means for an

This special value > The reserved value of U+002A ASTERISK (*)

> author to request from the user agent unrestricted access to resources
> through any and all schemes and protocols supported by the user agent.

Maybe add a note here saying that user agents are under no obligation
to grant such preposterous requests :)

> subdomains
>    Optional. A boolean attribute that indicates whether or not the

What's a boolean attribute? Missing link to P&C for the external reference.

> host component part of the access request applies to subdomains of
> domain in the uri attribute. The default value when this attribute is
> absent is false, meaning that access to subdomains is not requested.
>
> Usage example
>
> This example shows the expected usage of the access element.
>
> <widget
>  xmlns ="http://www.w3.org/ns/widgets"
>  width ="400"
>  height="500">
>  <name>Flight Tracker</name>
>  <access uri="http://example.com/api/"/>
>  <access uri="https://example.net"/>
>  <access uri="http://example.org" subdomains="true"/>
>  <access uri="http://example.com/dahut?bar"/>

Phew! I was about to say that mentioning a "dahut" would make this
example vastly more comprehensible. Thankfully, it is there.

>  <access uri="*"/>
> </widget>
>
> 4 Processing model

Here you need to say that this happens as part of Step 7's algorithm
to process a configuration document, in P&C.

> Let access-request list be an empty list of objects that represent the
> author's access requests to network resources.
>
> Note: The following sequence of steps relies on terminology that is
> defined in RFC 3987 [RFC3987] and in the URI [URI] specification. The
> particular the terms derived from the IRIs specification include:
> ifragment , ipath, iuser info, and iquery.

You also use terminology from P&C, e.g., "ignore", "rule for getting a
single attribute value", etc.

> For each access element that is a direct child of the widget element:
>
> If the uri attribute is absent, then this element is in error and the
> user agent must ignore it.
>
> Let uri be result of applying the rule for getting a single attribute
> value to the value of the uri attribute. If the result is a single
> U+002A ASTERISK (*) character, then prepend the a U+002A ASTERISK to
> the access-request list and skip all steps below.

Would it not be better to collect all, as the UA might choose to
ignore this access request, but grant more granular ones?
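Roughly what I have in mind, as a Python sketch. The function and
constant names are made up for illustration, and the whitespace
handling only approximates the P&C "rule for getting a single
attribute value". Instead of stopping at "*", every request is kept,
so a user agent that refuses "*" can still grant the granular ones:

```python
from typing import List, Optional

WILDCARD = "*"


def collect_access_requests(uri_values: List[Optional[str]]) -> List[str]:
    """Hypothetical sketch: keep "*" alongside the granular access requests."""
    requests: List[str] = []
    for raw in uri_values:
        if raw is None:
            continue  # uri attribute absent: element is in error, ignore it
        # Approximation of P&C: collapse whitespace runs and strip the ends.
        uri = " ".join(raw.split())
        if uri == WILDCARD:
            requests.insert(0, WILDCARD)  # still prepended, as in the draft...
            continue  # ...but the remaining elements are still collected
        requests.append(uri)
    return requests
```

The user agent's policy then decides per entry, e.g. deny "*" while
granting "http://example.com/api/".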

> If uri is not a valid URI, if it has no host component, or if it has a
> iuser info component, then this element is in error and the user agent
> must ignore it.


> Let sub domains be the result of applying the rule for getting a
> single attribute value to the value of the subdomains attribute. If
> the value of sub domains is not a valid boolean value, then this
> element is in error and the user agent must ignore it.

This is inconsistent with the behavior of boolean attributes, but I
can live with this. It is inconsistent because, as with other boolean
attributes, an erroneous subdomains value should default to "false".
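To make the difference concrete, a small Python sketch of both
behaviours. The helper is hypothetical, and it assumes (as in P&C)
that the only valid boolean values are a case-sensitive "true" or
"false". It returns None where the draft would ignore the whole access
element; with lenient=True it falls back to false instead:

```python
from typing import Optional


def parse_subdomains(raw: Optional[str], lenient: bool = False) -> Optional[bool]:
    """Hypothetical helper: None means "ignore the whole access element"."""
    if raw is None:
        return False  # attribute absent: the draft's default is false
    value = " ".join(raw.split())  # rough stand-in for the P&C attribute rule
    if value == "true":
        return True
    if value == "false":
        return False
    # Invalid value: the draft drops the element; the alternative defaults to false.
    return False if lenient else None
```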

> If uri has an ifragment component, remove it from uri. Let scheme be
> the scheme component of uri. Let host be the host component of uri.
> Let port be the port component of uri or if there is no port component
> the default value for the protocol that corresponds to scheme. Let
> path be the ipath component of uri concatenated to the iquery
> component of uri.
>
> If scheme is unsupported by the user agent, then this element is in
> error and the user agent must ignore it.
>
> If scheme is "http" or "https", then the value of host must be
> processed

Convert to active voice.

> using the ToASCII algorithm as per [RFC3490], then decode,
> as per [URI], all percent-encoded parts of path that are unreserved
> characters.

Then store all these access requests in the configuration defaults
table, please.
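For what it's worth, here is how I read the steps above, as a rough
Python sketch limited to http/https. Assumptions, none of which come
from the draft: the supported-scheme table is hypothetical; urllib's
urlsplit is only a loose stand-in for a real URI/IRI validity check;
Python's "idna" codec is used for RFC 3490 ToASCII; and a "?" is kept
between path and query (the draft just says "concatenated"):

```python
import re
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit

# Hypothetical: schemes this user agent supports, with their default ports.
DEFAULT_PORTS = {"http": 80, "https": 443}
UNRESERVED = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~")


@dataclass(frozen=True)
class AccessRequest:
    scheme: str
    host: str
    port: int
    path: str
    subdomains: bool


def _decode_unreserved(path: str) -> str:
    """Decode %XX escapes only when they encode an unreserved character."""
    def repl(m: "re.Match[str]") -> str:
        ch = chr(int(m.group(1), 16))
        return ch if ch in UNRESERVED else m.group(0)
    return re.sub(r"%([0-9A-Fa-f]{2})", repl, path)


def normalize(uri: str, subdomains: bool = False) -> Optional[AccessRequest]:
    """Return an access request, or None when the element is in error."""
    try:
        parts = urlsplit(uri)
        port = parts.port  # raises ValueError for a malformed port
    except ValueError:
        return None
    if not parts.hostname or parts.username is not None:
        return None  # no host component, or an iuserinfo component
    scheme = parts.scheme.lower()
    if scheme not in DEFAULT_PORTS:
        return None  # unsupported scheme
    try:
        # hostname is already lower-cased; the idna codec applies ToASCII per label.
        host = parts.hostname.encode("idna").decode("ascii")
    except UnicodeError:
        return None
    # The fragment is simply dropped; query is appended to the path.
    path = parts.path + ("?" + parts.query if parts.query else "")
    if port is None:
        port = DEFAULT_PORTS[scheme]
    return AccessRequest(scheme, host, port, _decode_unreserved(path), subdomains)
```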

> Rule for Granting Access to a URI identifiable Resources

This rule is never applied during processing. Maybe it should be
applied to derive the overall access scope?

> When multiple access elements are used, the set of network connections
> that are allowed is the union of all the access requests that were
> granted by the user agent. The following rules are applied to

Applied by who? When? where? why? how?

> determine what each access element is requesting access to.

I don't think you should talk about access elements here, talk about
access requests. Access requests being the data derived by the user
agent from processing the XML serialization of a request for access by
the author.

>   1. The request for access made by the access element is for network
> resources that have:
>          * a scheme equal to scheme; and
>          * if subdomains is false, a host exactly equal to host; and
>          * if subdomains is true, a host either exactly equal to
> host, or that is a subdomain of host; and
>          * a port equal to port; and
>          * a path and query concatenation that is equal to path or
> begins with it.

What is "it"? The above is redundant. You already determined this in
the processing step.
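If the rule stays, it might be easier to get right against something
concrete. A minimal Python sketch of the per-request match, assuming
the fields were already normalized during processing (names are
hypothetical). Note the label-boundary check for subdomains:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AccessRequest:
    scheme: str
    host: str        # assumed already passed through ToASCII
    port: int
    path: str        # ipath + iquery, after normalization
    subdomains: bool


def host_matches(request_host: str, target_host: str, subdomains: bool) -> bool:
    # Host comparison is case-insensitive for http/https.
    a, b = request_host.lower(), target_host.lower()
    if a == b:
        return True
    # Require a label boundary, so "evil-example.org" is not under "example.org".
    return subdomains and b.endswith("." + a)


def matches(req: AccessRequest, scheme: str, host: str, port: int, path: str) -> bool:
    return (scheme == req.scheme
            and host_matches(req.host, host, req.subdomains)
            and port == req.port
            and path.startswith(req.path))  # plain string prefix, as the draft words it
```

As worded, "begins with" is a character prefix, so a request for
"/api" would also match "/apiary"; if segment-based matching is
intended, the spec should say so.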

> At runtime, when a network request is made from within the widget
> execution scope, the user agent matches it against the rules defined
> above, accepting it if it matches and blocking it if it doesn't.

Break. You have another note...

> Note
> that if scheme is "http" or "https", host comparisons must be
> performed in a case-insensitive manner.

You use the word must as part of a note and don't identify the product
to which it applies. Please make this a statement of fact or a
testable assertion.

> As a special case, the uri attribute may hold the value *.

As above, wrt "may"

> In that
> case, the access element is considered

Considered by who?

> to request access to all
> network resources without limitation (e.g. retrieve RSS feeds from
> anywhere). If access is granted

Granted by who?

> to such a request, then all other
> network access requests must be granted.

By who? please identify.
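Putting the runtime pieces together (default deny, union of granted
requests, the "*" special case, and benign schemes), a rough Python
sketch of the check a user agent would run. BENIGN_SCHEMES and all
names are hypothetical policy choices, and the request URL's host is
assumed to be ASCII already:

```python
from dataclasses import dataclass
from typing import Iterable, Union
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}
BENIGN_SCHEMES = {"mailto"}  # hypothetical: allowed without an access request
WILDCARD = "*"


@dataclass(frozen=True)
class AccessRequest:
    scheme: str
    host: str
    port: int
    path: str
    subdomains: bool


def allowed(url: str, granted: Iterable[Union[str, AccessRequest]]) -> bool:
    """Default deny; allow only what the union of granted requests covers."""
    grants = list(granted)
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    if scheme in BENIGN_SCHEMES:
        return True
    if WILDCARD in grants:
        return True  # a granted "*" means every other request is granted too
    try:
        port = parts.port
    except ValueError:
        return False  # malformed port: block
    if port is None:
        port = DEFAULT_PORTS.get(scheme)
    host = parts.hostname or ""  # urlsplit lower-cases the host
    path = parts.path + ("?" + parts.query if parts.query else "")
    for req in grants:
        if not isinstance(req, AccessRequest):
            continue
        host_ok = host == req.host or (req.subdomains and host.endswith("." + req.host))
        if req.scheme == scheme and host_ok and port == req.port and path.startswith(req.path):
            return True
    return False
```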

> Acknowledgements
>
> The editor would like to thank (in no particular order): the OMTP
> BONDI effort, Jere Kapyaho, Thomas Roessler, Art Barstow, Mohamed
> Zergaoui, Arve Bersvendsen, and Batman Caceres.

heh, nanananaaaa Batman! :)

> Normative References
>
> [RFC3987]
>    Internationalized Resource Identifiers (IRIs). RFC3987, M. Duerst,
> M. Suignard. January 2005.
> [URI]
>    Uniform Resource Identifier (URI): Generic Syntax. RFC 3986, T.
> Berners-Lee, R. Fielding and L. Masinter. January 2005.
> [RFC3490]
>    Internationalizing Domain Names in Applications (IDNA). RFC 3490,
> P. Faltstrom, et al. March 2003.
> [Widgets-PC]
>    Widgets 1.0: Packaging and Configuration, M. Caceres. W3C, December 2008.
>
> Informative References
>
> [Widgets-Reqs]
>    Widgets 1.0 Requirements, M. Caceres. W3C, April 2009.
> [Widgets-DigSig]
>    Widgets 1.0: Digital Signature. M. Caceres, . W3C, W3C Working
> Draft April 2009.
> [Widgets-Updates]
>    Widgets 1.0: Updates. M. Caceres. W3C Working Draft October 2008.
> [Widgets-APIs]
>    Widgets 1.0: APIs and Events. Arve Bersvendsen et al. W3C Working
> Draft 23 April 2009.
> [HTML5]
>    HTML 5. Ian Hickson et al. W3C Working Draft 23 April 2009.
> [HTTP]
>    Hypertext Transfer Protocol -- HTTP/1.1. RFC 2616, R. Fielding, et
> al. June 1999.
> [HTTPS]
>    HTTP Over TLS. RFC 2818, E. Rescorla. May 2000.
>

I didn't check the references, but I assume they are ok.


-- 
Marcos Caceres
http://datadriven.com.au
Received on Thursday, 6 August 2009 23:25:15 UTC