Re: ACTION-337: Review of access element from Thomas Roessler on 2009-05-08 (public-webapps@w3.org from April to June 2009)

From: Thomas Roessler <tlr@w3.org>
Date: Fri, 8 May 2009 15:52:48 +0200
To: Robin Berjon <robin@berjon.com>
Cc: public-webapps WG <public-webapps@w3.org>
Message-Id: <2B83BB39-1D81-47C1-ACF6-D42C14C67C4A@w3.org>

On 7 May 2009, at 13:47, Robin Berjon wrote:

> Hi Thomas,
>
> On May 2, 2009, at 13:31 , Thomas Roessler wrote:
>> 1. What does "access to network resources" mean?  Does this refer  
>> to the use of inline resources, stylesheets, images,  
>> XMLHttpRequest, form submissions, some of these, all of these?   
>> More precisely, does this apply to (a) causing GET requests (inline  
>> resources, stylesheets, ...), (b) reading the results of GET  
>> requests (XHR), (c) causing POST requests (forms, XHR)?
>
> It is any access to any resource that requires a network connection,  
> irrespective of the type of resource, the operation, etc. I'm  
> clarifying.

Following up on the discussion on yesterday's call, we have at least  
the following choices:

1. The HTML5 security model (as I'll call it by abuse of language)  
applies, and that includes access to inline resources.  Choosing a  
random origin also means that XMLHttpRequest needs additional  
authorization; that authorization could be granted through an access  
element.

2. The HTML5 security model applies, but with additional restrictions
on network access.  In other words, *if* network access is permissible,
then scripts and frames behave as they would in HTML5; if it isn't,
external resources won't be loaded inline.

>
>> 2. The use of "URI" as an attribute name is misleading, since the  
>> value of that attribute is actually a pattern.
>
> We're switching to @pattern.
>
>> 3. The formal description of the attribute's value space is defined  
>> by reference to the valid URI token (or IRI token) productions in  
>> RFCs 3986 and 3987.  Works for me (TM).
>>
>> Unfortunately, some additional considerations apply for IRI  
>> references:  The mapping between arbitrary Unicode character  
>> sequences and A-labels ("xn--...") turns out to be sufficiently  
>> brittle that the only host name sequences you want to use are
>> U-labels (the subset of non-ASCII labels for which ToUnicode and
>> ToASCII round-trip).  Comparison of IDNs is defined on the level of  
>> the A-label ("xn--"), and shouldn't occur on the Unicode level.  
>> Take a look at the latest POWDER drafts for another WG that recently
>> grappled with the problem.  Also, be clear what kind of
>> normalization is applied to the path and query string components
>> before comparison.  How do you deal with % encoding?  (Again, see  
>> POWDER -- they're doing the right thing in their latest iteration.)
>
> I take it you're talking about POWDER Grouping? Is there a specific  
> section that you think we should find inspiration from (it hurts my  
> head a little...)? Would you recommend referencing it outright?

Sorry for the obscure reference.  Yes, I was talking about
powder-grouping, but not the published Working Draft; I'll pony up a pointer.

Meanwhile, the important pieces:

- you want to percent-decode all unreserved characters (look up
"unreserved" in RFC 3986)
- you want to generate the ASCII version of the host part of the IRI
reference (i.e., the xn--... version)
- then, do an ASCII case-insensitive comparison of the host part, and
a character-by-character comparison of the rest
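
As a rough sketch of the three steps above (Python; the names
decode_unreserved and comparison_key are mine and purely illustrative, and
Python's built-in "idna" codec stands in for whatever ToASCII profile the
spec ends up requiring):

```python
import re
from urllib.parse import urlsplit

# "unreserved" per RFC 3986, section 2.3: ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz"
    "0123456789-._~"
)
_PCT = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(text):
    """Decode %XX escapes only when they encode an unreserved character."""
    def repl(match):
        ch = chr(int(match.group(1), 16))
        return ch if ch in UNRESERVED else match.group(0)
    return _PCT.sub(repl, text)


def comparison_key(iri):
    """Return (ascii_host, rest) for an http/https IRI reference.

    Assumption: only the host, path and query take part in the comparison;
    scheme and port handling are left out of this sketch.
    """
    parts = urlsplit(iri)
    host = parts.hostname or ""
    # Step 2: ASCII ("xn--...") form of the host.
    ascii_host = host.encode("idna").decode("ascii")
    # Step 3a: ASCII case-insensitive comparison, done here by lowercasing.
    ascii_host = ascii_host.lower()
    # Steps 1 and 3b: percent-decode unreserved characters, then compare
    # the remainder character by character.
    rest = decode_unreserved(parts.path)
    if parts.query:
        rest += "?" + decode_unreserved(parts.query)
    return ascii_host, rest
```

Two references then match when their keys are equal.  Note that an escaped
reserved character such as %2F is deliberately left alone, so "/a%2Fb" and
"/a/b" stay distinct.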

One additional consideration here:  The above remarks work for the http
and https URI schemes. They don't necessarily work for other schemes.

What is the planned scope of the pattern attribute with respect to:

(a) additional schemes
(b) the distinction between http and https?


>> 4. How do you deal with trailing slashes?
>
> The path component is just a string; it has no structure. If it has
> a trailing slash, then only access to paths that begin with that
> path including its trailing slash is granted.

ok
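
For concreteness, a minimal sketch of that rule as I read it (Python;
path_allowed is a made-up name, not something from the draft):

```python
def path_allowed(pattern_path, request_path):
    """Plain string-prefix test: the path is treated as having no structure."""
    return request_path.startswith(pattern_path)
```

With a pattern path of "/maps/", "/maps/tiles/1.png" is granted and "/maps"
is not.  Without the trailing slash, the pattern path "/maps" would also
cover "/mapsevil", which is a direct consequence of the path being just a
string.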

>
>> 5. What is the use case for the wildcard mechanism?  As I noted  
>> before [*], the wildcard mechanism makes it fairly easy to scan  
>> large network segments by inventing host names on the fly.  I'd  
>> prefer to simply drop that mechanism for the moment and keep things  
>> really simple for v1.  If that's not an option, can we please  
>> define separate attribute names for patterns that imply access to  
>> the entire network and patterns that imply access to resources at a  
>> single host name only?
>
> The use case is many services (e.g. Google Maps) that serve from  
> unpredictable subdomains, like www17.example.com or  
> foo4.bar20.baz32.example.org.
>
> Is your proposal to have a separate attribute like  
> subdomains="true"? In some ways I see how it could be clearer, but I  
> don't really see how it changes the issue?

That's not precisely my proposal, but it would address the concern.
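
To illustrate the difference between the two kinds of grant, here is a
hedged sketch (Python; host_allowed and its subdomains flag are
hypothetical, modelled on the subdomains="true" idea above rather than on
anything in the current draft):

```python
def host_allowed(granted_host, request_host, subdomains=False):
    """Exact host match by default; subdomain match only when opted in."""
    granted = granted_host.lower().rstrip(".")
    requested = request_host.lower().rstrip(".")
    if requested == granted:
        return True
    # Require a label boundary, so "evilexample.com" does not match
    # a grant for "example.com".
    return subdomains and requested.endswith("." + granted)
```

A grant for example.com without the flag covers that one host name only;
with the flag, it also covers www17.example.com and any other host name
under example.com, which is the case that enables scanning by inventing
host names on the fly.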

Received on Friday, 8 May 2009 13:52:58 UTC