Re: Updates to File API from Arun Ranganathan on 2010-06-22 (public-device-apis@w3.org from June 2010)

From: Arun Ranganathan <arun@mozilla.com>
Date: Tue, 22 Jun 2010 15:37:18 -0700
To: Adrian Bateman <adrianba@microsoft.com>
CC: Jonas Sicking <jonas@sicking.cc>, Jian Li <jianli@chromium.org>, Web Applications Working Group WG <public-webapps@w3.org>, public-device-apis <public-device-apis@w3.org>
Message-ID: <4C213B1E.7090403@mozilla.com>

On 6/22/10 8:44 AM, Adrian Bateman wrote:
> On Friday, June 11, 2010 11:18 AM, Jonas Sicking wrote:
>    
>> On Fri, Jun 11, 2010 at 11:11 AM, Jonas Sicking<jonas@sicking.cc>  wrote:
>>      
>>> On Fri, Jun 11, 2010 at 9:09 AM, Adrian Bateman<adrianba@microsoft.com> wrote:
>>>        
>>>> It's not clear to me the benefit of encoding the origin into the URL. Do
>>>> we expect script to parse out the origin and use it? Even in a multi-process
>>>> architecture there's presumably some central store of issued URLs which will
>>>> need to store origin information as well as other things?
>>>>          
>>> The one advantage I can see is that putting the scheme into the URL
>>> allows the *implementation* to deduce the origin by simply looking at
>>> the URL-scheme. This avoids having to do a (potentially cross-process)
>>> lookup to get the origin.
>>>
>>> This could be useful for APIs which have to synchronously determine
>>> the origin of a given URL in order to throw an exception on an
>>> attempted cross-origin access. For example an XMLHttpRequest Level 1
>>> implementation needs to synchronously determine if it should make a
>>> call to .open(...) throw or not based on the origin of the passed in
>>> URL.
>>>
>>> However I'm not sure if this is a problem in practice or not. It's
>>> entirely possible that the web platform is littered with situations
>>> where you need to do synchronous communication with whichever thread
>>> the networking code runs on.
>>>
>>> Firefox is still in the process of going multi-process, so I'll defer
>>> to other browsers with more experience in this area.
>>>        
>> Oh, and I should add that the implementation will of course still have
>> to check once a url is loaded that the origin in the url matches the
>> origin in whatever map is used to map urls to resources. I.e. if the
>> implementation has handed out a url like:
>>
>> filedata:sheep.org/3699b4a0-e43e-4cec-b87b-82b6f83dd752
>>
>> and script changes that to:
>>
>> filedata:wolf.org/3699b4a0-e43e-4cec-b87b-82b6f83dd752
>>
>> then attempting to load the latter url should result in a 404 or similar.
>>      
> Since the origin requires scheme as well as hostname/port it seems like we'll
> end up with some encoding or parsing complexity by following this approach.

Upon reflection, I agree with Adrian.  Origin requires:

1. Scheme
2. Hostname
3. Port
4. Certificates, if any

Encoding all of this into the URL creates untenable complexity.
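
To make the encoding and parsing cost concrete, here is a minimal Python sketch. The URL layout (a percent-encoded origin, a slash, then a token) and every function name are assumptions made for illustration; nothing in this thread specifies them, and certificates are left out entirely.

```python
from urllib.parse import quote, unquote, urlsplit

# Default ports, needed because "sheep.org" alone does not identify an origin.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_of(url):
    """Return a (scheme, host, port) tuple for an http(s) URL."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return (parts.scheme, parts.hostname, port)


def encode_filedata_url(origin, token):
    """Hypothetical layout: filedata:<percent-encoded origin>/<token>."""
    scheme, host, port = origin
    serialized = f"{scheme}://{host}:{port}"
    # safe="" also escapes ":" and "/", so the only "/" left is the separator.
    return f"filedata:{quote(serialized, safe='')}/{token}"


def decode_origin(filedata_url):
    """Recover the (scheme, host, port) tuple from the hypothetical layout."""
    rest = filedata_url[len("filedata:"):]
    encoded_origin, _, _token = rest.partition("/")
    parts = urlsplit(unquote(encoded_origin))
    return (parts.scheme, parts.hostname, parts.port)
```

Even this stripped-down version needs escaping rules, default-port handling and a parser on the other side, and http://sheep.org and https://sheep.org come out as different origins, which a bare "filedata:sheep.org/..." cannot express.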

> Robin
> gave good reasons for not allowing user agents to encode data into the URL
> and I'm not convinced that including origin for this particular case isn't
> a premature optimisation. At what point will we find other data that's
> convenient to have encoded in the URL?
>    

+1.

> I think it makes more sense for the URL to be opaque and let user agents figure
> out the optimal way of implementing origin and other checks.
>    

I think it may be important to define:

* Format.  I agree that this could be something simple, but it should be
  defined.  By opaque, do you mean undefined?
* Behavior with GET.  For this, I propose using a subset of HTTP/1.1
  responses.
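
As a rough illustration of both points, here is a minimal Python sketch of an opaque URL backed by a central store, with GET answered by a small set of HTTP-style status codes. The "filedata:<uuid>" format, the class and method names, and the choice of 200, 403 and 404 are all assumptions for illustration, not a proposal for the spec text.

```python
import uuid


class FileDataStore:
    """Hypothetical central map from opaque URLs to (origin, data)."""

    def __init__(self):
        self._entries = {}

    def create_url(self, origin, data):
        # The URL carries no origin; it is just a scheme plus a random token.
        url = f"filedata:{uuid.uuid4()}"
        self._entries[url] = (origin, data)
        return url

    def get(self, url, requesting_origin):
        """Return (status, body) using an assumed subset of HTTP/1.1 codes."""
        entry = self._entries.get(url)
        if entry is None:
            return (404, b"")  # unknown or revoked URL
        origin, data = entry
        if origin != requesting_origin:
            return (403, b"")  # assumed code for a cross-origin attempt
        return (200, data)
```

In this sketch the origin check is a lookup in the store rather than a parse of the URL, so a script that edits the URL string just gets the unknown-URL response.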

-- A*

Received on Tuesday, 22 June 2010 22:38:08 UTC