Re: XHR LC comments

On Sun, 04 May 2008 11:47:13 +0200, Julian Reschke <julian.reschke@gmx.de>  
wrote:
> Review of <http://www.w3.org/TR/2008/WD-XMLHttpRequest-20080415/>.
>
> General points:
>
> a) I'm confused about the approach to this document. On the one hand,  
> we're being told that it can't define anything not already in use (and  
> that new stuff belongs into XHR2), on the other hand it relies on HTML5,  
> which is a moving target. It's good that this stuff is being written  
> down, but if it relies on HTML5, I'd propose to consider other  
> publication options.

The problem is that concepts such "origin" and determining the encoding of  
a text/html stream are not defined anywhere else. It's not really clear to  
me what to do about that.


> b) Algorithms: the spec uses a method to describe algorithms that IMHO  
> is extremely hard to read (see for instance send() method). This may be  
> good for implementors, but seems to be bad for everybody else.  
> Minimally, the lists should be structured for better readability.

Could you elaborate on what kind of change you envision? I'm not sure how  
they are not structured right now.


> c) Structure: It would be nice if Section 4 had more structure. Right  
> now it's ugly to navigate and refer to.

This is better in XMLHttpRequest Level 2. I rather not revise that entire  
section editorially as it might introduce new errors.


> 2.1 Dependencies
>
> "DOM
>
>      A conforming user agent must support some subset of the  
> functionality defined in DOM Events and DOM Core that this specification  
> relies upon. [DOM2Events] [DOM3Core]"
>
> That reads a bit strange. Must the subset be non-empty?

Yes, as stated it must be a subset that matches what XMLHttpRequest  
requires from the eventing and core specifications.


> 2.2 Terminology
>
> "Two URIs are same-origin if after performing scheme-based normalization  
> on both URIs as described in section 5.3.3 of RFC 3987 the scheme, ihost  
> and port components are identical. If either URI does not have an ihost  
> component the URIs must not  be considered same-origin. [RFC3987]"
>
> Why are we referring to the IRI spec (RFC3987) when talking about URIs,  
> as defined RFC3986?

For scheme-bases normalization and ihost. Maybe I should use IRI instead  
of URI?


> 3. Security Considerations
>
> "Apart from requirements affecting security made throughout this  
> specification implementations may, at their discretion, not expose  
> certain headers, such as HttpOnly cookies."
>
> "...such as headers containing HttpOnly cookies".

Done.


> Besides that: this may be a non-optimal example unless we can point to a  
> definition of "HttpOnly cookies". Can we?

I don't believe we can, but since this was put in mostly for HttpOnly  
cookies I rather not remove that. I think it will be clear enough for  
people reading the document.


> 4. The XMLHttpRequest Object
>
> "onreadystatechange of type EventListener
>
>      This attribute is an event handler DOM attribute and must be  
> invoked whenever a readystatechange event is targated at the object."
>
> s/targated/targeted/

Already fixed.


> "If stored method case-insensitively matches  CONNECT, DELETE, GET,  
> HEAD, OPTIONS POST, PUT, TRACE, or TRACK let stored method be the  
> canonical uppercase form of the matched method name."
>
> - missing comma after OPTIONS

Fixed.


> - TRACK??? There's probably a rational for that. If there is, please  
> include it in the spec.

It's a security issue, as should be clear from the next bullet point.


> "If the user argument was not omitted and is not null let stored user be  
> user  encoded using the encoding specified in the relevant  
> authentication scheme or UTF-8 if the scheme fails to specify an  
> encoding."
>
> Why is XHR talking about the encoding here? Is "stored user" a string or  
> a byte array?
>
> (same for password)

They're a string (in the API).


> "Abort the send() algorithm, set response entity body to "null" and  
> reset the list of request headers."
>
> This is a very confusing statement, until one realizes that it's in the  
> description of "open", not "send". It would be good to rephrase this so  
> it becomes clear that this aborts a *previous* send request.

Added a note.


> "If the value argument is null terminate these steps. (Do not raise an  
> exception.)."
>
> This makes it impossible to set empty headers, which are allowed in  
> HTTP. Even worse, it silently fails.

Empty headers can be set using the empty string, no? Not raising an  
exception is consistent with implementations and I don't think it matters  
much as it doesn't have any effect.


> This is profiling of the base spec, and I don't see any compelling  
> reason to do so. Don't do it.

How is it profiling?


> "For security reasons, these steps should be terminated if the header  
> argument case-insensitively matches one of the following headers:
>
>      * Accept-Charset
>      * Accept-Encoding
>      * Connection
>      * Content-Length
>      * Content-Transfer-Encoding
>      * Date
>      * Expect
>      * Host
>      * Keep-Alive
>      * Referer
>      * TE
>      * Trailer
>      * Transfer-Encoding
>      * Upgrade
>      * Via "
>
> It's unclear why there's a security reason not to allow things like  
> "Accept-Charset" or "Accept-Encoding". Please explain.

This was done based on implementation feedback. I haven't investigated  
what the reasons were for the various headers. If implementors read this  
maybe they could chime in and point it out.


> General comment on "setRequestHeader(header, value), method": the way it  
> is specified makes it impossible for a client to reliably set headers.  
> We need a way to either retrieve the current value for inspection, or a  
> way to reset the header. Or both.

http://lists.w3.org/Archives/Public/public-webapi/2008May/0139.html


> "If stored method is GET act as if the data argument is null."
>
> Another case of HTTP/1.1 being profiled. Don't do it.

This was done on request of implementations.


> "Serialize data into a namespace well-formed XML document and encoded  
> using the encoding given by data.inputEncoding, when not null, or UTF-8  
> otherwise. Or, if this fails because the Document cannot be serialized  
> act as if data  is null."
>
> Silent failure????

Yes.


> "If no Content-Type header has been set using setRequestHeader() append  
> a Content-Type header to the list of request headers with a value of  
> application/xml;charset=charset  where charset is the encoding used to  
> encode the document."
>
> This will result in an invalid Content-Type header if the UA has  
> initialized the headers with a default (which I think the spec currently  
> allows; and at least one UA was reported to do). See comment above about  
> header handling.

Rephrased.


> "While downloading the resource the following rules are to be observed."
>
> That reads strange. HTTP requests do not "download" resources. Say  
> "while executing the request".

Thanks, fixed.


> "If the user agent supports HTTP State Management it should persist,  
> discard and send cookies (as received in the Set-Cookie and Set-Cookie2  
> response headers, and sent in the Cookie header) as applicable.  
> [RFC2965]"
>
> This should probably include a reference to the Set-Cookie (not  
> Set-Cookie2) spec as well (RFC2109).

I believe it used to do that and it was pointed out that that  
specification is not useful in practice and would actually do more harm  
than good. I'm not really sure what to do here.


> "If the user agent implements server-driven content-negotiation it  
> should set Accept-Encoding and Accept-Charset headers as appropriate; it  
> must not automatically set the Accept."
>
> s/set the Accept/set the Accept header/

Fixed due to other changes.


> "Responses to such requests must have the content-encodings  
> automatically decoded. [RFC2616]"
>
> "Such requests" is a bit fuzzy here. Please rephrase in terms of what  
> the response contains (such as Content-Encoding header being present  
> etc).

I simply dropped "to such requests". I hope that's ok.


> "// The following script:
> var client = new XMLHttpRequest();
> client.open("GET", "test.txt", true);
> client.send();
> client.onreadystatechange = function() {
>   if(this.readyState == 3) {
>    print(this.getAllResponseHeaders());
>   }
> }
>
> // ...should output something similar to the following text:
> Date: Sun, 24 Oct 2004 04:58:38 GMT
> Server: Apache/1.3.31 (Unix)
> Keep-Alive: timeout=15, max=99
> Connection: Keep-Alive
> Transfer-Encoding: chunked
> Content-Type: text/plain; charset=utf-8"
>
> I think examples like this would be more readable (and take less space)  
> when using the syncr. mode.

I would like to avoid encouraging authors to use the synchronous API.


> "status of type unsigned short, readonly
>
> On getting, if available, it must return the HTTP status code sent by  
> the server (typically 200 for a successful request). Otherwise, if not  
> available, the user agent must raise an INVALID_STATE_ERR exception."
>
> This may be incorrect when the UA caches (304 vs 200).

That's why it says typically.


> "statusText of type DOMString, readonly
>
>      On getting, if available, it must return the HTTP status text sent  
> by the server (appears after the status code). Otherwise, if not  
> available, the user agent must raise an INVALID_STATE_ERR exception."
>
> Really? It seems to me that if somebody really implements this, clients  
> are likely to break. Why not allow an empty string here?

This is what clients have implemented as far as I can tell. Though the  
HTTP status text could be the empty string, if that's what you mean...


> Finally, my main other issue with this spec that it is silent about the  
> recommended behaviour for unsafe methods, about which RFC2616 says in  
> Section 9.1.1  
> (<http://greenbytes.de/tech/webdav/rfc2616.html#rfc.section.9.1.1>):
>
> "Implementors should be aware that the software represents the user in  
> their interactions over the Internet, and should be careful to allow the  
> user to be aware of any actions they might take which may have an  
> unexpected significance to themselves or others.
>
> In particular, the convention has been established that the GET and HEAD  
> methods SHOULD NOT have the significance of taking an action other than  
> retrieval. These methods ought to be considered "safe". This allows user  
> agents to represent other methods, such as POST, PUT and DELETE, in a  
> special way, so that the user is made aware of the fact that a possibly  
> unsafe action is being requested."
>
> Thus, allowing a web page to submit a PUT, POST or DELETE request  
> without user interaction seems to be a very dangerous thing to me, and  
> the spec should point that out (see also  
> <http://ietf.osafoundation.org:8080/bugzilla/show_bug.cgi?id=237>).

All requirements from HTTP are taken over unless explicitly stated so I  
don't think this is needed.

Kind regards,


-- 
Anne van Kesteren
<http://annevankesteren.nl/>
<http://www.opera.com/>

Received on Monday, 12 May 2008 15:53:42 UTC