Copyright ©2006 W3C® (MIT, ERCIM, Keio), All Rights Reserved. W3C liability, trademark and document use rules apply.
This document provides two mechanisms for a web document to relax typical cross-site scripting restrictions on accessing it. Using an HTTP header, an XML processing instruction, or both, a document can indicate that it may be accessed from domain A but not from domain B, and so on.
This document is based on the W3C's 13 June 2005 Working Group Note Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0 [AC-NOTE].
This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.
This is the @@ October 2006 Working Draft of the Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction, the first publication of this specification. This document is produced by a Task Force of the Voice Browser, Web API and Web Application Formats (WAF) Working Groups under the auspices of the WAF Working Group. The Web API and Web Application Formats Working Groups are part of the Rich Web Clients Activity and the Voice Browser Working Group is part of the Voice Browser Activity. Both of these Activities are within the W3C's Interaction Domain.
The W3C has not analyzed the security problems which motivated the publication of this document. This document only addresses a subset of the security issues involved in exposing XML data over HTTP. This document documents an existing practice used under certain circumstances but in no way implies that the technique would be appropriate or secure to protect document access under all circumstances. Implementors should perform their own security analysis.
The public is encouraged to send comments to the WAF Working Group's public mailing list public-appformats@w3.org (archive). See W3C mailing list and archive usage guidelines.
This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.
Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.
1 Introduction
2 <?access-control?> Processing Instruction Algorithm
3 Security Considerations for User Agent Implementors and Application Authors
For security reasons, web browsers prevent a script or page on domain A from accessing content on domain B. Authors resort to proxying the content through the domain hosting their application (A), thereby increasing overhead and limiting scalability. Access Control for Web Documents lets authors declare that a document on domain B may in fact be accessed by domain A by means of an HTTP header or XML processing instruction (or both).
The HTTP header and XML processing instruction are designed explicitly to enable extending the "sandbox" and are not meant as a restriction mechanism. The expectation is that the user agent's default policy is more strict. Therefore, it is always safe to fall back to the default policy in the event of an error.
The user agent is responsible for validating that the requesting document (A) is allowed to access the contents of the requested document (B). This validation is performed by comparing the URL of document A with the access-control rules provided by document B.
Access-control rules are specified in the Access-Control HTTP header returned with the requested document (B). In addition, the access-control rules may be returned in an <?access-control?> processing instruction included in the XML prolog of the requested document (B).
All rules provided must be used. If any rule is not well-formed for any reason, the user agent must fall back to its default security policy. User agents must not use partial or incomplete information for comparison.
There are two types of rules: allow and deny.
Each rule has one or more associated URI patterns, which may contain the '*' character as a wildcard. Wildcards may be placed anywhere in a URI string. Substring matches are not performed. Wildcards have the following rules:
*://example.com is allowed; http*:// is not allowed.
http://*.example.com does match http://www.example.com/
http://*.example.com does NOT match http://test.www.example.com
http://www.example.com/*/index.html does match http://www.example.com/test/index.html
http://www.example.com/*/index.html does NOT match http://www.example.com/dev/test/index.html
http://www.example.com/test/* matches http://www.example.com/test/a/b/c/index.html
*://*.example.com/test/* matches https://test.example.com/test/a/b/c/index.html
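The wildcard examples above can be reproduced with a small matcher. The following is a minimal sketch, not a normative algorithm: the function names are hypothetical, and the semantics (a '*' scheme matches any scheme, a '*' host label matches exactly one label, a '*' path segment matches exactly one segment unless it is the last segment, in which case it matches the remainder of the path) are assumptions inferred from the examples rather than rules stated in this document.

```python
import re


def pattern_to_regex(pattern: str) -> re.Pattern:
    """Translate a URI pattern into a compiled regular expression.

    Assumed semantics (inferred from the examples, not normative):
    - scheme: '*' matches any scheme; a partial wildcard such as 'http*' is rejected
    - host:   a '*' label matches exactly one label
    - path:   a '*' segment matches one segment, or the rest of the path if it is last
    """
    scheme, sep, rest = pattern.partition("://")
    if not sep:
        raise ValueError("pattern must contain '://'")
    if "*" in scheme and scheme != "*":
        raise ValueError("a wildcard must replace the entire scheme")
    scheme_re = "[A-Za-z][A-Za-z0-9+.-]*" if scheme == "*" else re.escape(scheme)

    host, slash, path = rest.partition("/")
    host_re = r"\.".join(
        "[^./]+" if label == "*" else re.escape(label) for label in host.split(".")
    )

    if not slash:
        # No path in the pattern: accept an empty path or a bare "/".
        path_re = "/?"
    else:
        segments = path.split("/")
        parts = []
        for index, segment in enumerate(segments):
            is_last = index == len(segments) - 1
            if segment == "*":
                parts.append(".*" if is_last else "[^/]+")
            else:
                parts.append(re.escape(segment))
        path_re = "/" + "/".join(parts)

    return re.compile(f"{scheme_re}://{host_re}{path_re}")


def matches(pattern: str, url: str) -> bool:
    """Return True if the whole URL matches the pattern (no substring matches)."""
    return pattern_to_regex(pattern).fullmatch(url) is not None
```

Using fullmatch rather than search reflects the statement that substring matches are not performed.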
Rules are considered least specific to most specific in the following order:
Comparing a pattern to the requesting URI is performed by a bytewise comparison of the URI to the target.
When multiple rules are present, they must be evaluated in the following order:
Evaluation is performed by evaluating the requesting URL against each rule. The last rule whose target matches the requesting URL is used. In the event that no rule matches the requesting URL, the user agent must use its default policy to determine whether to allow the requesting URL access.
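The "last matching rule wins" evaluation can be sketched as follows. This is an illustration under stated assumptions: the function name, the (instruction, pattern) tuple representation, and the pluggable matcher are all hypothetical, and the rules are assumed to be already sorted into the required evaluation order. A return value of None stands for "no rule matched, use the default policy".

```python
from typing import Callable, Iterable, Optional, Tuple

# Hypothetical representation: (instruction, pattern), e.g. ("allow", "http://a.example")
Rule = Tuple[str, str]


def evaluate(
    rules: Iterable[Rule],
    requesting_url: str,
    matches: Callable[[str, str], bool],
) -> Optional[bool]:
    """Return True (allow), False (deny), or None (fall back to default policy).

    Every rule is checked in order; the last rule whose pattern matches the
    requesting URL determines the outcome.
    """
    decision: Optional[bool] = None
    for instruction, pattern in rules:
        if matches(pattern, requesting_url):
            decision = instruction == "allow"
    return decision
```

The matcher is passed in as a parameter so that the sketch does not commit to any particular pattern-matching implementation.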
Any document retrieved via HTTP MAY have access control rules defined in the HTTP header.
Access-Control      = "Access-Control" ":" 1#access-control-rule
access-control-rule = instruction SP "<" uripattern ">"
instruction         = "allow" / "deny" / token
uripattern          ; URI from RFC3986, replacing reg-name with wildcard-reg-name
wildcard-reg-name   = *( unreserved | pct-encoded | sub-delims | "{*}" )
Both the header field name and value are case-insensitive.
If the keyword "allow" is the instruction then the URI patterns for that header are added to the allow ruleset. If the keyword "deny" is the instruction then the URI patterns for that header are added to the deny ruleset.
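A header value following the grammar above could be parsed along these lines. This is a rough sketch, not a conforming parser: the function name is hypothetical, the instruction token is simplified to letters and hyphens, the URI pattern is accepted as any run of characters without whitespace or angle brackets rather than being validated against RFC3986, and only the instruction keyword is lowercased.

```python
import re
from typing import List, Tuple

# Simplified rule syntax: instruction, one space, pattern in angle brackets.
_RULE = r"[A-Za-z-]+ <[^<>\s]+>"
_HEADER_RE = re.compile(rf"\s*{_RULE}(?:\s*,\s*{_RULE})*\s*")
_ITEM_RE = re.compile(r"([A-Za-z-]+) <([^<>\s]+)>")


def parse_access_control(value: str) -> List[Tuple[str, str]]:
    """Parse one Access-Control header value into ordered (instruction, pattern) pairs.

    Raises ValueError if the value as a whole does not fit the simplified
    syntax, so that a caller can fall back to its default policy.
    """
    if _HEADER_RE.fullmatch(value) is None:
        raise ValueError(f"malformed Access-Control value: {value!r}")
    return [(instruction.lower(), pattern) for instruction, pattern in _ITEM_RE.findall(value)]
```

For example, parsing `allow <http://good.example.com>, deny <http://*.example.com>` yields one allow pair followed by one deny pair, in header order. The sketch keeps unknown instruction tokens in the result, because how extension instructions should be handled is left open in this draft.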
NOTE: The header name may change in future drafts.
NOTE: Should extension instructions be allowed? Should they be ignored? For example, ignoring allow-on-tuesday doesn't weaken the security policy, but ignoring deny-on-tuesday will.
[1] AccessControlPI ::= '<?access-control'
                        (S 'allow="'AccessList'"' | S "allow='"AccessList"'")?
                        (S 'deny="'AccessList'"' | S "deny='"AccessList"'")?
                        (S 'require-secure="'true'"' | "require-secure="'false'")?
                        S? '?>'
[2] AccessList ::= AccessItem (S AccessItem)* | '*'
[3] AccessItem ::= HostName | PartialHostName | IPv4address | genericuri
[4] PartialHostName ::= '*.' HostName
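The allow and deny pseudo-attributes of the processing instruction could be extracted as sketched below. This is an assumption-laden illustration: the function name is hypothetical, the input is taken to be the text between '<?access-control' and '?>', the order of the attributes is not enforced, and require-secure is deliberately not handled, so a processing instruction that uses it is rejected by this sketch.

```python
import re
from typing import Dict, List

# One pseudo-attribute: whitespace, name, '=', then a double- or single-quoted value.
_ATTR_RE = re.compile(r"""\s+(allow|deny)=(?:"([^"]*)"|'([^']*)')""")


def parse_pi_data(data: str) -> Dict[str, List[str]]:
    """Split the PI data into allow and deny lists of access items.

    Raises ValueError on repeated attributes or on any leftover text, so
    that a caller can fall back to its default policy.
    """
    rules: Dict[str, List[str]] = {"allow": [], "deny": []}
    seen = set()
    position = 0
    while True:
        match = _ATTR_RE.match(data, position)
        if match is None:
            break
        name = match.group(1)
        if name in seen:
            raise ValueError(f"duplicate pseudo-attribute: {name}")
        seen.add(name)
        value = match.group(2) if match.group(2) is not None else match.group(3)
        rules[name] = value.split()  # access items are whitespace-separated
        position = match.end()
    if data[position:].strip():
        raise ValueError(f"unexpected content in PI data: {data[position:]!r}")
    return rules
```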
As required by RFC2616, multiple Access-Control headers are combined in the order in which they are received. For example, the following two HTTP responses and XML Processing Instruction generate the same ruleset.
-------------------------------------------------------------
HTTP/1.1 200 OK
Date: Wed, 23 Aug 2006 09:31:41 GMT
Server: Apache/1.3.37 (Unix)
Content-Length: 32924
Content-Type: text/html; charset=utf-8
Access-Control: allow <http://good.example.com>, allow <http://nice.example.com>
Access-Control: allow <http://friendly.example.com>, deny <http://*.example.com>

HTTP/1.1 200 OK
Date: Wed, 23 Aug 2006 09:31:41 GMT
Server: Apache/1.3.37 (Unix)
Content-Length: 32924
Content-Type: text/html; charset=utf-8
Access-Control: allow <http://good.example.com>, allow <http://nice.example.com>, allow <http://friendly.example.com>, deny <http://*.example.com>

<?access-control allow="http://good.example.com http://friendly.example.com http://nice.example.com" deny="http://*.example.com"?>
-------------------------------------------------------------
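The equivalence of the two HTTP responses follows from the RFC2616 rule that repeated header fields may be folded into one comma-separated value, in the order received. A minimal sketch of that folding step, with a hypothetical function name:

```python
from typing import Iterable


def combine_header_values(values: Iterable[str]) -> str:
    """Fold repeated header field values into one comma-separated value.

    The values must be supplied in the order in which the header fields
    were received, because rule order is significant.
    """
    return ", ".join(value.strip() for value in values)
```

Applied to the two Access-Control fields of the first response, in order, this produces the single field value shown in the second response.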
An Access-Control header or processing instruction is in error if the value has incorrect syntax, that is if either the instruction or any uripattern is malformed. If any Access-Control header or processing instruction is in error then the User Agent should ignore all Access-Control headers and use its default security policy.
The processing instruction is designed explicitly to enable extending the sandbox for "read" access to XML content. It is not designed to be used to enforce sandboxing restrictions itself or to provide generalized trust validation. The expectation is that the user agent's default sandboxing policy is more strict. Therefore, it is always safe to fall back to the default policy in the event of an error.
A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access. In contrast, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary XML feeds on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. In tandem, developers and web server administrators must be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.
User agents which implement this capability should take care not to expose other trusted data (cookies, HTTP header data) inappropriately. The access-control processing instruction is only designed to enable access to the XML content.
User agents which implement this capability should also take care to properly normalize Unicode and to properly interpret IDNs to prevent URL spoofing attacks.
Application authors should be aware that XML content retrieved from another site is not itself trustworthy. Authors should take care not to expose themselves to cross-site scripting attacks by failing to validate the returned content or by executing the retrieved content directly.