Access Control for Web Documents

Status of this Document

This section describes the status of this document at the time of its publication. Other documents may supersede this document. A list of current W3C publications and the latest revision of this technical report can be found in the W3C technical reports index at http://www.w3.org/TR/.

This is the @@ September 2006 Working Draft of the Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction, the first publication of this specification. This document is produced by a Task Force of the Voice Browser, Web API and Web Application Formats (WAF) Working Groups under the auspices of the WAF Working Group. The Web API and Web Application Formats Working Groups are part of the Rich Web Clients Activity and the Voice Browser Working Group is part of the Voice Browser Activity. Both of these Activities are within the W3C's Interaction Domain.

The W3C has not analyzed the security problems which motivated the publication of this document. This document only addresses a subset of the security issues involved in exposing XML data over HTTP. It describes an existing practice used under certain circumstances, but in no way implies that the technique is appropriate or secure for protecting document access under all circumstances. Implementors should perform their own security analysis.

The public is encouraged to send comments to the WAF Working Group's public mailing list public-appformats@w3.org (archive). See W3C mailing list and archive usage guidelines.

This document was produced by a group operating under the 5 February 2004 W3C Patent Policy. The W3C maintains a public list of any patent disclosures made in connection with the deliverables of the group; that page also includes instructions for disclosing a patent. An individual who has actual knowledge of a patent which the individual believes contains Essential Claim(s) must disclose the information in accordance with section 6 of the W3C Patent Policy.

Publication as a Working Draft does not imply endorsement by the W3C Membership. This is a draft document and may be updated, replaced or obsoleted by other documents at any time. It is inappropriate to cite this document as other than work in progress.

1 Introduction

For security reasons, Web browsers do not allow a script or page on domain A to access content on domain B. Authors resort to proxying the content through the domain hosting their application (A), which increases overhead and limits scalability. Access Control for Web Documents provides a way for authors to declare, by means of an HTTP header, an XML processing instruction, or both, that a document on domain B may in fact be accessed from domain A.

The HTTP header and XML processing instruction are designed explicitly to extend the "sandbox"; they are not a restriction mechanism. The expectation is that the user agent's default policy is stricter, so it is always safe to fall back to the default policy in the event of an error.

2 <?access-control?> Processing Instruction Algorithm

The user agent is responsible for validating that the requesting document (A) is allowed to access the contents of the requested document (B). This validation is performed by comparing the URL of document A with the access-control rules provided by document B.

Access-control rules may be specified in the Access-Control HTTP header returned with the requested document (B), in an <?access-control?> processing instruction included in the XML prolog of the requested document (B), or in both.

All rules provided, from both sources, must be used. If any rule is not well-formed for any reason, the user agent must fall back to its default security policy. User agents must not use partial or incomplete information for comparison.

There are two types of rules: allow and deny.

Each rule has one or more associated URI patterns, which may contain the '*' character as a wildcard. Substring matches are not performed: a wildcard stands for a whole protocol, hostname label, or directory level, never for part of one. Within that constraint, wildcards may be placed in several positions in a URI string, as follows:

A single wildcard ('*') may be used to grant access to any web resource.
A wildcard may be used in place of the entire protocol handler.
    *://example.com is allowed; http*:// is not allowed.
A wildcard may replace one level of hostname definition.
    http://*.example.com matches http://www.example.com/
    http://*.example.com does NOT match http://test.www.example.com
A wildcard may replace a single directory level.
    http://www.example.com/*/index.html matches http://www.example.com/test/index.html
    http://www.example.com/*/index.html does NOT match http://www.example.com/dev/test/index.html
A wildcard at the end of the URI may represent multiple levels of directories and a document name.
    http://www.example.com/test/* matches http://www.example.com/test/a/b/c/index.html
Multiple wildcards may be combined in the same pattern.
    *://*.example.com/test/* matches https://test.example.com/test/a/b/c/index.html
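The wildcard rules above can be sketched as a small matcher. This is a minimal Python sketch, not a normative algorithm, and its function names are hypothetical. It assumes that query strings and fragments of the requesting URL are ignored, that a pattern ending after the host matches any path on that host, and that a pattern containing a partial wildcard such as http*:// is malformed.

```python
from typing import List, Tuple


def _split(uri: str) -> Tuple[str, str, List[str]]:
    """Split 'scheme://host/path' into (scheme, host, path segments)."""
    scheme, sep, rest = uri.partition("://")
    if not sep:
        raise ValueError(f"not an absolute URI: {uri!r}")
    # Assumption: query strings and fragments play no part in matching.
    rest = rest.split("#", 1)[0].split("?", 1)[0]
    host, _, path = rest.partition("/")
    return scheme, host, path.split("/") if path else []


def _check(part: str) -> None:
    # A wildcard must stand for a whole component; 'http*' or 'exa*mple' is malformed.
    if "*" in part and part != "*":
        raise ValueError(f"partial wildcard not allowed: {part!r}")


def _same(pattern_parts: List[str], url_parts: List[str]) -> bool:
    # Components compare exactly; '*' matches any single component.
    return all(p == "*" or p == u for p, u in zip(pattern_parts, url_parts))


def pattern_matches(pattern: str, url: str) -> bool:
    if pattern == "*":  # a lone wildcard grants access to any resource
        return True
    p_scheme, p_host, p_path = _split(pattern)
    u_scheme, u_host, u_path = _split(url)
    p_labels, u_labels = p_host.split("."), u_host.split(".")
    for part in [p_scheme, *p_labels, *p_path]:
        _check(part)

    if p_scheme != "*" and p_scheme != u_scheme:
        return False
    # '*' replaces exactly one hostname label, so label counts must agree.
    if len(p_labels) != len(u_labels) or not _same(p_labels, u_labels):
        return False
    if not p_path:
        return True  # assumption: a host-only pattern covers every path on that host
    if p_path[-1] == "*":
        # A trailing '*' absorbs one or more remaining directories and the document name.
        head = p_path[:-1]
        return len(u_path) > len(head) and _same(head, u_path)
    # Otherwise each '*' replaces exactly one directory level.
    return len(u_path) == len(p_path) and _same(p_path, u_path)
```

Each example in the list above can be checked directly, for instance pattern_matches("http://*.example.com", "http://test.www.example.com") returns False because the label counts differ.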

Rules are ranked from least specific to most specific in the following order:

Rules with a single wildcard.
Rules with a wildcard in the host or domain name.
Rules with a wildcard in the protocol designator.
Rules with a wildcard in the hostname.
Rules with a wildcard in the directory name.
Rules with a trailing wildcard.
Rules with no wildcards.
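One way to turn this ordering into a sort key is sketched below in Python. The list leaves two points open, so both are assumptions here: "a wildcard in the host or domain name" is read as a wildcard in any label after the first (as in http://www.*.com), and "a wildcard in the hostname" as a wildcard in the leftmost label (as in http://*.example.com). A pattern with several wildcards is assumed to take the rank of its least specific one. The name specificity is hypothetical.

```python
from typing import List


def specificity(pattern: str) -> int:
    """Return 0 (least specific) to 6 (most specific) for a URI pattern."""
    if pattern == "*":
        return 0                          # a single wildcard
    scheme, _, rest = pattern.partition("://")
    host, _, path = rest.partition("/")
    labels: List[str] = host.split(".")
    segments: List[str] = path.split("/") if path else []

    ranks = [6]                           # no wildcards
    if "*" in labels[1:]:
        ranks.append(1)                   # assumption: wildcard in the domain part
    if scheme == "*":
        ranks.append(2)                   # wildcard protocol designator
    if labels[0] == "*":
        ranks.append(3)                   # assumption: wildcard in the leftmost label
    if "*" in segments[:-1]:
        ranks.append(4)                   # wildcard directory level
    if segments and segments[-1] == "*":
        ranks.append(5)                   # trailing wildcard
    # Assumption: a pattern with several wildcards ranks with its least specific one.
    return min(ranks)
```

Under these assumptions, *://*.example.com/test/* ranks with the protocol wildcard (2), since that is its least specific wildcard.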

A pattern is compared to the requesting URI bytewise.

When multiple rules are present, they must be evaluated in the following order:

Least specific rules come before more specific rules.
At the same level of specificity, allow rules come before deny rules.

The requesting URL is evaluated against each rule in this order, and the last rule whose pattern matches the requesting URL determines the result. If no rule matches the requesting URL, the user agent must use its default policy to decide whether to allow the requesting URL access.
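The evaluation order can be sketched in Python as follows. To keep the block self-contained, the matching and ranking functions are passed in as parameters; the toy stand-ins only recognize a lone '*' or an exact URL and are not the wildcard rules of this section. The function returns None when no rule matches, leaving the decision to the user agent's default policy. All names are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Rule = Tuple[str, str]  # (instruction, URI pattern), e.g. ("deny", "http://*.example.com")


def evaluate(
    rules: List[Rule],
    url: str,
    matches: Callable[[str, str], bool],
    specificity: Callable[[str], int],
) -> Optional[bool]:
    """Return True (allow), False (deny) or None (no match: use the default policy)."""
    # Least specific first; at equal specificity, allow rules come before deny rules.
    # sorted() is stable, so rules that tie keep their received order.
    ordered = sorted(
        rules, key=lambda r: (specificity(r[1]), 0 if r[0] == "allow" else 1)
    )
    decision: Optional[bool] = None
    for instruction, pattern in ordered:
        if matches(pattern, url):
            decision = instruction == "allow"  # a later match overrides an earlier one
    return decision


# Toy stand-ins for illustration only.
def toy_matches(pattern: str, url: str) -> bool:
    return pattern == "*" or pattern == url


def toy_specificity(pattern: str) -> int:
    return 0 if pattern == "*" else 6
```

With the rules [("deny", "*"), ("allow", "http://good.example.com/")], the URL http://good.example.com/ is allowed because the more specific allow rule is evaluated last, while any other URL is denied. If an allow rule and a deny rule with the same pattern both match, the deny rule wins because it is evaluated after the allow rule.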

Access-Control HTTP Header

Any document retrieved via HTTP MAY have access control rules defined in an Access-Control HTTP header.

Access-Control      = "Access-Control" ":"
                      1#access-control-rule

access-control-rule = instruction SP "<" uripattern ">"

instruction         = "allow" / "deny" / token

uripattern          ; URI from RFC3986, replacing
                    ; reg-name with wildcard-reg-name

wildcard-reg-name   = *( unreserved / pct-encoded /
                         sub-delims / "{*}" )

Both the header field name and value are case-insensitive.

If the instruction is the keyword "allow", the URI pattern of that rule is added to the allow ruleset. If the instruction is the keyword "deny", the URI pattern of that rule is added to the deny ruleset.

NOTE: The header name may change in future drafts.

NOTE: Should extension instructions be allowed? Should they be ignored? For example, ignoring allow-on-tuesday does not weaken the security policy, but ignoring deny-on-tuesday does.
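A parser for the header value might look like the Python sketch below, which rests on several assumptions. Rules are separated by commas, so patterns are assumed not to contain commas. The angle brackets from the grammar are accepted but optional, because the header examples in this section omit them. The instruction keyword is compared case-insensitively while the pattern is kept verbatim for the bytewise comparison. Pending the open question above, any instruction other than allow or deny is treated as an error. Returning None signals that the user agent should fall back to its default policy; the function name is hypothetical.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (instruction, URI pattern)

# instruction SP ( "<" uripattern ">" / uripattern )
_RULE = re.compile(r"(\S+)\s+(?:<([^<>\s]+)>|([^<>\s]+))")


def parse_access_control(value: str) -> Optional[List[Rule]]:
    """Parse one (possibly combined) Access-Control header value."""
    rules: List[Rule] = []
    for item in value.split(","):
        m = _RULE.fullmatch(item.strip())
        if m is None:
            return None  # malformed rule: discard all rules, use the default policy
        instruction = m.group(1).lower()  # the keyword is matched case-insensitively
        if instruction not in ("allow", "deny"):
            return None  # assumption: unknown extension instructions are errors
        rules.append((instruction, m.group(2) or m.group(3)))
    return rules
```

Rejecting unknown instructions is the conservative reading: it can never widen access, at the cost of also discarding harmless extensions.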

Access Control Processing Instruction

[1] AccessControlPI ::= '<?access-control'
                        (S 'allow="' AccessList '"' | S "allow='" AccessList "'")?
                        (S 'deny="' AccessList '"' | S "deny='" AccessList "'")?
                        (S 'require-secure="' ('true' | 'false') '"')?
                        S? '?>'
[2] AccessList      ::= AccessItem (S AccessItem)* | '*'
[3] AccessItem      ::= HostName | PartialHostName | IPv4address | genericuri
[4] PartialHostName ::= '*.' HostName
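The processing instruction can be checked with a regular expression, as in the Python sketch below. It assumes the pseudo-attributes appear in the order the grammar gives, accepts either quote style for allow and deny, and does not validate individual access items beyond requiring that '*' stand alone. It returns the rules as (instruction, item) pairs together with the require-secure flag, or None when the instruction is in error. The names are hypothetical.

```python
import re
from typing import List, Optional, Tuple

Rule = Tuple[str, str]  # (instruction, access item)

_S = r"[ \t\r\n]"                           # XML white space
_QUOTED = r"""(?:"([^"]*)"|'([^']*)')"""    # either quote style
_PI = re.compile(
    r"<\?access-control"
    + r"(?:" + _S + r"+allow=" + _QUOTED + r")?"
    + r"(?:" + _S + r"+deny=" + _QUOTED + r")?"
    + r"(?:" + _S + r'+require-secure="(true|false)")?'
    + _S + r"*\?>"
)


def _access_list(value: str) -> Optional[List[str]]:
    items = value.split()  # split the list on white space
    if not items or ("*" in items and len(items) > 1):
        return None  # an empty list, or '*' mixed with other items, is malformed
    return items


def parse_access_control_pi(pi: str) -> Optional[Tuple[List[Rule], bool]]:
    """Return (rules, require_secure), or None if the instruction is in error."""
    m = _PI.fullmatch(pi.strip())
    if m is None:
        return None
    rules: List[Rule] = []
    for instruction, dq, sq in (("allow", m.group(1), m.group(2)),
                                ("deny", m.group(3), m.group(4))):
        value = dq if dq is not None else sq
        if value is None:
            continue  # attribute absent
        items = _access_list(value)
        if items is None:
            return None
        rules.extend((instruction, item) for item in items)
    return rules, m.group(5) == "true"
```

Because the attribute order is fixed by the grammar, an instruction that lists deny before allow is rejected rather than reordered.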

As required by RFC2616, multiple Access-Control headers are combined in the order in which they are received. For example, the two HTTP responses and the XML processing instruction below all generate the same ruleset.

-------------------------------------------------------------
HTTP/1.1 200 OK
Date: Wed, 23 Aug 2006 09:31:41 GMT
Server: Apache/1.3.37 (Unix)
Content-Length: 32924
Content-Type: text/html; charset=utf-8
Access-Control: allow http://good.example.com, allow http://nice.example.com
Access-Control: allow http://friendly.example.com, deny http://*.example.com

HTTP/1.1 200 OK
Date: Wed, 23 Aug 2006 09:31:41 GMT
Server: Apache/1.3.37 (Unix)
Content-Length: 32924
Content-Type: text/html; charset=utf-8
Access-Control: allow http://good.example.com, allow http://nice.example.com, allow http://friendly.example.com, deny http://*.example.com

<?access-control allow="http://good.example.com
                        http://friendly.example.com
                        http://nice.example.com"
                 deny="http://*.example.com"?>
-------------------------------------------------------------
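Combining repeated headers can be illustrated with Python's standard http.client.parse_headers, which returns an email.message.Message subclass whose get_all method lists every value of a field in the order received. The sketch below, whose function name is hypothetical, joins those values with commas as RFC 2616 permits, so the two example responses above produce the same field value.

```python
import io
from http.client import parse_headers
from typing import Optional


def combined_access_control(raw_headers: bytes) -> Optional[str]:
    """Join every Access-Control field value, in the order received, with commas."""
    message = parse_headers(io.BytesIO(raw_headers))
    values = message.get_all("Access-Control")  # field-name lookup is case-insensitive
    if values is None:
        return None  # no Access-Control header in this response
    return ", ".join(value.strip() for value in values)
```

The combined string can then be parsed as a single header value.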

An Access-Control header or processing instruction is in error if its value has incorrect syntax, that is, if either the instruction or any URI pattern is malformed. If any Access-Control header or processing instruction is in error, the user agent should ignore all Access-Control headers and processing instructions and use its default security policy.

3 Security Considerations for User Agent Implementors and Application Authors

The processing instruction is designed explicitly to extend the sandbox for "read" access to XML content. It is not designed to enforce sandbox restrictions itself or to provide generalized trust validation. The expectation is that the user agent's default sandboxing policy is stricter, so it is always safe to fall back to the default policy in the event of an error.

A user agent running inside a trusted corporate network and executing untrusted content should enforce a sandboxing policy by denying access. In contrast, it may be appropriate to relax this policy when the user agent is executing only trusted applications that require access to arbitrary XML feeds on the local network. User agent vendors that allow this sandboxing policy to be configured are encouraged to provide guidance on the appropriate settings. It is critical that network administrators understand the security issues pertinent to their environment and configure their systems appropriately. In tandem, developers and web server administrators must be aware of the dangers of trusting a user agent that can be configured to disable sandboxing.

User agents which implement this capability should take care not to expose other trusted data (cookies, HTTP header data) inappropriately. The access-control processing instruction is only designed to enable access to the XML content.

User agents which implement this capability should also take care to properly normalize Unicode and to properly interpret IDNs to prevent URL spoofing attacks.
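As one illustration of the normalization step, a hostname can be brought to a canonical ASCII form before it is compared with a pattern. The Python sketch below, with a hypothetical function name, applies Unicode NFC normalization, lowercases, and then uses Python's built-in "idna" codec, which implements IDNA 2003 (RFC 3490). It does not detect visually confusable characters, which need separate handling.

```python
import unicodedata


def canonical_host(host: str) -> str:
    """Normalize a hostname to NFC, lowercase it, and convert IDN labels to ASCII."""
    normalized = unicodedata.normalize("NFC", host).lower()
    # The built-in 'idna' codec applies nameprep and Punycode (IDNA 2003)
    # to non-ASCII labels; ASCII labels pass through unchanged.
    return normalized.encode("idna").decode("ascii")
```

For example, a decomposed "bu\u0308cher.example" and a precomposed "Bücher.Example" both become xn--bcher-kva.example, so they compare equal bytewise.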

Application authors should be aware that XML content retrieved from another site is not itself trustworthy. To protect against cross-site scripting attacks, authors should validate the returned content and should not execute retrieved content directly.

A References

AC-NOTE: Authorizing Read Access to XML Content Using the <?access-control?> Processing Instruction 1.0, ed. Matt Oshry, Brad Porter, RJ Auburn. W3C Working Group Note, 13 June 2005. See http://www.w3.org/TR/access-control/.
DOM3LS: Document Object Model (DOM) Level 3 Load and Save Specification, ed. Johnny Stenback and Andy Heninger. W3C Recommendation, April 2004. See http://www.w3.org/TR/DOM-Level-3-LS/.
RFC2616: Hypertext Transfer Protocol -- HTTP/1.1, ed. R. Fielding et al. IETF RFC 2616, June 1999. See http://www.ietf.org/rfc/rfc2616.txt.
RFC3986: Uniform Resource Identifier (URI): Generic Syntax, ed. T. Berners-Lee et al. IETF RFC 3986, January 2005. See http://www.ietf.org/rfc/rfc3986.txt.
VXML21: VoiceXML 2.1, ed. Matt Oshry et al. W3C Candidate Recommendation, June 2005. See http://www.w3.org/TR/2005/CR-voicexml21-20050613/.
XML: Extensible Markup Language (XML) 1.0, ed. Tim Bray et al. W3C Recommendation, February 2004. See http://www.w3.org/TR/2004/REC-xml-20040204/.


W3C Working Group Working Draft 19 September 2006

Abstract