(formal) comment on security considerations from Jeremy Carroll on 2007-02-13 (public-grddl-comments@w3.org from January to March 2007)

From: Jeremy Carroll <jjc@hpl.hp.com>
Date: Tue, 13 Feb 2007 17:21:01 +0000
To: public-grddl-comments@w3.org
CC: jena-devel <jena-devel@lists.sourceforge.net>
Message-ID: <45D1F37D.9060209@hpl.hp.com>
This is a personal comment concerning the security considerations section.

a) editorial: the last paragraph of

http://www.w3.org/2004/01/rdxh/spec#sec

should be elsewhere, and not under security.

b) substantive:

Neither
http://www.w3.org/2004/01/rdxh/spec#sec
nor
http://www.w3.org/TR/2006/WD-grddl-20061024/#sec

is adequately detailed.

Without detailed instructions a typical semantic web developer is 
unlikely to produce adequately secure code at reasonable cost.
The phrase "should take the appropriate measures" puts an unreasonable 
burden on implementers, who may need to hire security experts to advise 
them as to what are "the appropriate measures".

Without appropriate advice, most GRDDL implementations are likely
to have significant security flaws, which may well lead to the
technology being branded as insecure and untrustworthy, reducing
the value of the work of the working group, and of other, more secure, 
implementations.

A better approach would be, in the words quoted in your spec, to:
[[
[use] the discussion of the "application/postscript" type [...] as a 
model for considering other [...] remote execution capabilities.
]]

As an appendix to this comment, I attempt precisely that, and offer that 
as a first draft of text that would address this comment.

I note that the text from RFC 2046 appears to have normative force, but 
"should" and "may" have their usual English meanings, rather than the 
precise definitions of RFC 2119. My preference is that advice to 
implementers concerning security should be normative.

Making this change before last call would allow last call reviewers to 
offer their own expert opinion as to whether the suggested text (as 
modified by the WG) is adequate or not.

=====
End of comment.

As an aside, in discussion with colleagues, the point was made that an 
underlying problem is the unsatisfactory nature of the security 
considerations for XSLT (both versions). The GRDDL WG may choose to 
alert the XSLT and XQuery WGs to this comment.

=====

Appendix: Draft Text

Provenance:
taken from
http://www.faqs.org/rfcs/rfc2046.html
section 4.5.2, with much alteration
Copyright issues may need to be explored: I see no copyright notice in 
that copy of RFC 2046, and I assume the general IETF policy allows this 
form of reuse.



    The execution of general-purpose programming languages
    as interpreters for transformations entails
    serious security risks, and implementors are discouraged from simply
    sending GRDDL transforms to "off-the-shelf" interpreters.  While it
    is usually safe to pass documents from trusted sources through a
    GRDDL processor, where the potential
    for harm is constrained by lack of malicious intent,
    implementors should consider all of the following before they add
    the ability to execute arbitrary GRDDL transforms linked from
    arbitrary Web documents.

    The remainder of this section outlines some, though probably not all,
    of the possible problems with the execution of GRDDL transforms,
    with particular reference to transforms in XSLT.

[[ New point (1), not in RFC 2046 ]]
     (1)   GRDDL, like many Web technologies, fundamentally
           relies on the dereferencing of URLs. With unconstrained
           use of GRDDL, untrusted transforms may access URLs for which
           the end user has read or write permission, while
           the author of the transform does not. This is particularly
           pertinent for URLs from the file: scheme; but many other
           schemes are also impacted.
           The untrusted
           code may, having read documents which the author
           did not have permission to access, transmit the
           content of the documents to arbitrary Web
           servers by encoding the contents within a URL
           that is then passed to the server. [[See tests,
           currently security4, security6 in the Jena GRDDL Reader
           test area]]
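
As an illustration of point (1), and not as part of the draft text: a
minimal sketch, in Python, of a scheme check that a GRDDL processor might
apply before dereferencing any URL on behalf of an untrusted transform.
The names ALLOWED_SCHEMES and may_dereference are hypothetical, and the
choice of http and https as the only permitted schemes is an assumption
rather than something the draft requires. A scheme check alone does not
stop access to intranet hosts that the end user can reach but the
transform author cannot.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are ever fetched for an untrusted transform.
ALLOWED_SCHEMES = {"http", "https"}


def may_dereference(url: str) -> bool:
    """Return True if an untrusted transform may read this URL.

    Rejects file: and every other scheme outside the allowlist, and also
    rejects relative references and URLs without a host component.
    """
    parts = urlsplit(url)
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.hostname)


if __name__ == "__main__":
    for candidate in ("http://example.org/doc", "file:///etc/passwd"):
        print(candidate, may_dereference(candidate))
```

The check would need to be applied to every URL the processor fetches,
including URLs that are only computed while a transform is running.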

     (2)   Dangerous operations in the XSLT language
           include, but may not be limited to, the operations
           that involve getting a URL:
           "document()", "doc()", "unparsed-text()" and
           "unparsed-text-available()", and "xsl:result-document",
           which involves writing to a URL.
           "xsl:include" and "xsl:import" present fewer risks
           if they are processed before execution
           of the transform, rather than during it.
           However, some GRDDL processing paths, particularly
           those involving profile transformations and
           schema transformations, process an xsl:include
           or xsl:import for one transform after having
           completed the execution of some other transform.
[[Note: last sentence is true, but I don't think it exposes any risks 
that are not exposed by the truth that GRDDL will get a URL generated 
during a profile transformation, as in security6. Perhaps the last
sentence is unnecessary and distracting]]


           Writers of GRDDL transforms should avoid the use of
           potentially dangerous URL operations, since these
           operations are quite likely to be unavailable in secure
           GRDDL implementations.  Software executing
           GRDDL transforms should either completely disable
           all potentially dangerous URL operations or take
           special care not to delegate any special authority to
           their operation.  In particular, operations to read
           or write URLs should not be executed with the
           privileges of the current user, but with the privileges
           associated with an untrusted party.
           Such disabling and/or checking
           should be done completely outside of the reach of the
           transform language itself; care should be taken to
           ensure that no method exists for re-enabling
           full-function versions of these operators.
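
To illustrate the kind of check described in point (2), here is a rough
Python sketch that scans a stylesheet for the operations listed above
before it is handed to an XSLT processor. It is only a pre-flight filter
under stated assumptions: the function and constant names are
hypothetical, the scan does not follow xsl:include or xsl:import, and it
is no substitute for disabling the operations inside the processor
itself. It also assumes the XML parser in use is hardened against
entity-expansion attacks, which Python's standard xml.etree.ElementTree
is not guaranteed to be.

```python
import re
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# Function calls that read a URL; matched inside attribute values,
# where XPath expressions and attribute value templates appear.
RISKY_FUNCTIONS = re.compile(
    r"(?<![\w.-])(document|doc|unparsed-text|unparsed-text-available)\s*\("
)

# Instructions that write to a URL.
RISKY_ELEMENTS = {"{%s}result-document" % XSL_NS}


def find_risky_operations(stylesheet_text: str) -> list:
    """Return the names of risky URL operations found in a stylesheet."""
    findings = []
    root = ET.fromstring(stylesheet_text)
    for element in root.iter():
        if element.tag in RISKY_ELEMENTS:
            # Strip the "{namespace}" prefix that ElementTree adds to tags.
            findings.append(element.tag.split("}")[1])
        for value in element.attrib.values():
            for match in RISKY_FUNCTIONS.finditer(value):
                findings.append(match.group(1) + "()")
    return findings
```

A processor following this sketch would refuse to run any transform for
which the returned list is not empty.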

[[ Note (2) and (3) of 4.5.2 RFC 2046 not applicable]]


     (3)   Some implementations of the transform language may
           provide nonstandard
           facilities for the direct loading and execution of
           other programming language code. For example, an XSLT
           implementation may provide a method of calling Java code.
           Such facilities are quite obviously open
           to substantial abuse.  GRDDL transforms should
           not make use of such features.  Besides being totally
           implementation-specific, they are also likely to be
           unavailable in secure implementations of the transformation
           language.
           Software executing GRDDL transforms should not
           allow such operators to be used if they exist.

     (4)   XSLT is an extensible language, and many, if not
           most, implementations of it provide a number of their
           own extensions.  This document does not deal with such
           extensions explicitly since they constitute an unknown
           factor.  GRDDL transforms should not make use
           of nonstandard extensions; they are likely to be
           missing from some implementations.  Software executing GRDDL
           transforms should make sure that any
           nonstandard operations are secure and don't
           present any kind of threat.

     (5)   It is possible to write transforms that consume huge
           amounts of various system resources.  It is also
           possible to write transforms that loop
           indefinitely.  Both types of transforms have the
           potential to cause damage if sent to unsuspecting
           recipients.  Authors of GRDDL documents should avoid
           the construction and dissemination of such transforms,
           which is antisocial.  Software executing GRDDL
           transforms should provide appropriate mechanisms to abort
           processing after a reasonable amount of time has
           elapsed. In addition, GRDDL software should be
           limited to the consumption of only a reasonable amount
           of any given system resource.
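
The time limit in point (5) could be enforced along the following lines.
This Python sketch assumes the transform is run by an external command in
a child process; run_transform_with_timeout is a hypothetical name, and
the command line passed in is up to the implementation. It covers only
elapsed time; limits on memory and other resources would need
operating-system facilities that are not shown.

```python
import subprocess
import sys


def run_transform_with_timeout(argv, timeout_seconds):
    """Run an external transform command in a child process.

    Returns the bytes written to standard output, or None if the command
    exceeded the time limit or exited with a non-zero status.
    """
    try:
        completed = subprocess.run(
            argv,
            capture_output=True,
            timeout=timeout_seconds,  # the child is killed when this expires
            check=False,
        )
    except subprocess.TimeoutExpired:
        return None
    if completed.returncode != 0:
        return None
    return completed.stdout


if __name__ == "__main__":
    # Stand-in for a looping transform: a child process that never finishes.
    looping = [sys.executable, "-c", "while True: pass"]
    print(run_transform_with_timeout(looping, 1))
```

Running the transform in a separate process also means that a crash in
the interpreter does not take the GRDDL processor down with it.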

[[ point (7) of 4.5.2 RFC 2046 n/a ]]

     (6)   Finally, bugs may exist in some interpreters of the
           transform language
           which could possibly be exploited to gain unauthorized
           access to a recipient's system.  Apart from noting this
           possibility, there is no specific action to take to
           prevent this, other than the timely correction of such
           bugs if any are found.
Received on Tuesday, 13 February 2007 17:21:43 UTC