GRDDL Security Considerations (was Re: LC: Opposing OWL/XML format) from Bijan Parsia on 2009-01-28 (public-owl-wg@w3.org from January 2009)

From: Bijan Parsia <bparsia@cs.man.ac.uk>
Date: Wed, 28 Jan 2009 11:37:17 +0000
To: Bijan Parsia <bparsia@cs.manchester.ac.uk>
Cc: Alan Ruttenberg <alanruttenberg@gmail.com>, Jonathan Rees <jar@creativecommons.org>, W3C OWL Working Group <public-owl-wg@w3.org>
Message-Id: <FA273CF6-82A3-4A43-8B47-A6D6497112A0@cs.man.ac.uk>

I was thinking about the problems of sending code to a server and how  
people might not want to send their data outside, and I realized that  
this might be considered a feature. If it's clear that random GRDDL  
is not trustworthy, there will be less blind trusting.

I just noticed that most of the exploits I was thinking of are
warned about in:
	http://www.w3.org/TR/grddl/#sec

Note that I don't think there are any reasonably secure GRDDL agents  
out there. Consider the Jena description:
	http://jena.sourceforge.net/grddl/security-conformance.html

I see no security description of glean.py:
	http://www.w3.org/2003/g/glean.py
but the key lines:
	def doXSLT(xform, inf, outf, params = {}):
	    args = ["xsltproc", "--novalid", "-o", outf]
	    for k in params.keys():
	        args.extend(("--stringparam", k, params[k]))

	    spawn(XSLTPROC, args + [xform, inf])

do no feature disabling other than validation, and explicitly permit
writing to the filesystem ("-o").
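
To make the point concrete, here is a rough sketch of what a more locked-down invocation might look like. This is not glean.py's code; the function names (build_xslt_args, do_xslt) are made up for illustration. The flags are the ones listed in the xsltproc man page (--nonet, --nowrite, --nomkdir), and whether they are enough for any given deployment is an assumption I have not tested. The sketch reads the result from stdout rather than passing "-o".

```python
import subprocess

# Flags taken from the xsltproc man page. That they suffice to sandbox an
# untrusted transform is an assumption, not something established here.
HARDENING_FLAGS = ["--novalid", "--nonet", "--nowrite", "--nomkdir"]


def build_xslt_args(xform, inf, params=None):
    """Build an xsltproc command line that writes to stdout, not to a file."""
    for path in (xform, inf):
        # Refuse names that xsltproc could mistake for options.
        if path.startswith("-"):
            raise ValueError("suspicious path: %r" % path)
    args = ["xsltproc"] + HARDENING_FLAGS
    for key, value in (params or {}).items():
        args.extend(("--stringparam", key, value))
    return args + [xform, inf]


def do_xslt(xform, inf, params=None):
    """Run the transform and return its output as bytes.

    Requires xsltproc on PATH; raises CalledProcessError on failure.
    """
    result = subprocess.run(build_xslt_args(xform, inf, params),
                            stdout=subprocess.PIPE, check=True)
    return result.stdout
```

Even with these flags the transform still runs with the caller's CPU and memory budget, so this only narrows the filesystem and network exposure.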

GRDDL.py has no security description (in the source code):
	http://www.w3.org/2001/sw/grddl-wg/td/GRDDL.py

It's not obvious to me that the processor is secured in any way:
	result = processor.runNode(self.dom, self.url, ignorePis=1)
I see a "zone" argument, so there might be something configurable,
but not *inside* the XSLT processor, AFAICT. It seems to write to a
file in the normal, fairly obvious way.

Raptor:
	http://librdf.org/raptor/api/parser-grddl.html
allows setting a timeout for URI retrieval but not, afaict, for the  
XSLT processing. I don't know if there is anything inside. I didn't  
see any security discussion.
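
For what it's worth, a wall-clock limit on the transform step itself is not hard to bolt on from the outside if the processor runs as a child process. The sketch below is not Raptor's API; run_with_deadline is a hypothetical helper, and it assumes the transform is launched as an external command (for example xsltproc).

```python
import subprocess
import sys


def run_with_deadline(cmd, timeout_s):
    """Run cmd and return its stdout, or None if it exceeds timeout_s seconds."""
    try:
        done = subprocess.run(cmd, stdout=subprocess.PIPE,
                              timeout=timeout_s, check=True)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising, so the
        # runaway process does not linger after we give up on it.
        return None
    return done.stdout
```

Calling run_with_deadline([sys.executable, "-c", "import time; time.sleep(30)"], 0.5) stands in for a runaway transform and comes back with None after about half a second. A time limit does nothing about memory use, which would need OS-level limits.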

BTW, I do not mean this *AT ALL* as criticism of these libraries or  
their authors. AFAICT, the software is perfectly fine and does not  
claim to be more than it is. The Jena security discussion bends over  
backwards to be cautious and appropriately warning.

However, these facts make it unclear how heavily we should weight  
concerns about exposing sensitive data here. If we *are* concerned,  
the simplest thing to do is not to provide an auto-downloadable
transform at all. If the W3C would like to host a version of the OWL
API-based converter, that is, I think, fine. If software wants to use
that at its own, explicit, risk, that's up to them. (A la HTML
editors using the W3C HTML validator.)

There's another class of exploits, of course, based on spoofing the
W3C site. (I'm unclear what happens if I include an explicit
transformation attribute, and whether that overrides the namespace, e.g.,
	grddl:transformation="glean_title.xsl
		http://www.w3.org/2001/sw/grddl-wg/td/getAuthor.xsl"
.)

Are W3C hosted GRDDL transforms cryptographically signed?

(All this is in addition to intended or accidental DDoS attacks
against the W3C, either for downloading or for processing.)

As far as I know, there's no requirement on GRDDL agents to notify a  
user when they download or use unaudited code.
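
In the absence of signatures or any notification requirement, an agent could at least refuse to run a transform it has not seen before. Here is a minimal sketch, assuming the agent keeps its own allowlist mapping transform URLs to SHA-256 digests of copies someone has audited. The is_audited helper and the allowlist format are hypothetical; nothing in the GRDDL spec or at the W3C provides such digests as far as I know.

```python
import hashlib
import hmac


def is_audited(url, body, allowlist):
    """Return True only if url is allowlisted and body matches its pinned digest.

    allowlist maps a transform URL to the SHA-256 hex digest of an
    audited copy (a hypothetical, locally maintained structure).
    """
    pinned = allowlist.get(url)
    if pinned is None:
        # Unknown transform: the caller should warn the user or refuse.
        return False
    actual = hashlib.sha256(body).hexdigest()
    # Constant-time comparison of the two hex strings.
    return hmac.compare_digest(actual, pinned)
```

An agent would call this on the downloaded bytes before handing them to the XSLT processor, and fall back to asking the user (or refusing) when it returns False. This guards against a spoofed or silently changed transform, not against an audited transform that is itself harmful.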

Cheers,
Bijan.

Received on Wednesday, 28 January 2009 11:33:52 UTC