RDFa Security and Persistence (was: Re: RDFa Use Cases) from Manu Sporny on 2009-02-19 (public-rdf-in-xhtml-tf@w3.org from February 2009)

From: Manu Sporny <msporny@digitalbazaar.com>
Date: Wed, 18 Feb 2009 21:43:06 -0500
To: Ian Hickson <ian@hixie.ch>
CC: RDFa mailing list <public-rdf-in-xhtml-tf@w3.org>
Message-ID: <499CC73A.2030903@digitalbazaar.com>
Ian Hickson wrote:
>>> My understanding was that people wanted RDF data to be persisted 
>>> across multiple sessions, which would lead to bad data "poisoning the 
>>> well" in a way that no other feature in Web browsers has yet had to 
>>> deal with.
>> Some people do, some don't. I think we should assume that the RDF triple 
>> store may be more akin to the browser cache (can be cleared on a whim) 
>> than to a traditional database (clearing the data is bad).
> 
> If we allow any persistence without some solution to the trust/spam 
> problem, the store will quickly become useless (in the same way that the 
> various features to open a new window quickly became useless once sites 
> found ways to use them for doing popup ads).

I agree with you that we will need to find solutions to the trust/spam
problem, not only for RDFa, but for the general Web as well. There are
some ideas bouncing around, but we need to put them on the wiki. I have
started a page for this purpose:

http://rdfa.info/wiki/security-and-trust

The two examples that are up there have obvious flaws, but are meant to
be a starting point for the security/persistence conversation. Everyone
should feel free to rip those examples apart and improve upon them.

> This is one example of what I meant by having to evaluate each use case, 
> by the way. If we decide that "RDFa" means "a per-tab triple store with a 
> lifetime equal to the page and that is not affected by cross-origin 
> iframes", then that wouldn't address the "collect lots of data and then 
> query it" use case, despite still being "RDFa". 

I think it's going to be very difficult to find agreement on what RDFa
is and isn't at the application layer.

> It is IMHO important for 
> the RDFa community to agree on exactly which use cases are the ones that 
> are intended to be addressed, so that we can make sure that what we come 
> up with actually does address exactly those cases. 

The list of use cases will be non-exhaustive and constantly evolving, of
course.

> Is there documentation 
> anywhere on what the existing RDFa specification is attempting to solve 
> along these lines?

The current version of RDFa is based on the scenarios defined in this
document:

http://www.w3.org/TR/xhtml-rdfa-scenarios

> e.g. what is the storage semantic for the current RDFa 
> specification? 

Right now, how and when to handle persistence and trust is left to the
host language and to the application that uses RDFa.

http://rdfa.info/wiki/developer-faq#Does_RDFa_define_a_storage_model_or_persistence_layer.2FAPI.3F

> Does it have persistence? 

The current RDFa spec doesn't mention anything about a persistence layer:

http://rdfa.info/wiki/developer-faq#Does_RDFa_define_a_storage_model_or_persistence_layer.2FAPI.3F

> How does it deal with cross-origin data load?

The current RDFa spec does not address cross-origin data load:

http://rdfa.info/wiki/Developer-faq#How_does_RDFa_deal_with_cross-origin_data_load.3F

> If we're just partitioning data stores on a per-origin basis, then there's 
> no need for signatures, even, we can just use the existing origin data. 
> The question is whether that is enough.

No, we would still want signatures even if we were partitioning data
stores. For example, if you wanted to verify a digital contract or
digital statement of work of any kind, per-origin verification isn't
good enough.

> (This still doesn't address the problem of sites like wikipedia or blogs 
> that accept input from multiple users, though.)

This is a stab at addressing that issue:

http://rdfa.info/wiki/security-and-trust#Signature_attributes
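To illustrate why per-statement signatures help on multi-author sites, here is a minimal sketch of signing a canonical serialization of a set of triples and verifying it later. It is not the mechanism on the wiki page: it uses HMAC-SHA256 from the Python standard library as a stand-in for a real public-key signature, and `canonicalize`, `sign` and `verify` are hypothetical helpers. The point it shows is that verification ties a group of statements to a particular key (e.g. a particular wiki contributor), independent of the origin that served the page.

```python
import hashlib
import hmac

Triple = tuple[str, str, str]  # (subject, predicate, object)


def canonicalize(triples: list[Triple]) -> bytes:
    """Serialize triples in a stable order so signer and verifier hash the same bytes.

    Simplified: real RDF canonicalization must also handle blank nodes,
    literal datatypes and escaping; tab separators are assumed not to
    appear inside the values.
    """
    lines = sorted(f"{s}\t{p}\t{o}" for s, p, o in triples)
    return "\n".join(lines).encode("utf-8")


def sign(triples: list[Triple], key: bytes) -> str:
    # HMAC is a symmetric stand-in; a deployed system would use a
    # public-key signature so anyone can verify without the secret.
    return hmac.new(key, canonicalize(triples), hashlib.sha256).hexdigest()


def verify(triples: list[Triple], key: bytes, signature: str) -> bool:
    # Constant-time comparison of the recomputed and supplied signatures.
    return hmac.compare_digest(sign(triples, key), signature)
```

Reordering the triples does not break verification, but changing any value does. Verification only establishes who signed; an attacker's own key produces equally valid signatures, so deciding whose keys to trust remains a separate (white-list) question.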

> There needs to be some mechanism for determining what's in the white 
> lists. 

Could you elaborate, please? Do you mean the format of the white lists?
Or the type of data the white lists store? Something else?

> (Black lists wouldn't work since an attacker could just come up 
> with an infinite number of alternative site names.)

Noted; I've corrected the FAQ entry to remove the mention of blacklists:

http://rdfa.info/wiki/developer-faq#How_does_one_prevent_bad_triples_from_corrupting_a_local_triple_store.3F
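As a rough illustration of the white-list idea, the sketch below checks the host of the page a set of triples came from against an allow list before the triples are merged into a local store. `ALLOWED_HOSTS`, `is_allowed` and `filter_triples` are hypothetical names; how the list gets populated is exactly the open question Ian raises. An allow list sidesteps the problem with black lists: an attacker can mint new host names endlessly, but cannot add them to a list it doesn't control.

```python
from urllib.parse import urlsplit

Triple = tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical allow list maintained by the user or the application.
ALLOWED_HOSTS = frozenset({"example.org", "data.example.com"})


def is_allowed(page_url: str, allowed: frozenset[str] = ALLOWED_HOSTS) -> bool:
    # urlsplit().hostname is already lower-cased; require an exact match so
    # look-alikes such as example.org.evil.example are rejected.
    host = urlsplit(page_url).hostname
    return host is not None and host in allowed


def filter_triples(page_url: str, triples: list[Triple],
                   allowed: frozenset[str] = ALLOWED_HOSTS) -> list[Triple]:
    """Keep every triple from an allow-listed source, and none otherwise."""
    return list(triples) if is_allowed(page_url, allowed) else []
```

Exact host matching is a deliberate choice here: suffix or prefix matching would let an attacker register a name that merely contains a trusted one.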

> I don't really understand how the digital signature mechanism would work.

Does this help at all?

http://rdfa.info/wiki/security-and-trust#Signature_attributes

> In SSL, the user selects a single site, and the browser can verify that 
> that site is who the user thinks it is. 

In general, yes - but there are known attacks against this security
model (MITM, plug-in-based trusted certificate poisoning,
DNS/certificate hijacking), so it's not perfect.

> It doesn't prevent hostile sites 
> that the user intended to go to from interacting with the user. How would 
> digital signatures help here? Attackers can sign stuff just like anyone 
> else can, no?

http://rdfa.info/wiki/Developer-faq#Hackers_can_digitally_sign_triples_too.2C_what.27s_to_stop_hostile_sites_from_interacting_with_the_person_browsing.3F

-- manu

-- 
Manu Sporny
President/CEO - Digital Bazaar, Inc.
blog: Scaling Past 100,000 Concurrent Web Service Requests
http://blog.digitalbazaar.com/2008/09/30/scaling-webservices-part-1
Received on Thursday, 19 February 2009 02:43:43 UTC